Closed Mills33 closed 4 years ago
Hi, I was running into the same issue as the OP, but then I saw this issue and updated the software. But now, it's giving me a different error.
Traceback (most recent call last):
File "gadma.R3.m4.moments.py", line 37, in
I think the error is because the code file generated has a growth function: growth_funcs = [lambda t: before[0] + (after[0] - before[0]) * (t / T)]
but the before = after code comes after.
The moments code is:
`#current best params = [8.354928346378932, 0.88575956698093, 0.23007835953494235, 0.4805958920229654, 0.3502832658458346, 0.06304853182036993, 19.50065332970754, 7.965459940348304]
import matplotlib
matplotlib.use("Agg")
import moments
import numpy as np
def generated_model(params, ns):
nu21, s0, nu31, nu32, t1, t2, m2_12, m2_21 = params
theta1 = 0.2838653
sts = moments.LinearSystem_1D.steady_state_1D(sum(ns), theta=theta1)
fs = moments.Spectrum(sts)
T = t1
after = [nu21]
growth_funcs = [lambda t: before[0] + (after[0] - before[0]) * (t / T)]
list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)
before = after
fs = moments.Manips.split_1D_to_2D(fs, ns[0], sum(ns[1:]))
before.append((1 - s0) * before[-1])
before[-2] *= s0
T = t2
after = [nu31, nu32]
growth_funcs = [lambda t: before[0] * (after[0] / before[0]) ** (t / T), lambda t: after[1]]
list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
m = np.array([[0, m2_12],[m2_21, 0]])
fs.integrate(Npop=list_growth_funcs, tf=T, m=m, dt_fac=0.1, theta=theta1)
return fs
data = moments.Spectrum.from_file('/scratch/snyder/m/mathur20/MQU/ch3/demography/gadma/only_syn/2pop/sfs/dadi/TX-AZ.sfs')
data = data.project([25, 25])
popt = [8.354928346378932, 0.88575956698093, 0.23007835953494235, 0.4805958920229654, 0.3502832658458346, 0.06304853182036993, 19.50065332970754, 7.965459940348304]
ns = [25, 25]
model = generated_model(popt, ns)
N_A = 115793.640440
ll_model = moments.Inference.ll(N_A * model, data)
print('Model log likelihood (LL(model, data)): {0}'.format(ll_model))
theta0 = 0.2838653
print('Drawing model to model_from_GADMA.png')
model = moments.ModelPlot.generate_model(generated_model, popt, ns)
moments.ModelPlot.plot_model(model,
save_file='model_from_GADMA.png',
fig_title='Demographic model from GADMA',
pop_labels=['TX', 'AZ'],
nref=int(theta / theta0),
draw_scale=True,
gen_time=0.001,
gen_time_units='Thousand years',
reverse_timeline=True)
try:
import gadma
print ("Calculation of CLAIC (time consuming, enter Ctrl+C to stop)")
all_boot = gadma.Inference.load_bootstrap_data_from_dir("/scratch/snyder/m/mathur20/MQU/ch3/demography/gadma/final/bootsfs")
for eps in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]:
try:
claic_score = gadma.Inference.get_claic_score(func_ex, all_boot, popt, data, pts=None, eps=eps)
except Exception as e:
print("Error for eps = {0:.1e} : ".format(eps) + str(e))
claic_score = None
print("Model Composite Likelihood AIC score (CLAIC(p0, eps={0:.1e})): {1}".format(eps, claic_score))
except ImportError:
print("Install GADMA to calculate CLAIC score")`
Please let me know if this is a bug or something that I can just edit on my end in the code to make it work.
Thank you for your time and writing such a brilliant software :)
Dear @samarth8392, thank you for your interest to GADMA!
It is bug in code generation of GADMA. You are right about the error. I will check this out and try to fix generation, but now I suggest to fix it by yourself adding before = [1.0]
right before the first block:
...
fs = moments.Spectrum(sts)
before = [1.0]
T = t1
after = [nu21]
...
I hope that will help!
Best regards, Ekaterina
Hi Ekaterina, Thank you for the quick response. I added the before statement and tried running it again, but now the error is:
Calculation of CLAIC (time consuming, enter Ctrl+C to stop)
If you use the Godambe methods in your published research, please cite Coffman et al. (2016) in addition to the main dadi paper Gutenkunst et al. (2009).
AJ Coffman, P Hsieh, S Gravel, RN Gutenkunst "Computationally efficient composite likelihood statistics for demographic inference" Molecular Biology and Evolution 33:591-593 (2016)
Traceback (most recent call last):
File "final_model6_moments_code.py", line 58, in <module>
claic_score = gadma.Inference.get_claic_score(generated_model, all_boot, popt, data, pts=None, eps=1e-15)
File "/home/mathur20/.conda/envs/cent7/5.3.1-py37/gadma/lib/python3.7/site-packages/gadma-1.0.1-py3.7.egg/gadma/Inference.py", line 117, in get_claic_score
return 2 * get_claic_component(func_ex, all_boot, p0, data, pts=pts, eps=eps) - 2 * ll_model
File "/home/mathur20/.conda/envs/cent7/5.3.1-py37/gadma/lib/python3.7/site-packages/gadma-1.0.1-py3.7.egg/gadma/Inference.py", line 86, in get_claic_component
H_inv = np.linalg.inv(H)
File "<__array_function__ internals>", line 6, in inv
File "/home/mathur20/.local/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 551, in inv
ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
File "/home/mathur20/.local/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 97, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
I am not sure if I understand "singular matrix error"
My moments script is:
#current best params = [16.01065952404785, 0.7305758264654852, 0.2462289184792431, 0.46875443298942726, 0.2984985370324451, 0.12532423399443587, 19.521916397305933, 6.177478938766556]
import matplotlib
matplotlib.use("Agg")
import moments
import numpy as np
def generated_model(params, ns):
nu21, s0, nu31, nu32, t1, t2, m2_12, m2_21 = params
theta1 = 0.2838653
sts = moments.LinearSystem_1D.steady_state_1D(sum(ns), theta=theta1)
fs = moments.Spectrum(sts)
before = [1.0]
T = t1
after = [nu21]
growth_funcs = [lambda t: after[0]]
list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)
before = after
fs = moments.Manips.split_1D_to_2D(fs, ns[0], sum(ns[1:]))
before.append((1 - s0) * before[-1])
before[-2] *= s0
T = t2
after = [nu31, nu32]
growth_funcs = [lambda t: after[0], lambda t: before[1] * (after[1] / before[1]) ** (t / T)]
list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
m = np.array([[0, m2_12],[m2_21, 0]])
fs.integrate(Npop=list_growth_funcs, tf=T, m=m, dt_fac=0.1, theta=theta1)
return fs
data = moments.Spectrum.from_file('/scratch/snyder/m/mathur20/MQU/ch3/demography/gadma/only_syn/2pop/sfs/dadi/TX-AZ.sfs')
data = data.project([25, 25])
popt = [16.01065952404785, 0.7305758264654852, 0.2462289184792431, 0.46875443298942726, 0.2984985370324451, 0.12532423399443587, 19.521916397305933, 6.177478938766556]
ns = [25, 25]
model = generated_model(popt, ns)
N_A = 114402.034028
ll_model = moments.Inference.ll(N_A * model, data)
print('Model log likelihood (LL(model, data)): {0}'.format(ll_model))
theta0 = 0.2838653
#print('Drawing model to model_from_GADMA.png')
#model = moments.ModelPlot.generate_model(generated_model, popt, ns)
#moments.ModelPlot.plot_model(model,
# save_file='model_from_GADMA.png',
# fig_title='Demographic model from GADMA',
# pop_labels=['TX', 'AZ'],
# nref=int(theta / theta0),
# draw_scale=True,
# gen_time=0.001,
# gen_time_units='Thousand years',
# reverse_timeline=True)
try:
import gadma
print ("Calculation of CLAIC (time consuming, enter Ctrl+C to stop)")
all_boot = gadma.Inference.load_bootstrap_data_from_dir("/scratch/snyder/m/mathur20/MQU/ch3/demography/gadma/final/bootsfs")
claic_score = gadma.Inference.get_claic_score(generated_model, all_boot, popt, data, pts=None, eps=1e-15)
print("Model Composite Likelihood AIC score (CLAIC(p0, eps={0:.1e})): {1}".format(eps, claic_score))
#for eps in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]:
# try:
# claic_score = gadma.Inference.get_claic_score(func_ex, all_boot, popt, data, pts=None, eps=eps)
# except Exception as e:
# print("Error for eps = {0:.1e} : ".format(eps) + str(e))
# claic_score = None
# print("Model Composite Likelihood AIC score (CLAIC(p0, eps={0:.1e})): {1}".format(eps, claic_score))
except ImportError:
print("Install GADMA to calculate CLAIC score")
Dear @samarth8392,
It is numerical error of CLAIC evaluation with moments. I hope that changing eps
argument will solve this issue. You could just change eps=1e-15
in function gadma.Inference.get_claic_score
to some other value. Or you can loop over several values of eps and try them all.
For the second variant you should uncomment commented code in the file you have attached:
#for eps in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]:
# try:
# claic_score = gadma.Inference.get_claic_score(func_ex, all_boot, popt, data, pts=None, eps=eps)
# except Exception as e:
# print("Error for eps = {0:.1e} : ".format(eps) + str(e))
# claic_score = None
# print("Model Composite Likelihood AIC score (CLAIC(p0, eps={0:.1e})): {1}".format(eps, claic_score))
and comment out or remove two lines above this block.
The smaller eps is the better value of CLAIC is but this numerical issues sometimes spoil everything. Have you asked GADMA to evaluate CLAIC during the run? If so it usually prints eps
that was used for the calculation, was it 1e-15
?
Please keep me updated.
Ekaterina
Hi Ekaterina, I chose a single value because I wasn't sure how long it would take, but apparently I chose the worst value. I ran the code in a for loop and I now got an output
Model log likelihood (LL(model, data)): -3058.901747281811
Calculation of CLAIC (time consuming, enter Ctrl+C to stop)
If you use the Godambe methods in your published research, please cite Coffman et al. (2016) in addition to the main dadi paper Gutenkunst et al. (2009).
AJ Coffman, P Hsieh, S Gravel, RN Gutenkunst "Computationally efficient composite likelihood statistics for demographic inference" Molecular Biology and Evolution 33:591-593 (2016)
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-02)): 607922.7364217453
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-03)): 607688.7841463592
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-04)): 607684.8027957648
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-05)): 607612.6622176758
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-06)): 586690.9280410497
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-07)): 353854.05379018356
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-08)): -15222.039192122913
I am amazed how fast it was. I am not sure about negative values, but I am glad I have an output. My next goal is to get confidence intervals using gadma-run_ls_on_boot_data. Any thoughts about running it?
Again, I am so thankful to this software and your time.
Cheers, Samarth
Dear Samarth,
Cool! I think the reliable value is the following:
Model Composite Likelihood AIC score (CLAIC(p0, eps=1.0e-05)): 607612.6622176758
To run gadma-run_ls_on_boot_data
comment out all except the generated_model
or copy it in the separate file. Also the parameters file for those scripts are a bit different to those from GADMA - they should be valid python code with the following variables:
p0, lower_bound, upper_bound, par_labels, pts (only for dadi, not your case).
So it should be something like that:
...
par_labels = ['nu21', 's0', 'nu31', 'nu32', 't1', 't2', 'm2_12', 'm2_21']
...
p0 is the initial point - the result vector of GADMA (popt
from your file). You could also check the usage of scripts by calling it with --help flag. I am not sure but may be lower and upper bound could be automatically generated from par_labels
.
Best regards, Ekaterina
Hi Ekaterina,
Thanks for the advice. I ran thegadma-run_ls_on_boot_data
There were a few bugs in the source code that I edited:
# ERRORS:
#File "/home/mathur20/.conda/envs/cent7/5.3.1-py37/gadma/lib/python3.7/site-packages/gadma-1.0.1-py3.7.egg/gadma/run_ls_on_boot_data.py", line 87
# print str_after_bracket
#File "/home/mathur20/.conda/envs/cent7/5.3.1-py37/gadma/lib/python3.7/site-packages/gadma-1.0.1-py3.7.egg/gadma/run_ls_on_boot_data.py", line 158
# except Exception, e:
#File "/home/mathur20/.conda/envs/cent7/5.3.1-py37/gadma/lib/python3.7/site-packages/gadma-1.0.1-py3.7.egg/gadma/run_ls_on_boot_data.py", line 237
# except Exception, e:
But after I run, I found all my values for bootstrap to be exactly the same. i.e. the CI were also point estimates without any range.
Do you know why would that be the case? I am wondering if I need to change anything? I have 28, and 31 samples for the two pop and I projected them in 25x25 sizes. Also, I am wondering if it is because I am using the same SNP data set. You mentioned in the manual that for CLAIC and CI, we need unlinked markers. If by using the same data for initial modelling and CI, I am making a mistake, let me know. The only workaround I think I can do is by thinning my SNPs to be far enough that they become practically unlinked.
Let me know your thoughts.
Cheers, Samarth
Dear Samarth,
I have updated the version of run_ls_on_boot_data
script. I am not sure about your bootstrap data. Have you perform bootstrap on your SNP's? So every bootstrapped data is different to the original spectrum for inference, correct? Also have you downsized each of the bootstrapped data before running the script? Because script does not know about what size data has and uses just maximum size.
At last maybe change of optimization for gadma-run_ls_on_boot_data
will solve the situation that all runs have the same result values. It is strange behavior indeed, I hope we will find out what is wrong.
Best regards, Ekaterina
Hi Ekaterina, Sorry for not responding earlier. I realized my bootstrapping was incorrect. You guessed it correctly, I was giving it the same SFS (and expecting different output). I downsampled my SNPs (using vcf thinning in vcftools) and reran it, and it gave me a CI range. Again, thank you for your immediate responses. Cannot thank you enough.
Cheers, Samarth
Using python3 and the command python3 best_logLL_model_moments_code.py I get the following error:
'File " best_logLL_model_moments_code.py", line 7 def generated_model((nu21, nu31, nu41, nu51, t1, t2, t3, t4), ns): ^ SyntaxError: invalid syntax
File contents: