Open biona001 opened 3 years ago
So I left the program running overnight, and it finished after 176 minutes. I am still suspicious it took so long with such small data, but I suppose this is not a bug. I will ask on the Google group instead.
FWIW, the output is
## REMLE estimate for Vg in the null model:
23.5963
6.27189 23.8002
## se(Vg):
4.43943
2.74395 5.46859
## REMLE estimate for Ve in the null model:
2.63552
0.00017173 0.000237989
## se(Ve):
1.33862
0.808768 1.67335
## MLE estimate for Vg in the null model:
25.3766 6.28991
6.28991 23.8008
## se(Vg):
5.80539
0.898146 1.06494
## MLE estimate for Ve in the null model:
2.07569 0.00105176
0.00105176 5.43669e-07
## se(Ve):
1.75241
0.000879422 4.27974e-07
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file):
0.506032
1.00318
## se(B):
0.0513373
0.000487841
##
## Computation Time:
## total computation time = 176.405 min
## computation time break down:
## time on eigen-decomposition = 0.0822538 min
## time on calculating UtX = 0.0785359 min
## time on optimization = 176.135 min
##
I did 100 more simulations, and 25/100 simulations ran for over 1000 seconds. In contrast, the other 75 simulations usually run in 30-60 seconds.
I am reopening this issue, but feel free to close it.
Thank you for your data file. Not sure why timings are so divergent. Much of the time is spent in:
#4 0x0000000000480f62 in MphEM (func_name=func_name@entry=82 'R', max_iter=1000, max_prec=0.001,
eval=eval@entry=0x5233c0, X=X@entry=0x540930, Y=Y@entry=0x5408d0, U_hat=U_hat@entry=0x53e380,
E_hat=0x53e3e0, OmegaU=0x53e440, OmegaE=0x53e4a0, UltVehiY=0x53e500, UltVehiBX=0x53e560,
UltVehiU=0x53e5c0, UltVehiE=0x53e620, V_g=0x540990, V_e=0x540a20, B=0x540ab0) at src/mvlmm.cpp:704
We need to look into optimizing the mvlmm code - there should be ample opportunity.
Sorry I forgot about this issue. FWIT, I think the main problem might be how I did the simulation. I believe it was something like
b ~ N(0, 1) # for k position of b only, others are 0
yi ~ N(xi^T * b, 1)
where yi
is length 2 vector of phenotype, xi
is length 10000 genotype vector (entries 0, 1 or 2), and b is length 10000 effect size vector but only k
position of b
is drawn from standard Normal. This might be a poor simulation model for GEMMA. Later I changed the simulation model and the divergence problem went away.
It is good to have edge cases. I'll see what I can do with it :)
On a simulated dataset with 1000 samples, 10000 SNPs, and 2 traits, multivariate GEMMA ran for over an hour. Is this reasonable?
My commands are:
Output: (the 100% shows up almost immediately, but hangs forever and never displays the "done").
If I analyze the traits separately, the analysis finishes in ~5 seconds. I assume this is a bug but I'm not sure.
For reproducibility, my test PLINK files are zipped and attached.
multivariate.zip