Closed davidhoule closed 6 years ago
@davidhoule Your problem may be related to Issue #58. What version of GEMMA are you using, and which operating system?
I am using Ubuntu 16.04.2. I updated to version GEMMA 0.97, with the same results. Relatedness matrix that causes problems has just one (very near 0) negative eigenvalue. Bending matrix to all positive eigenvalues has no effect on the error. Other toy relatedness matrices with many negative near 0 eigenvalues run without problem.
@davidhoule Can you please try downloading the pre-compiled binary for GEMMA v0.96 Linux x86 and see if that works for you? We recently uncovered a bug with linking to GSL 2.x and the bugs looks very similar to the one you reported.
Tried the binary you furnished with the same result.
Then did an analysis using a relatedness matrix estimated in smartpca, part of the Eigensoft package. With this matrix, all versions of the program will run on my toy data set. Estimates of Ve are still 0, but tests are produced, and there are no errors:
Reading Files ...
## number of total individuals = 184
## number of analyzed individuals = 184
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 3
## number of analyzed SNPs = 3
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.8298
0.0682 0.7795
se(Vg):
-nan
0.1064 0.1491
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0388
se(Ve):
-nan
0.0804 0.1126
REMLE likelihood = -475.4224
MLE estimate for Vg in the null model:
0.8298
0.0685 0.8233
se(Vg):
0.0868
0.0613 0.0861
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.0000
0.0000 0.0000
MLE likelihood = 70368744177431.5234
Reading SNPs ==================================================100.00%
I assume that the two nan SE are still indicating a problem?
@davidhoule Great progress. It sounds like your siutation is related to Issue #45. In some cases it appears that a different method for computing the relatedness matrix yields more or less numerically stable results, although it is hard to say exactly what the root cause is. I believe that the computation requires tha all the eigenvalues to be positive. It might be useful to check this before running the LMM analysis in GEMMA with the -eigen
option.
It is useful to look at output file result.log.txt
rather than the console output because it gives you the numbers in more precision.
Did you try the univariate LMM analysis? Do you get this problem in the univariate LMM analysis?
Univariate analyses do not seem to have the problem. The actual log.txt file does show very slightly positive estimates of Ve terms, and of all VE SE, except the one nan.
I suspect that there must be an 'effective 0' tolerance, as I have examples that run with many very slightly negative eigenvalues.
Thanks for your help with this issue.
Possibly this is fixed by https://github.com/genetics-statistics/GEMMA/pull/69. Please try the next release. If it does not resolve send me a dataset.
@davidhoule can you send me the toy dataset?
No response
We have a dataset now for testing that shows similar characteristics. Running -lmm 2 renders
##
## GEMMA Version = 0.98-pre1 (2018/06/29)
## GSL Version = 2.4
## Eigen Version = 3.2.9
## OpenBlas = OpenBLAS 0.2.19 - NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell
## arch = Haswell
## threads = 8
## parallel type = threaded
##
## Command Line Input = ./bin/gemma -bfile ./frontiers -lmm 2 -k ./GWA_2017.centered.cXX.txt -maf 0.05 -outdir ./skull/ -snps selected_200_SNPs.txt -debug -legacy -no-check
##
## Date = Fri Jun 29 13:38:52 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 200
## REMLE log-likelihood in the null model = 135.494
## MLE log-likelihood in the null model = 140.231
## pve estimate in the null model = 0.958692
## se(pve) in the null model = 0.037907
## vg estimate in the null model = 0.000473816
## ve estimate in the null model = 1.46539e-05
## beta estimate in the null model = 0.00169357
## se(beta) = 0.000546862
##
## Computation Time:
## total computation time = 0.0139996 min
## computation time break down:
## time on eigen-decomposition = 0.00010455 min
## time on calculating UtX = 1.55333e-05 min
## time on optimization = 0.00404858 min
##
while
~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -debug
renders
MLE estimate for Vg in the null model:
0.0004
-0.0001 0.0002
-0.0000 -0.0000 0.0002
0.0000 0.0001 0.0000 0.0002
0.0000 0.0000 -0.0000 -0.0000 0.0001
-0.0000 0.0000 -0.0000 -0.0000 0.0000 0.0002
-0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0001
-0.0000 0.0000 -0.0000 0.0000 -0.0000 0.0000 0.0001 0.0001
-0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000 0.0000 0.0001
-0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 0.0000 0.0001
se(Vg):
0.0001
0.0001 0.0001
0.0001 0.0001 0.0001
0.0001 0.0000 0.0001 0.0001
0.0000 0.0001 -nan 0.0000 0.0001
0.0000 0.0000 0.0001 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0001 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 -nan 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -nan
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0001
-0.0000 0.0000 0.0001
-0.0000 -0.0000 -0.0000 0.0000
0.0000 -0.0000 0.0000 0.0000 0.0001
-0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000
0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000
0.0000 -0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000
0.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000
se(Ve):
0.0000
0.0000 0.0000
0.0000 0.0000 0.0000
0.0000 0.0000 -nan -nan
0.0000 0.0000 -nan -nan 0.0000
0.0000 0.0000 0.0000 -nan -nan 0.0000
0.0000 0.0000 -nan -nan 0.0000 -nan -nan
-nan -nan -nan -nan 0.0000 -nan 0.0000 -nan
0.0000 0.0000 -nan -nan -nan -nan 0.0000 -nan -nan
0.0000 0.0000 -nan -nan -nan 0.0000 0.0000 -nan 0.0000 0.0000
MLE likelihood = -9060217478156.0625
GSL ERROR: matrix is singular in lu.c at line 266 errno 1
When I run with less phenotypes we get
~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6 -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug
##
## GEMMA Version = 0.97 (2017/12/19)
## GSL Version = 2.4
## Eigen Version = 3.3.4
## OpenBlas = OpenBLAS 0.3.0.dev - NO_AFFINITY HASWELL
## arch = HASWELL
## threads = 8
## parallel type = threaded
##
## Command Line Input = /home/wrk/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6 -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug
##
## Date = Fri Jun 29 13:42:48 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 182
## REMLE log-likelihood in the null model = 447.058
## MLE log-likelihood in the null model = -1.78495e+09
## REMLE estimate for Vg in the null model:
0.000441807
-7.55708e-06 0.000265048
1.5968e-05 -1.5201e-05 0.000153374
## se(Vg):
0.000111597
6.46807e-05 6.10626e-05
5.13028e-05 2.98818e-05 5.03876e-05
## REMLE estimate for Ve in the null model:
2.14138e-05
-3.21985e-06 1.68684e-06
-1.21203e-05 -1.80521e-06 1.82497e-05
## se(Ve):
1.72819e-05
5.62131e-06 2.21151e-06
1.07007e-05 -nan 1.17464e-05
## MLE estimate for Vg in the null model:
0.000455099 -1.55256e-05 1.40252e-05
-1.55256e-05 0.000269695 -1.75401e-05
1.40252e-05 -1.75401e-05 0.00016208
## se(Vg):
0.000104231
5.57158e-05 5.63383e-05
4.66896e-05 3.62062e-05 4.24585e-05
## MLE estimate for Ve in the null model:
1.62625e-05 -1.99967e-06 -1.05415e-05
-1.99967e-06 1.05748e-06 -1.24257e-06
-1.05415e-05 -1.24257e-06 1.47746e-05
## se(Ve):
2.94059e-23
5.38615e-24 5.59049e-24
4.19386e-24 4.11504e-24 -nan
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file):
0.00169357
0.000552855
-0.000453467
## se(B):
0.000661073
0.000185541
0.000610281
##
## Computation Time:
## total computation time = 0.0532221 min
## computation time break down:
## time on eigen-decomposition = 7.95333e-05 min
## time on calculating UtX = 1.60833e-05 min
## time on optimization = 0.0286565 min
##
Commit 70f419673d5d3e49a3eada70c70c2d284b502d7b shows that in all cases eigen decomposition throws FP exceptions when enabling hardware checking. This follows #161.
The problem is in calcPab which generates NaN in some cases. Not completely clear what it is yet, but I think it has to do with uninitialized W. Related to #94
I fixed the NaN issue in se(Ve) with above commit. Also for the multiple phenotypes we get an error where it matters:
./bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -no-check
GEMMA 0.98-beta1 (2018-09-06) by Xiang Zhou and team (C) 2012-2018
Reading Files ...
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 10
## number of total SNPs/var = 73527
## number of considered SNPS = 200
## number of analyzed SNPs = 82
Start Eigen-Decomposition...
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
ERROR: Enforce failed for Trying to take the log of -0.138747 in src/mathfunc.cpp at line 108 in safe_log
It may be possible to solve this, but it is a different issue.
I am running into a similar error message when running the multivariate linear model. Running the same phenotypes individually using a LMM does not result in any errors. Only difference is that I have excluded Individuals that have any incomplete phenotype rows in the multivariate analysis. I am running the latest version of gemma on a macOS Sierra.
gemma -bfile males_dots_imputed_mlm -k centered_relationship_matrix.cXX.txt -lmm 1 -n 1 2 -o mlm_male_dots GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018 Reading Files ...
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.0101
1.3313 568.4998
se(Vg):
0.0047
0.9901 341.0558
REMLE estimate for Ve in the null model:
0.0009
0.0298 1.0678
se(Ve):
0.0012
0.2447 86.1451
REMLE likelihood = -404.9952
MLE estimate for Vg in the null model:
0.0136
1.4578 573.3371
se(Vg):
0.0015
0.2417 62.1871
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.0000
0.0000 0.0000
MLE likelihood = -198158383604301984.0000
GSL ERROR: matrix is singular in lu.c at line 262 errno 1
@hans-recknagel do you mind sharing the dataset with me so I can replicate the problem?
Yes, I've sent them to you by mail. Thanks.
Looks to me like the covariates are highly correlated. mvlmm does not like that.
See also #175
Ok, so correlated variables are not appropriate to use then? I thought because these two different variables capture the phenotype better than just one, but describe pretty much the same phenotype, it would be best to analyse them together.
You should not use correlated covariates/phenotypes. There is no benefit anyway. In #175 we'll write a validator which will emit the error.
Thanks for your response. I was under the impression that the mvlmm is actually for correlated phenotypes (this is what is says in the the paper by Zhou and Stevens 2014). The benefit in my particular case would be that in both phenotypic measures there is a measurement error, but by using the two different measures, the accuracy of getting the "true" value of the overall phenotype (and detecting its genetic basis) increases. This would decrease the chances of finding spurious genetic associations resulting from measurement errors.
This issue is closed. Please use the mailing list for discussion. You may get help there.
I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the estimates of the relatedness matrix calculated in GEMMA appear to cause problems. I have diagnosed the relatedness matrix as the problem by using the full relatedness matrices in the sample example analyses of my data (two traits, three snps) that run with a relatedness matrix calculated from just the three snps.
The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error: Start Eigen-Decomposition... REMLE estimate for Vg in the null model: 1.6815 0.1562 1.6745 se(Vg): 0.3784 -nan -nan REMLE estimate for Ve in the null model: 0.0000 0.0000 0.0000 se(Ve): 0.1228 -nan -nan REMLE likelihood = -478.9822 MLE estimate for Vg in the null model: 1.6815 0.1563 1.6745 se(Vg): 0.1758 0.1246 0.1751 MLE estimate for Ve in the null model: 0.0000 0.0000 0.0000 se(Ve): -nan 0.0000 0.0000 MLE likelihood = -140737488355560.3750 gsl: lu.c:262: ERROR: matrix is singular========================100.00%
It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed, but the proportion of heterozygotes is maybe 5% on average, but that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. Bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.
I would be happy to furnish example data sets that create the problem if that would be helpful.