genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0
333 stars 125 forks source link

Multivariable GWAS (-lmm) has NaN values when including more than 3 phenotypes #58

Closed winterwang closed 7 years ago

winterwang commented 7 years ago

Hi, I would like to conduct a 5 phenotypes GWAS analysis using this great software.

I understand that there is a similar issue #45. However, we have done everything you suggested or discussed (quantile transform each phenotype, "second level" of standardization of relatedness matrix) there but still cannot fix the problem.

When the number of phenotypes included increased to >3, the calculation failed:

./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 4 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest1234
Reading Files ... 
*## number of total individuals = 807
*## number of analyzed individuals = 795
*## number of covariates = 1
*## number of phenotypes = 4
*## number of total SNPs = 163763
*## number of analyzed SNPs = 114095
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Vg): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
REMLE estimate for Ve in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Ve): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
REMLE likelihood = -nan
MLE estimate for Vg in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Vg): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
MLE estimate for Ve in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Ve): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
MLE likelihood = -nan

It seems to be irrelevant with which 4 phenotypes were included in the calculation. By the way, any 3 out of interested phenotypes (n = 5) could finished calculation normally:

##
## GEMMA Version = 0.97
##
## Command Line Input = ./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest123 
##
## Date = Thu Jul 20 10:38:35 2017
##
## Summary Statistics:
## number of total individuals = 807
## number of analyzed individuals = 799
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs = 163763
## number of analyzed SNPs = 114078
## REMLE log-likelihood in the null model = -3215.58
## MLE log-likelihood in the null model = -3218.11
## REMLE estimate for Vg in the null model: 
9.92726e-06 
5.33329e-13 9.73246e-06 
-3.00269e-12    5.30004e-11 9.55377e-06 
## se(Vg): 
0.384034    
0.268873    0.370849    
0.252157    0.2977  0.329824    
## REMLE estimate for Ve in the null model: 
0.992726    
0.00435316  0.973246    
-0.00967254 0.538782    0.955377    
## se(Ve): 
0.378022    
0.264058    0.363834    
0.249701    0.29338 0.328996    
## MLE estimate for Vg in the null model: 
9.92726e-06 8.68966e-13 -4.42535e-12    
8.68966e-13 9.73246e-06 5.26974e-11 
-4.42535e-12    5.26974e-11 9.55377e-06 
## se(Vg): 
0.397542    
0.27852 0.383646    
0.260398    0.308059    0.339524    
## MLE estimate for Ve in the null model: 
0.991484    0.00434771  -0.00966044 
0.00434771  0.972028    0.538108    
-0.00966044 0.538108    0.954182    
## se(Ve): 
0.391563    
0.273689    0.376616    
0.257985    0.303754    0.338825    
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file): 
-0.00648804 
-0.00101604 
0.00465392  
## se(B): 
0.0352486   
0.034901    
0.0345791   
##
## Computation Time:
## total computation time = 2.63051 min 
## computation time break down: 
##      time on eigen-decomposition = 0.0262652 min 
##      time on calculating UtX = 0.170258 min 
##      time on optimization = 1.97904 min 
##

Could you please give some suggestions about this ?

Many thanks

pcarbo commented 7 years ago

@winterwang I think this is a different issue from #45. Is there any way you would be able to share the data you are using? It does not even need to be the full data---it can be a single chromosome, for example. I'd like to run some tests to understand what is going on.

winterwang commented 7 years ago

@pcarbo Thanks for the reply

pcarbo commented 7 years ago

@winterwang I was able to successfully run this command on my MacBook Pro:

~/git/gemma/bin/gemma -g GEMMAchr22txt.gz -p GEMMApheno.txt \
  -n 1 2 3 4 -k GEMMA_standardrelatedmat005.sXX.txt -lmm

See the attached log file.

What OS are you using? What version of gemma are you using?

result.log.txt.gz

winterwang commented 7 years ago

@pcarbo Thanks for the feedback. However, I still get nan for the command. Previously I used latest version 0.97 downloaded from github.com, compiled and tested on my Ubuntu 16.04 machine:

*********************************************************
  Genome-wide Efficient Mixed Model Association (GEMMA)  
  Version 0.97, 07/07/2017                              
  Visit http://www.xzlab.org/software.html For Updates   
  (C) 2017 Xiang Zhou                                   
  GNU General Public License                             
  For Help, Type ./gemma -h                              
*********************************************************

I found you are using version 0.96 on a Mac Pro. So I switched to version 0.96, the problem still not fixed still on my Ubuntu 16.04 platform:

*********************************************************
  Genome-wide Efficient Mixed Model Association (GEMMA)  
  Version 0.96, 05/17/2017                              
  Visit http://www.xzlab.org/software.html For Updates   
  (C) 2017 Xiang Zhou                                   
  GNU General Public License                             
  For Help, Type ./gemma -h                              
*********************************************************
./gemma -g/GEMMAchr22txt.gz -p GEMMApheno.txt -n 1 2 3 4 -k GEMMA_centerrelatedmat005.cXX.txt -lmm 
Reading Files ... 
## number of total individuals = 807
## number of analyzed individuals = 795
## number of covariates = 1
## number of phenotypes = 4
## number of total SNPs = 163763
## number of analyzed SNPs = 114095
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Vg): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
REMLE estimate for Ve in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Ve): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
REMLE likelihood = -nan
MLE estimate for Vg in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Vg): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
MLE estimate for Ve in the null model: 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
se(Ve): 
-nan    
-nan    -nan    
-nan    -nan    -nan    
-nan    -nan    -nan    -nan    
MLE likelihood = -nan
Reading SNPs                                                    0.00%

Reading SNPs kept 0.00% and not moving forward. Can you succeed when using all of the 5 phenotypes on your Mac Pro?

pcarbo commented 7 years ago

@winterwang I was able to reproduce the NaN's using gemma v0.96 for Linux. It is worrying that we have a platform-dependent issue! I will look in to it.

pcarbo commented 7 years ago

@winterwang The discovered the problem: the Linux binary provided in the GEMMA 0.96 release was linked to GSL 2.2.1, and it seems that it only works with GSL 1.x, so I linked with GSL 1.16 instead. Now I am no longer getting the NaN's. I've attached the new binary, and we will soon make a GEMMA 0.97 release available for download.

Can you please test that the attached binary works for you on your Ubuntu system?

gemma.linux.gz

pcarbo commented 7 years ago

Also, I updated the binary for the v0.96 release.

winterwang commented 7 years ago

Hi, @pcarbo . Your help is greatly appreciated. Now using the new binary you attached for me I am able to compute the command with 4 and 5 phenotypes on my Ubuntu 16.04. No more nan is created.

Thank you so much!


##
## GEMMA Version = 0.97
##
## Command Line Input = ./gemma.linux -g GEMMAchr22txt.gz -p GEMMApheno.txt -n 1 2 3 4 -k ./output/GEMMA_centerrelatedmat005.cXX.txt -lmm -o test 
##
## Date = Fri Jul 28 09:14:00 2017
##
## Summary Statistics:
## number of total individuals = 807
## number of analyzed individuals = 795
## number of covariates = 1
## number of phenotypes = 4
## number of total SNPs = 163763
## number of analyzed SNPs = 114095
## REMLE log-likelihood in the null model = -4299.15
## MLE log-likelihood in the null model = -4302.56
## REMLE estimate for Vg in the null model: 
9.96209e-06 
7.58463e-14 9.71815e-06 
-7.87707e-13    1.66e-11    9.56255e-06 
4.365e-13   1.44716e-12 -1.9888e-13 9.37882e-06 
## se(Vg): 
1.06002 
0.760842    1.08813 
0.722019    0.88268 0.981513    
0.663478    0.669846    0.640012    0.841037    
## REMLE estimate for Ve in the null model: 
0.996209    
0.00285371  0.971815    
-0.01166    0.538287    0.956255    
0.0264271   0.0530889   -0.0166509  0.937882    
## se(Ve): 
0.333927    
0.237622    0.337159    
0.227644    0.27476 0.309713    
0.209418    0.209679    0.202253    0.266001    
## MLE estimate for Vg in the null model: 
9.96208e-06 1.26323e-13 -1.08012e-12    1.76122e-13 
1.26323e-13 9.71815e-06 1.63898e-11 1.25851e-12 
-1.08012e-12    1.63898e-11 9.56255e-06 7.06729e-14 
1.76122e-13 1.25851e-12 7.06729e-14 9.37882e-06 
## se(Vg): 
1.08775 
0.781956    1.11965 
0.74072 0.908896    1.00651 
0.67756 0.684627    0.653495    0.85589 
## MLE estimate for Ve in the null model: 
0.994956    0.00285012  -0.0116453  0.0263939   
0.00285012  0.970592    0.53761 0.0530221   
-0.0116453  0.53761 0.955052    -0.01663    
0.0263939   0.0530221   -0.01663    0.936702    
## se(Ve): 
0.342888    
0.244368    0.347138    
0.233676    0.28309 0.317762    
0.214008    0.214445    0.20664 0.270878    
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file): 
-0.00670195 
-0.00196422 
0.00190655  
-0.00429808 
## se(B): 
0.035399    
0.034963    
0.0346819   
0.0343471   
##
## Computation Time:
## total computation time = 7.03936 min 
## computation time break down: 
##      time on eigen-decomposition = 0.00856403 min 
##      time on calculating UtX = 0.162786 min 
##      time on optimization = 6.42819 min 
##
pjotrp commented 7 years ago

Can we add a test to the test suite?

pcarbo commented 7 years ago

@pjotrp Good idea. I will do that.

pcarbo commented 7 years ago

@pjotrp I'm having trouble reproducing the error we had earlier, and I no longer have the GEMMA binary linked to the GSL 2.x library. I suggest we close this issue---I don't think this requires an additional test.

pjotrp commented 7 years ago

I can test GSL versions with GNU Guix. We had the same problem in GeneNetwork.org - I'll try to find a dataset.

pjotrp commented 7 years ago

I have reproduced the problem. Will add test data to GEMMA soon - is a plink file which is good too.

pcarbo commented 7 years ago

@pjotrp Great!

pjotrp commented 7 years ago

Looks like the fix of issue 26 also fixes the GSLv2 for this issue. Test that fails on master coming up.

pjotrp commented 7 years ago

This has been fixed and tests added in https://github.com/genetics-statistics/GEMMA/pull/92. Note that if you get NaNs again, it is probably due to highly correlated phenotypes/covariates. Please reopen if an issue resurfaces.