genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0
333 stars 125 forks source link

Problem including additional covariates in GEMMA lmm #101

Closed 666wisteria closed 7 years ago

666wisteria commented 7 years ago

Hi,

I've been using GEMMA to run a linear mixed model on separate mouse populations with success. I'm now interested in running a mega-analysis including two major mouse populations, and I intend to code the populations as 0 and 1. In addition to population, I am also using sex and body weight at covariates. When I used three covariates together and run GEMMA, I got Nan values for all the estimates -- p values, beta estimates, etc. -- only on chromosome 18 and chromosome X. All other chromosome association results were normal. I've tried to run combinations of covariatres (sex alone, weight alone, population alone, sex+weight, sex+population, sex+weight, or sex+weight+population), and only when I ran all three covariates together did I get these Nan results on chr18 and chrX.

What would cause this issue? Is it possible that SNPs on chr18 and chrX contains redundant information when I use the sex+weight+ population covariate combination? How do I find out the root of the problem and solve it? Thanks in advance!

pcarbo commented 7 years ago

@666wisteria Do you get NaN's for all SNPs on chromosome 18? Are you running GEMMA separately on each of the chromosomes?

666wisteria commented 7 years ago

Yes, I am running GEMMA separately on each chromosome. I am also using the leave-one-chromosome-out approach, so the kinship matrix for each chromosome is calculated by all other autosomal chromosomes.

Sincerely,

Xinxin

On Tue, Oct 17, 2017 at 10:59 AM, Peter Carbonetto <notifications@github.com

wrote:

@666wisteria https://github.com/666wisteria Do you get NaN's for all SNPs on chromosome 18? Are you running GEMMA separately on each of the chromosomes?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/genetics-statistics/GEMMA/issues/101#issuecomment-337311968, or mute the thread https://github.com/notifications/unsubscribe-auth/AfWx9Dcl2vciB6XV42KndEUJfXD2TxWTks5stOrlgaJpZM4P8gq- .

666wisteria commented 7 years ago

And yes, I get NaN for all SNPs on chromosome 18 and chrmosome X. I've been playing around with the chr18 dosage file, and we've found that if I eliminate a random set of 3 SNPs from chr18, I get normal p values and estimates. I've tried eliminating the 3 beginning SNPs, 3 SNPs in the middle of the file, and the last 3 SNPs, GEMMA runs fine. This phenomenon is bizarre.

pcarbo commented 7 years ago

@666wisteria Adding or removing SNPs should have no impact on the LMM analysis because the SNPs are analyzed independently. Maybe try removing the SNPs from your file, then adding them back in at a different location in the file? (the order of the SNPs should not matter). I wonder if there is a subtle formatting issue (e.g., linefeeds) that is accidentally fixed by removing 3 lines (SNPs) of the dosage file.

666wisteria commented 7 years ago

Hi Peter,

Since the last message I have tried to reformat the dosage file in different ways (remake chr18 and chrX dosage files, remove SNPs in the front and add in the back of the dosage files and other locations), but the result is still depressing. I still get NaN values. Here is my command line:

./gemma -g chr18.F34_F3943.AIL.dosages -p F343943_MA_pheno.txt -n 3 -k autoExceptchr18.F34_F3943.AIL.dosages.cXX.txt -a chr18.F34_F3943.AIL.snpinfo -c F34_F3943_MA_d3_gen.cov -lmm 4 -o chr18.MA.SW.3.F34_F3943.AIL

I tried to attach all of the input files through email, but unfortunately all the files combined are too big. Is there another way that I can transfer data?

Thanks so much!

Xinxin

pcarbo commented 7 years ago

@666wisteria Could you share a link via Dropbox or Google Drive?

666wisteria commented 7 years ago

Yes! I just shared a link through dropbox with you. The five input files are hanging inside the New_filter folder but outside of the F34, F3943, and F34F3943 folders.

Xinxin

On Wed, Oct 18, 2017 at 6:22 PM, Peter Carbonetto notifications@github.com wrote:

@666wisteria https://github.com/666wisteria Could you share a link via Dropbox or Google Drive?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/genetics-statistics/GEMMA/issues/101#issuecomment-337772449, or mute the thread https://github.com/notifications/unsubscribe-auth/AfWx9NSPb_GwO1YULxnI4goos4doAFzWks5stqRfgaJpZM4P8gq- .

pcarbo commented 7 years ago

@666wisteria I just checked and this command worked for me:

gemma -g chr18.F34_F3943.AIL.dosages -p F343943_MA_pheno.txt \
  -n 3 -k autoExceptchr18.F34_F3943.AIL.dosages.cXX.txt -lmm 2

Your dosage file does not follow the description given on p. 11 of the GEMMA manual; you need commas between the columns. I've modified your dosage file in the shared Dropbox folder.

666wisteria commented 7 years ago

Thanks for correcting the phenotype file format! The same phenotype file has worked before though, and I think GEMMA is pretty flexible with file format.

The command line you wrote works fine because there is no covariate file. My problem was that when I included the covariate file with the three covariates, I got no results for chr18 and chrX. Have you tried including the covariate file? Does that work when you include the covariates?

Thanks,

Xinxin

Sent from my iPhone

On 22 Oct 2017, at 19:30, Peter Carbonetto notifications@github.com wrote:

@666wisteria I just checked and this command worked for me:

gemma -g chr18.F34_F3943.AIL.dosages -p F343943_MA_pheno.txt \ -n 3 -k autoExceptchr18.F34_F3943.AIL.dosages.cXX.txt -lmm 2 Your dosage file does not follow the description given on p. 11 of the GEMMA manual; you need commas between the columns.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

pjotrp commented 7 years ago

Can you send us the log.txt file. What version of gemma are you using and where did it come from? This can be a duplicate of https://github.com/genetics-statistics/GEMMA/issues/58

pjotrp commented 7 years ago

@pcarbo note that GEMMA generally allows for a mixture of whitespace and commas as delimiters. It does so in Readfile_cvt:

ch_ptr = strtok((char *)line.c_str(), " , \t");
pcarbo commented 7 years ago

@666wisteria This command also worked for me (GEMMA v0.96, MacOSX):

gemma -g chr18.F34_F3943.AIL.dosages -p F343943_MA_pheno.txt \
[result.log.txt.gz](https://github.com/genetics-statistics/GEMMA/files/1407771/result.log.txt.gz)

  -n 3 -c F34_F3943_MA_d3_gen.cov \
  -k autoExceptchr18.F34_F3943.AIL.dosages.cXX.txt -lmm 2

I'm attaching the log file from running this command.

Can you try this command? Please send the log file if it doesn't work.

666wisteria commented 7 years ago

This is the log file when I run the same command. Still NaNs.

##
## GEMMA Version = 0.95alpha
##
## Command Line Input = /projects/ps-palmer/software/local/src/gemma -g /oasis/tscc/scratch/xiz152/AIL_F34F3943/F34to43_genos/New_filtering/chr18.F34_F3943.AIL.dosages -p /oasis/tscc/scratch/xiz152/AIL_F34F3943/F34to43_genos/New_filtering/F343943_MA_pheno.txt -n 3 -k /oasis/tscc/scratch/xiz152/AIL_F34F3943/F34to43_genos/New_filtering/output/autoExceptchr18.F34_F3943.AIL.dosages.cXX.txt -c /oasis/tscc/scratch/xiz152/AIL_F34F3943/F34to43_genos/New_filtering/F34_F3943_MA_d3_gen.cov -lmm 2 -o chr18.MA.SW.3.F34_F3943.AIL 
##
## Date = Mon Oct 23 10:22:25 2017
##
## Summary Statistics:
## number of total individuals = 1047
## number of analyzed individuals = 954
## number of covariates = 4
## number of phenotypes = 1
## number of total SNPs = 2054
## number of analyzed SNPs = 2054
## REMLE log-likelihood in the null model = -nan
## MLE log-likelihood in the null model = -nan
## pve estimate in the null model = 0.999978
## se(pve) in the null model = -nan
## vg estimate in the null model = -nan
## ve estimate in the null model = -nan
## beta estimate in the null model =   -nan  -nan  -nan  -nan
## se(beta) =   -nan  -nan  -nan  -nan
##
## Computation Time:
## total computation time = 0.0848333 min 
## computation time break down: 
##      time on eigen-decomposition = 0.0165 min 
##      time on calculating UtX = 0.00616667 min 
##      time on optimization = 0.044 min 
##
pjotrp commented 7 years ago

You are running an old version of GEMMA. I think it is linked against GSL2 for which we fixed the NaN bug fairly recently. We are close to a new release which should work. If you are not comfortable building your own, we can try and provide a test release in the coming days. Is that an idea?

666wisteria commented 7 years ago

That sounds perfect! How soon do you think I'll be able to get a test release?

On Mon, Oct 23, 2017 at 10:42 AM, Pjotr Prins notifications@github.com wrote:

You are running an old version of GEMMA. I think it is linked against GSL2 for which we fixed the NaN bug fairly recently. We are close to a new release which should work. If you are not comfortable building your own, we can try and provide a test release in the coming days. Is that an idea?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/genetics-statistics/GEMMA/issues/101#issuecomment-338739652, or mute the thread https://github.com/notifications/unsubscribe-auth/AfWx9GUfja3EB3z0Bhg1GuJMvmG6l0iwks5svNAPgaJpZM4P8gq- .

pcarbo commented 7 years ago

@666wisteria If you don't want to wait, I think GEMMA v0.96 should fix your issue (but for longer term usage I would recommend upgrading). At the very least, you can check to see if it fixes your bug.

pjotrp commented 7 years ago

The binary install for 0.96 may be fine if it is linked against GSL1. I'll make a 0.97-pre test release available this week. Be good to test that anyway.

pcarbo commented 7 years ago

I'll make a 0.97-pre test release available this week.

@pjotrp Good idea.

666wisteria commented 7 years ago

So I'm trying to build GEMMA v0.96 locally because the pre-built binary is not working on our HPC I get the following error after loading the gnu module: [aschitre@tscc-login1 GEMMA-0.96]$ module load gnu [aschitre@tscc-login1 GEMMA-0.96]$ make g++ -Wall -Weffc++ -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/main.cpp -o src/main.o In file included from src/main.cpp:25:0: src/param.h:25:28: fatal error: gsl/gsl_vector.h: No such file or directory

include "gsl/gsl_vector.h"

                        ^

compilation terminated. make: *** [src/main.o] Error 1

Where exactly should I change the library paths for LAPACK and GSL in the makefile?

pjotrp commented 7 years ago

See the INSTALL.md for that.

pjotrp commented 7 years ago

I have also created a test release which is a GNU Guix relocatable binary - it mirrors my setup. Get it from http://test-gn2.genenetwork.org/ipfs/QmXBLkReUhvZiMU2ehFeBp7jjCn7d8BJiZAaoL8kukQurY. Unpack it, install and run

mkdir tmp
cd tmp
tar xvjf ../z8wr2iqwcp9fnbqm3rzvjv00qxnk55d2-gemma-git-gn2-0.97.2-gn2-715b1c3-x86_64.tar.bz2
./install.sh ~/gemma

and run it with

~/gemma/gemma-git-gn2-0.97.2-gn2-715b1c3-z8wr2iqwcp9fnbqm3rzvjv0/bin/gemma

It should work. The method I used to create this package is described at https://gitlab.com/pjotrp/guix-relocatable-binary-packages/blob/master/README.org.

pcarbo commented 7 years ago

@pjotrp Can you fix the link at the bottom of your last reply? It doesn't work for me. Thanks.

pcarbo commented 7 years ago

@666wisteria Please try Pjotr's approach. I'm interested to see if it works for you.

To answer your specific question, I recommend modifying the LIBS and GCC_FLAGS Makefile variables to point to your GSL installation. See Makefile.macosx for an example of this.

pjotrp commented 7 years ago

You mean the gitlab link - ah yes, it is still private.

666wisteria commented 7 years ago

Pjotr: I'm trying to install v0.96. Which line in the Makefile should I modify LIBS and GCC_FLAGS? There are multiple lines that have these variables. Could you let me know the line number, if possible?

pcarbo commented 7 years ago

@666wisteria The lines that start with CPPFLAGS = and LIBS =.

pjotrp commented 7 years ago

You can pass these flags on the command line, as stated in the INSTALL doc. Make sure to link 0.96 against GSLv1. GSL2 support came after.

Note that 0.96 is out of date. If you want debugging help you'll also need to try the latest version which has a lot of fixes and better debug support.

pjotrp commented 7 years ago

See https://github.com/genetics-statistics/GEMMA/issues/90

666wisteria commented 7 years ago

Is v0.97 out now? We were trying to install v0.96 because we thought that is the latest version currently available.

666wisteria commented 7 years ago

I have successfully installed v0.97, with a slight modification. Instead of using the command line to run the installer like this: ./install.sh ~/opt/gemma, which didn't work for me, I dropped the opt folder and just ran it like this: ./install.sh ~/gemma. I think after unpacking the tarball I did not get an opt folder, and that was what kept me from successfully installing GEMMA using the command you provided.

pjotrp commented 7 years ago

Good. It is a pre-release of 0.97. See if it works for you. I'll take note about the opt dir. Thanks!

666wisteria commented 7 years ago

When would you be able to release the formal version of 0.97? I was able to install this new version on my end, although our lab bioinfomatician couldn't install v0.97 in our lab node in a cluster, which was weird.

pjotrp commented 7 years ago

Can you provide an error message and other output? It should work on almost all Linux systems.

Release comes after testing by other developers. Give it another week.

pjotrp commented 7 years ago

@666wisteria are you still getting NaNs? Can you send me the output of the cluster install, ask them to use -v --debug and --force flags.

666wisteria commented 7 years ago

Sorry I just saw this today! Our lab bioinformatician is installing GEMMA on cluster. She got this error:

./gemma: /opt/gnu/gcc/lib64/libstdc++.so.6: versionGLIBCXX_3.4.21' not found (required by ./gemma)`

She will clean out the v0.96 install and try install v0.97 from afresh and see if she still got the same error.

I have successfully installed GEMMA on my home directory, and this v0.97 fixed the NaN error. Everything works now! However, I am getting some noisy messages starting with "**** DEBUG". These noisy message would probably be cleared up for the formal v0.97 release, right?

pjotrp commented 7 years ago

Thanks for the report. That error should not happen. Can you send me the result of cat /proc/version on a cluster node? I'll try to replicate it in a VM.

I'll try and fix it next week. If you need to run faster, best have your system support build GEMMA from source to get the latest.

666wisteria commented 7 years ago

This is what I got:

[xiz152@tscc-login1 New_filtering]$ cat /proc/version Linux version 2.6.32-696.10.3.el6.x86_64 (mockbuild@c1bl.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Sep 26 18:14:22 UTC 2017

pjotrp commented 7 years ago
./gemma: /opt/gnu/gcc/lib64/libstdc++.so.6: versionGLIBCXX_3.4.21' not found (required by ./gemma)`

the cluster is probably overriding the shared library path, such as LD_LIBRARY_PATH in some Unix module. Please try again after logging in without a profile. On the host:

    env -i /bin/bash --login --noprofile --norc
    ./gemma
pcarbo commented 7 years ago

@666wisteria The installer worked for me on Linux (CentOS 7). Can you try it again? If it isn't working, please run

echo $LIBRARY_PATH
echo $LD_LIBRARY_PATH

and send us the output. Also, after installing the software, are you running the software on the exact same machine? This glibc error often occurs when a software is built on one operating system and run on a different one.

666wisteria commented 7 years ago

Are we talking about the install error or the ****DEBUG error messages? I don't have the install problems, but the lab bioinformatician had problem installing it on the cluster for the whole lab. I'll ask her about that and reply back.

pcarbo commented 7 years ago

@666wisteria Yes, I'm referring to the outstanding problem that your lab bioinformatician is having. Perhaps she can reply to this thread directly.

pjotrp commented 7 years ago

As this is no longer development, please continue on the mailing list https://groups.google.com/forum/#!forum/gemma-discussion

pcarbo commented 7 years ago

@666wisteria Yes, please have your colleague post to the mailing list---we are migrating all Q&A and discussion (except for bug reports and development-related discussion) to the GEMMA mailing list. Please see:

https://groups.google.com/group/gemma-discussion

(Note it is easy to unsubcribe at any point.)