genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0
330 stars 124 forks source link

LMM output missing something? #237

Closed pjotrp closed 3 years ago

pjotrp commented 3 years ago

Describe the bug

Running -lmm [1-3] gives the following output:

time ./bin/gemma-0.98.3-linux-static -lmm 1 -bfile example/mouse_hs1940 -k output/mouse_hs1940_data.cXX.txt -o lmm1

head -3 lmm*assoc*
==> lmm1.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      beta    se      logl_H1 l_remle p_wald
1       rs3683945       3197400 0       A       G       0.443   -7.965330e-02   6.185099e-02    -1.582126e+03   4.318993e+00   1.980182e-01
1       rs3707673       3407393 0       G       A       0.443   -6.654282e-02   6.210234e-02    -1.582377e+03   4.316144e+00   2.841271e-01

==> lmm2.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       rs3683945       3197400 0       A       G       0.443   -1.583892e+03   4.313353e+00    1.977836e-01
1       rs3707673       3407393 0       G       A       0.443   -1.584148e+03   4.310280e+00    2.840289e-01

==> lmm3.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      beta    se      logl_H1 p_score
1       rs3683945       3197400 0       A       G       0.443   -7.965744e-02   6.191532e-02    0.000000e+00    1.984062e-01
1       rs3707673       3407393 0       G       A       0.443   -6.650240e-02   6.217649e-02    0.000000e+00    2.848479e-01

At least one expects lmm3 to show all 3 calculations! Probably a regression on my end.

xiangzhou commented 3 years ago

I thought lmm 1 is the wald test, 2 is the likelihood ratio test, 3 is the score test, and perhaps 4 is all three tests?

pjotrp commented 3 years ago

Yes, you are right

 -lmm      [num]          specify analysis options (default 1).
          options: 1: Wald test
                   2: Likelihood ratio test
                   3: Score test
                   4: 1-3
                   5: Parameter estimation in the null model only

That column of zeroes should not be there :)

pjotrp commented 3 years ago

If I run the 4 options with

for x in 1 2 3 4 ; do ./bin/gemma-0.98.3-linux-static -lmm $x -bfile example/mouse_hs1940 -k output/mouse_hs1940_data.cXX.txt -o lmm$x

the result is

head -3 output/lmm?.assoc.txt 
==> output/lmm1.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      beta    se      logl_H1 l_remle p_wald
1       rs3683945       3197400 0       A       G       0.443   -7.965330e-02   6.185099e-02    -1.582126e+03      4.318993e+00    1.980182e-01
1       rs3707673       3407393 0       G       A       0.443   -6.654282e-02   6.210234e-02    -1.582377e+03      4.316144e+00    2.841271e-01

==> output/lmm2.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       rs3683945       3197400 0       A       G       0.443   -1.583892e+03   4.313353e+00    1.977836e-01
1       rs3707673       3407393 0       G       A       0.443   -1.584148e+03   4.310280e+00    2.840289e-01

==> output/lmm3.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      beta    se      logl_H1 p_score
1       rs3683945       3197400 0       A       G       0.443   -7.965744e-02   6.191532e-02    0.000000e+00       1.984062e-01
1       rs3707673       3407393 0       G       A       0.443   -6.650240e-02   6.217649e-02    0.000000e+00       2.848479e-01

==> output/lmm4.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      beta    se      logl_H1 l_remle l_mle   p_wald  p_lrt      p_score
1       rs3683945       3197400 0       A       G       0.443   -7.965330e-02   6.185099e-02    -1.583892e+03      4.318993e+00    4.313353e+00    1.980182e-01    1.977836e-01    1.984062e-01
1       rs3707673       3407393 0       G       A       0.443   -6.654282e-02   6.210234e-02    -1.584148e+03      4.316144e+00    4.310280e+00    2.841271e-01    2.840289e-01    2.848479e-01

you can see logl_H1 differs for lmm 2 and is not computed with lmm 3. It was code introduced in gemma 0.95alpha. See https://github.com/genetics-statistics/GEMMA/blame/84360c191f418bf8682b35e0c8235fcc3bd19a06/src/lmm.cpp. I added the logl_H1 output about 3 years ago with #92

pjotrp commented 3 years ago

I also note the time to compute varies quite a bit for the lmm 1-4 versions. lmm 4 being the worst for doing it all:

for x in 1 2 3 4 ; do time ./bin/gemma-0.98.3-linux-static -lmm $x -bfile example/mouse_hs1940 -k output/mouse_hs1940_data.cXX.txt -o lmm$x ; done
real    0m9.939s
user    0m14.895s
sys     0m2.632s

real    0m9.939s
user    0m15.030s
sys     0m2.255s

real    0m2.837s
user    0m8.103s
sys     0m3.003s

real    0m17.482s
user    0m22.239s
sys     0m2.422s
pjotrp commented 3 years ago

Anyway, from the code -lmm 3 does not compute logl_H1. I'll remove that from the output.