genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0
313 stars 122 forks source link

Calcab() is never invoked #94

Open pjotrp opened 6 years ago

pjotrp commented 6 years ago

@prasunanand has discovered that param0->ab is never properly calculated in CalcLambda() which is probably erroneous. Calcab() is never called in the code.

pjotrp commented 6 years ago

Not really a bug because it has no impact on the result.

pjotrp commented 6 years ago

Related:

For pab, it stores all variables in the form of v_a^TP_pv_b. Similarly, ppab stores all v_a^TP_pP_pv_b, while pppab stores all v_a^TP_pP_pP_pv_b. These quantities are defined according to the page 6 of this supplementary information ([1]http://xzlab.org/papers/2012_Zhou&Stephens_NG_SI.pdf). In the code, p, a, b are indexes: when p=n_cvt+1, P_p is P_x as in that supplementary information; when a=n_cvt+1, v_a=x; and when a=n_cvt+2, v_a=y. e_mode determines which model the algorithm is fitting: when e_mode==1, it computes all the above quantities for the alternative model (with the random effects term); when e_mode==0, it compute these quantities for the null model (without the random effects term). These quantities were computed based on the initial GEMMA paper, and the goal is to finally compute y^TP_xy, y^TP_xP_xy, y^TP_xP_xP_xy and the few trace forms (section 3.1.4 on page 5 of the supplementary information). Sometimes I was wondering if we shoud compute all these final quantities directly, instead of through these complicated recursions. Direct computation may only make computation a little slower, but will make the code much easier to follow and easier to modify

pjotrp commented 5 years ago

We appear to have a problem with Calcpab perhaps explaining some of the errors we are getting. After adding NaN checking and matrix range checking we are going out of bounds errors. I expected GSL to show them, but apparently it is not guaranteed.

GEMMA 0.98-pre2 (2018-07-14) by Xiang Zhou and team (C) 2012-2018
Reading Files ... 
**** DEBUG: entered in src/gemma_io.cpp at line 122 in ReadFile_snps
**** DEBUG: ReadFile_anno in src/gemma_io.cpp at line 251 in ReadFile_anno
**** DEBUG: entered in src/gemma_io.cpp at line 355 in ReadFile_pheno
**** TEST MODE: trim individuals from 1940 to 400
**** TEST MODE: trim individuals from 1940 to 400
**** DEBUG: entered in src/gemma_io.cpp at line 612 in ReadFile_geno
## number of total individuals = 400
## number of analyzed individuals = 247
## number of covariates = 1
## number of phenotypes = 1
## leave one chromosome out (LOCO) =        1
## number of total SNPs/var        =    12226
## number of considered SNPS       =     1000
## number of SNPS for K            =    11182
## number of SNPS for GWAS         =     1044
## number of analyzed SNPs         =      889
**** DEBUG: entered in src/gemma_io.cpp at line 1133 in ReadFile_kin
Start Eigen-Decomposition...
**** FAILED: Matrix out of bounds (6,3) 0,3 in src/lmm.cpp at line 245

This is in line https://github.com/genetics-statistics/GEMMA/blob/356f528e52b5cdd13812f64308cbedd87cc0c74f/src/lmm.cpp#L245

To me it looks like it is a good idea to compute these values using a different approach - at least to validate results.