Cloufield / GWASTutorial

GWAS Tutorial for Beginners
https://cloufield.github.io/GWASTutorial/

GCTA-GREML: the X^t * V^-1 * X matrix is not invertible. #7

Open Truongphikt opened 1 month ago

Truongphikt commented 1 month ago

Hi Cloufield team, as a GWAS newbie it was a struggle to finally reach the SNP-heritability estimation by GCTA-GREML section. I have now been stuck for a few days on an error in the estimation step. I suspect something is wrong with my inputs (see below).

My command is as follows:

awk '{print $1,$2,$5,$6,$7,$8,$9}' projected.sscore > 5PCs.txt

gcta --grm 1kg_eas \
     --pheno 1kgeas_binary.phen \
     --prevalence 0.5 \
     --qcovar projected.sscore \
     --reml \
     --out 1kg_eas \
     --thread-num 1

Needed inputs: inputs.zip

And I got an error:

```console
********************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version v1.94.1 Linux
* Built at Nov 15 2022 21:14:25, by GCC 8.5
* (C) 2010-present, Yang Lab, Westlake University
* Please report bugs to Jian Yang
********************************************************************
Analysis started at 03:39:30 UTC on Sat Oct 19 2024.
Hostname: f9ed864a67df

Accepted options:
--grm 1kg_eas
--pheno 1kgeas_binary.phen
--prevalence 0.5
--qcovar projected.sscore
--reml
--out 1kg_eas
--thread-num 1

Note: This is a multi-thread program. If your machine has multiple processors, you can specify the number of threads with the --thread-num option to speed up the computation.

Reading IDs of the GRM from [1kg_eas.grm.id].
504 IDs are read from [1kg_eas.grm.id].
Reading the GRM from [1kg_eas.grm.bin].
GRM for 504 individuals are included from [1kg_eas.grm.bin].
Reading phenotypes from [1kgeas_binary.phen].
Non-missing phenotypes of 502 individuals are included from [1kgeas_binary.phen].
Reading quantitative covariate(s) from [projected.sscore].
12 quantitative covariate(s) of 501 individuals are included from [projected.sscore].
Assuming a disease phenotype for a case-control study: 248 cases and 250 controls
12 quantitative variable(s) included as covariate(s).
498 individuals are in common in these files.

Performing REML analysis ... (Note: may take hours depending on sample size).
498 observations, 13 fixed effect(s), and 2 variance component(s)(including residual variance).
Calculating prior values of variance components by EM-REML ...
Updated prior values: 5.24546e+19 3.07774e+21
logL: 1.6403e+14
Running AI-REML algorithm ...
Iter. logL V(G) V(e)
Error: the X^t * V^-1 * X matrix is not invertible. Please check the covariate(s) and/or the environmental factor(s).
An error occurs, please check the options or data
```

Could someone tell me where I went wrong? Thanks.

Cloufield commented 3 weeks ago

Hi, sorry for the late reply. I updated this section and replaced the outdated paths, so it should work now. For the error, please see #15 in the GCTA FAQ. It usually indicates that the inverse of the variance-covariance matrix does not exist or that there is something wrong with the GRM.
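
Since the error message itself asks to check the covariates, one thing worth ruling out is a covariate column that never varies across individuals: such a column is collinear with the mean term that GCTA fits as a fixed effect, which can make X^t * V^-1 * X singular. The snippet below is only a rough sketch of that check on a made-up toy file (the header names and values are hypothetical, not taken from the files in this thread), and a constant column is just one possible cause.

```shell
# Hypothetical toy covariate file: FID, IID, then three covariates.
# COV1 is constant across individuals, so it is collinear with the intercept.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#FID IID COV1 COV2 COV3
f1 i1 200 0.01 -0.02
f2 i2 200 0.03 0.04
f3 i3 200 -0.05 0.01
EOF

# List every covariate column (3 onward) whose value is identical on all rows.
constant_cols=$(awk '
  /^#/ { next }                                   # skip the header line
  seen++ == 0 { for (i = 3; i <= NF; i++) first[i] = $i; n = NF; next }
  { for (i = 3; i <= n; i++) if ($i != first[i]) varies[i] = 1 }
  END { for (i = 3; i <= n; i++) if (!varies[i]) printf "%s%d", (sep++ ? " " : ""), i }
' "$tmp")

echo "constant covariate columns: ${constant_cols:-none}"
rm -f "$tmp"
```

To try it on a real covariate file, replace "$tmp" with the file's path and drop the heredoc and the `rm` line.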

Truongphi20 commented 2 weeks ago

@Cloufield Thank you so much. Following your updates, I have successfully run this step. My updated command:

awk '{print $1,$2,$5,$6,$7,$8,$9}' projected.sscore > 5PCs.txt
gcta --grm 1kg_eas \
     --pheno 1kgeas_binary.phen \
     --prevalence 0.5 \
     --qcovar 5PCs.txt \
     --reml \
     --out 1kg_eas

My input files: inputs.zip
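
For anyone repeating the awk step, a quick sanity check is to confirm that the selected file really contains FID, IID and five covariate columns before passing it to --qcovar. The sketch below applies the same column selection to a hypothetical two-row toy file (header names and values are made up) and counts the fields per line.

```shell
# Hypothetical toy score file with nine columns; only columns 1, 2 and 5-9 are wanted.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#FID IID C3 C4 C5 C6 C7 C8 C9
f1 i1 200 10 0.01 0.02 0.03 0.04 0.05
f2 i2 200 12 0.02 0.01 0.05 0.03 0.04
EOF

# Same column selection as in the thread.
awk '{print $1,$2,$5,$6,$7,$8,$9}' "$tmp" > "$tmp.sel"

# Distinct field counts in the result; a single value of 7 is expected.
field_counts=$(awk '{print NF}' "$tmp.sel" | sort -u)
echo "fields per line: $field_counts"
rm -f "$tmp" "$tmp.sel"
```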