Open JaehyunParkBiostat opened 1 month ago
Update: I got a response from Dr. Joelle Mbatchou, the author of REGENIE, regarding this issue. She told me that the predictors with and without chrX are expected to differ, since a single joint model is first fitted across all chromosomes (each chromosome is then zeroed out in turn to calculate the LOCO predictions), but the predictions should be highly correlated.
Since the purpose of the first step is to capture sample relatedness and population structure, she also said that this step can be done with autosomes only (although I personally prefer including all available chromosomes).
Hello, I would like to share an experience and a mistake I made while running the first step of REGENIE.
I ran REGENIE with the following code:
After some time, this code stopped running with the following error:
(If you did not face the error, it is totally fine to go to the next step)
The error message indicated either that the variant was perfectly correlated with the covariates (PC1~10 & age), which was very unlikely, or that the minor allele count (MAC) of the variant was zero or near zero. This variant, on chromosome X, had a MAF between 3% and 5% in the array data. After removing the variant (with the `--exclude` option), I faced the same error with another variant on chrX.
The problem was this: even though the variants had non-rare frequencies 'in the array data of all participants,' it was still possible that they did not occur at all in a specific group, males with African ancestry in this case. REGENIE does not detect such variants beforehand, so we need to make a list of them and exclude them.
The solution was to use plink to make a list of variants with non-zero MACs in the group and provide that list to REGENIE. (This is also explained on the FAQ page of REGENIE: see https://rgcgithub.github.io/regenie/faq/) Below is the code:
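As a rough sketch of this workflow (not necessarily the exact commands I ran), the Python snippet below assembles a plink2 command that writes the IDs of variants with MAC of at least 1 in the subgroup, and a REGENIE Step 1 command that reads that list through `--extract`. All file names and prefixes (`array_all_chr`, `subgroup.ids`, `pheno.txt`, `covar.txt`, and the output prefixes) are hypothetical placeholders, the block size of 1000 is only an example value, and any trait-type or other options your analysis needs would have to be added.

```python
import shlex


def plink_snplist_cmd(bfile, keep_file, out_prefix, min_mac=1):
    """Build a plink2 command that keeps only the samples in keep_file and
    writes the IDs of variants with MAC >= min_mac to <out_prefix>.snplist."""
    return [
        "plink2",
        "--bfile", bfile,
        "--keep", keep_file,
        "--mac", str(min_mac),
        "--write-snplist",
        "--out", out_prefix,
    ]


def regenie_step1_cmd(bed_prefix, keep_file, snplist, pheno, covar, out_prefix):
    """Build a REGENIE Step 1 command restricted to the variants in snplist."""
    return [
        "regenie",
        "--step", "1",
        "--bed", bed_prefix,
        "--keep", keep_file,
        "--extract", snplist,
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--bsize", "1000",  # example block size; adjust as needed
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
plink_cmd = plink_snplist_cmd("array_all_chr", "subgroup.ids", "subgroup_mac1")
regenie_cmd = regenie_step1_cmd(
    "array_all_chr", "subgroup.ids", "subgroup_mac1.snplist",
    "pheno.txt", "covar.txt", "step1_subgroup",
)

# Print the commands so they can be inspected before being run in a shell.
print(shlex.join(plink_cmd))
print(shlex.join(regenie_cmd))
```

The commands are only printed here, so they can be checked before being pasted into a shell or passed to `subprocess.run`.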
We can set the `--mac` option in plink to a value other than 1, but since the first step of REGENIE includes adjusting for sample relatedness, I would recommend including all the variants with non-zero counts in this step. Also, it is not recommended to exclude the whole chromosome causing the problem (chrX in this case); although the predictors from Step 1 are calculated chromosome by chromosome, the calculation involves leave-one-chromosome-out (LOCO) and cross-validation procedures, and the result can differ depending on which chromosomes are included in the step. For accurate results, I would recommend using all variants with non-zero counts.
I hope this is helpful to other people running this analysis. Thank you.