AbbVie-ComputationalGenomics / SAIGEgds

Scalable Implementation of generalized mixed models using GDS files in Phenome-Wide Association Studies
7 stars 5 forks source link

Error: glm.fit: fitted probabilities numerically 0 or 1 occurred #6

Closed complexgenome closed 3 years ago

complexgenome commented 3 years ago

I am trying to fit NULL model using GRM generated seq-array from genotyped data. I have binary phenotype (0/1) for ~6K individuals. When I try to fir null model I get error for probabilities numerically. Below is the code:

library(SeqArray)
library(SAIGEgds)

pheno <-read.table("phenotype_04.28.2021.txt", header=TRUE)

glmm <- seqFitNullGLMM_SPA(AD ~ SEX+ AGE + PC1 +PC2+ PC3 , pheno, "seqarray_grm_geno.gds" , trait.type="binary",sample.col="TOPMED_IID", num.thread=10)

Error:

    Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Invalid tbb::this_task_arena::current_thread_index()!
Aborted (core dumped)

Is this related to input data type or low memory?

zhengxw-ab commented 3 years ago

The error "Invalid tbb::this_task_arena::current_thread_index()" is related to the latest update of RcppParallel. You should install RcppParallel_5.1.2 and SAIGEgds_1.5.3 http://bioconductor.org/packages/3.13/bioc/src/contrib/SAIGEgds_1.5.3.tar.gz

If you still see "glm.fit: fitted probabilities numerically 0 or 1 occurred", please check the distribution of sex vs AD, or age vs AD if all AD patients are all elder than controls.

complexgenome commented 3 years ago

I have following libraries:

 SAIGEgds_1.4.0  Rcpp_1.0.6      SeqArray_1.30.0 gdsfmt_1.26.1

R version 4.0.0 (2020-04-24)

Platform: x86_64-conda_cos6-linux-gnu (64-bit)

Running under: Debian GNU/Linux 9 (stretch)

I used BiocManager::install("SAIGEgds") to get saigeGDS. I am using R in CONDA environment. How do I get SAIGEgds_1.5.3 ?

Also, I tried to update rcpp, it sticks to v1.0.6

Thank you

complexgenome commented 3 years ago

Following up on your reply:

The AD patients are elder (mean age = 77) than the controls (mean age = 71).

May I know how the GRM is being used? I ask these because GWAS will be conducted using imputed data from TOPMED that has genome assembly of GRCH38. The genotype GRM is on grch37.

Thank you for your help.

zhengxw-ab commented 3 years ago

See http://www.bioconductor.org/packages/devel/bioc/html/SAIGEgds.html:

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("SAIGEgds")
zhengxw-ab commented 3 years ago

The position in seqarray_grm_geno.gds is not used in fitting the model, so it is OK to use grch37 GRM gds.

complexgenome commented 3 years ago

See http://www.bioconductor.org/packages/devel/bioc/html/SAIGEgds.html:

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("SAIGEgds")

Sorry, this is getting difficult

 BiocManager::install(version='devel')
Error: Bioconductor version '3.13' requires R version '4.1'; use
  `BiocManager::install(version = '3.12')` with R version 4.0; see
  https://bioconductor.org/install

Either I update the R (to v4.1 using conda) with entire set of dependencies or there is another way? Sorry for long exchanges and troubleshooting this.

complexgenome commented 3 years ago

The position in seqarray_grm_geno.gds is not used in fitting the model, so it is OK to use grch37 GRM gds.

The SNP names will be different as well. Genotype data has SNPs in rs format, whereas, imputed from michigan as CHR22:100:A:T

Thank you very much.

zhengxw-ab commented 3 years ago

The position in seqarray_grm_geno.gds is not used in fitting the model, so it is OK to use grch37 GRM gds.

The SNP names will be different as well. Genotype data has SNPs in rs format, whereas, imputed from michigan as CHR22:100:A:T

Thank you very much.

It is OK with different naming.

zhengxw-ab commented 3 years ago

See http://www.bioconductor.org/packages/devel/bioc/html/SAIGEgds.html:

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("SAIGEgds")

Sorry, this is getting difficult

 BiocManager::install(version='devel')
Error: Bioconductor version '3.13' requires R version '4.1'; use
  `BiocManager::install(version = '3.12')` with R version 4.0; see
  https://bioconductor.org/install

Either I update the R (to v4.1 using conda) with entire set of dependencies or there is another way? Sorry for long exchanges and troubleshooting this.

Download http://www.bioconductor.org/packages/devel/bioc/src/contrib/SAIGEgds_1.5.3.tar.gz SAIGEgds_1.5.3 can run on R_4.0.

wget http://www.bioconductor.org/packages/devel/bioc/src/contrib/SAIGEgds_1.5.3.tar.gz
R CMD INSTALL SAIGEgds_1.5.3.tar.gz
complexgenome commented 3 years ago

I reinstalled other package due to version/dependency


wget http://www.bioconductor.org/packages/devel/bioc/src/contrib/SAIGEgds_1.5.3.tar.gz

wget "http://www.bioconductor.org/packages/devel/bioc/src/contrib/SeqArray_1.31.11.tar.gz"

R CMD INSTALL SeqArray_1.31.11.tar.gz  -l ./
R CMD INSTALL SAIGEgds_1.5.3.tar.gz -l ./
privefl commented 2 years ago

I get the same error Invalid tbb::this_task_arena::current_thread_index()!. I have RcppParallel v5.1.4. http://www.bioconductor.org/packages/devel/bioc/src/contrib/SAIGEgds_1.5.3.tar.gz is not found for me. For some reason, I cannot install more than SAIGEgds v1.0.2.

zhengxw-ab commented 2 years ago

See SAIGEgds_1.6.0: http://www.bioconductor.org/packages/release/bioc/html/SAIGEgds.html

privefl commented 2 years ago

Yes, but running BiocManager::install("SAIGEgds") won't get me more than v1.0.2. If I try to use install.packages("http://www.bioconductor.org/packages/release/bioc/src/contrib/SAIGEgds_1.6.0.tar.gz", repos = NULL), it tells me I need a newer version of SeqArray. If I do that for SeqArray, I get an error about GDS_Node_Load not found when compiling.

zhengxw-ab commented 2 years ago

Update your R to v4.1, or update the dependences one by one. You need to install the new gdsfmt.

privefl commented 2 years ago

I can't update to R v4.1 on the cluster, unfortunately.