RGLab / MAST

Tools and methods for analysis of single cell assay data in R

Error in validObject(.Object): invalid class “corMatrix” object: 'sd' slot has nonfinite elements #180

Open derrik-gratz opened 11 months ago

derrik-gratz commented 11 months ago

I am trying to run MAST on a full single-cell dataset that contains multiple identified CD8 cell subtypes, one condition with two levels, and two timepoints. I am fitting everything in one model because all the CD8s were processed together. Perhaps it would be better to split by cell type, although I ran a similar model successfully on the combined CD4 subtypes. My model formula is in the chunk below:

library(MAST)

zlmCond <- zlm(~ cdr +                                ## cngeneson
                 cluster_labels +                     ## cell type (naive, effector memory, etc.)
                 condition +                          ## factor with 2 levels
                 timepoint +                          ## factor with 2 levels
                 condition:timepoint +
                 cluster_labels:condition +
                 cluster_labels:timepoint +
                 cluster_labels:condition:timepoint +
                 (1 | hash.ID),
               obj.sce,
               method = 'glmer',
               ebayes = FALSE,
               exprs_values = 'logcounts',
               fitArgsD = list(nAGQ = 0),
               parallel = TRUE,
               silent = FALSE)
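The ##cngeneson comment suggests that cdr is the cellular detection rate. The MAST vignette builds its cngeneson covariate by centering and scaling the number of genes detected in each cell. The sketch below assumes cdr was computed the same way. It uses a made-up toy matrix (expr) in place of the real logcounts assay and treats "detected" as expression > 0, which is also an assumption.

```r
# Toy expression matrix: 4 genes (rows) x 5 cells (columns).
# With a SingleCellExperiment you would use assay(obj.sce, "logcounts") instead.
expr <- matrix(c(0, 1.2, 0,   2.1, 0,
                 0, 0,   0,   0.5, 0,
                 3, 2.2, 1.1, 0,   0.7,
                 0, 0,   0,   0,   0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

# Number of genes detected (expression > 0) in each cell.
genes_detected <- colSums(expr > 0)

# Center and scale to get a CDR-style covariate (mean 0, sd 1 across cells).
cdr <- as.numeric(scale(genes_detected))
names(cdr) <- colnames(expr)

print(genes_detected)
print(round(cdr, 3))
```

Because the covariate is scaled, its coefficient is interpreted per standard deviation of detection rate rather than per detected gene.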

The model starts running but fails ~8% of the way through with this error:

Error in validObject(.Object) : 
  invalid class “corMatrix” object: 'sd' slot has nonfinite elements

Again, I was able to run this model formula successfully on a CD4 subset of the same dataset. Any idea what might be causing this?

amcdavid commented 11 months ago

What does traceback() say? This looks like it might be originating from glmer.


derrik-gratz commented 11 months ago
[image]
SessionInfo

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] EnhancedVolcano_1.16.0 ggrepel_0.9.1 msigdbr_7.5.1 MAST_1.24.1 SingleCellExperiment_1.20.1
 [6] SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 IRanges_2.32.0
[11] S4Vectors_0.36.2 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_0.62.0 DT_0.26
[16] gencoreSC_0.1.0 here_1.0.1 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
[21] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.3.6
[26] tidyverse_1.3.2 sp_1.5-0 SeuratObject_4.1.2 Seurat_4.2.0

loaded via a namespace (and not attached):
  [1] scattermore_0.8 ModelMetrics_1.2.2.2 bit64_4.0.5 knitr_1.40 irlba_2.3.5.1
  [6] DelayedArray_0.24.0 data.table_1.14.4 rpart_4.1.16 doParallel_1.0.17 KEGGREST_1.38.0
 [11] hardhat_1.2.0 RCurl_1.98-1.9 generics_0.1.3 ScaledMatrix_1.6.0 cowplot_1.1.1
 [16] RSQLite_2.2.18 RANN_2.6.1 future_1.28.0 bit_4.0.4 tzdb_0.3.0
 [21] spatstat.data_3.0-0 xml2_1.3.3 lubridate_1.8.0 httpuv_1.6.6 assertthat_0.2.1
 [26] viridis_0.6.2 gargle_1.2.1 gower_1.0.0 xfun_0.34 hms_1.1.2
 [31] babelgene_22.9 evaluate_0.17 promises_1.2.0.1 progress_1.2.2 fansi_1.0.3
 [36] dbplyr_2.2.1 readxl_1.4.1 geneplotter_1.76.0 igraph_1.3.5 DBI_1.1.3
 [41] htmlwidgets_1.5.4 spatstat.geom_3.0-3 googledrive_2.0.0 ellipsis_0.3.2 backports_1.4.1
 [46] annotate_1.76.0 deldir_1.0-6 sparseMatrixStats_1.10.0 vctrs_0.5.0 ROCR_1.0-11
 [51] abind_1.4-5 cachem_1.0.6 caret_6.0-93 withr_2.5.0 ggforce_0.4.1
 [56] progressr_0.11.0 sctransform_0.3.5 prettyunits_1.1.1 goftest_1.2-3 cluster_2.1.3
 [61] lazyeval_0.2.2 crayon_1.5.2 labeling_0.4.2 recipes_1.0.2 pkgconfig_2.0.3
 [66] tweenr_2.0.2 nlme_3.1-157 vipor_0.4.5 nnet_7.3-17 rlang_1.0.6
 [71] globals_0.16.1 lifecycle_1.0.3 miniUI_0.1.1.1 clustree_0.5.0 gencoreBulk_0.1
 [76] modelr_0.1.9 rsvd_1.0.5 cellranger_1.1.0 rprojroot_2.0.3 polyclip_1.10-4
 [81] lmtest_0.9-40 Matrix_1.5-1 boot_1.3-28 zoo_1.8-11 reprex_2.0.2
 [86] beeswarm_0.4.0 GlobalOptions_0.1.2 SingleR_2.0.0 ggridges_0.5.4 googlesheets4_1.0.1
 [91] rjson_0.2.21 png_0.1-7 viridisLite_0.4.1 bitops_1.0-7 pROC_1.18.0
 [96] KernSmooth_2.23-20 Biostrings_2.66.0 blob_1.2.3 DelayedMatrixStats_1.20.0 shape_1.4.6
[101] parallelly_1.32.1 spatstat.random_2.2-0 beachmat_2.14.2 scales_1.2.1 memoise_2.0.1
[106] magrittr_2.0.3 plyr_1.8.7 ica_1.0-3 zlibbioc_1.44.0 compiler_4.2.1
[111] RColorBrewer_1.1-3 clue_0.3-62 lme4_1.1-30 DESeq2_1.38.3 fitdistrplus_1.1-8
[116] cli_3.4.1 XVector_0.38.0 listenv_0.8.0 patchwork_1.1.2 pbapply_1.5-0
[121] MASS_7.3-57 mgcv_1.8-40 tidyselect_1.2.0 stringi_1.7.8 yaml_2.3.6
[126] locfit_1.5-9.6 BiocSingular_1.14.0 grid_4.2.1 fastmatch_1.1-3 tools_4.2.1
[131] future.apply_1.9.1 parallel_4.2.1 circlize_0.4.15 rstudioapi_0.14 foreach_1.5.2
[136] gridExtra_2.3 prodlim_2019.11.13 farver_2.1.1 Rtsne_0.16 ggraph_2.1.0
[141] digest_0.6.30 rgeos_0.5-9 lava_1.7.0 shiny_1.7.3 Rcpp_1.0.9
[146] broom_1.0.1 scuttle_1.8.4 later_1.3.0 RcppAnnoy_0.0.20 AnnotationDbi_1.60.2
[151] httr_1.4.4 ComplexHeatmap_2.14.0 SoupX_1.6.1 colorspace_2.0-3 XML_3.99-0.11
[156] rvest_1.0.3 fs_1.5.2 tensor_1.5 reticulate_1.26 splines_4.2.1
[161] uwot_0.1.14 spatstat.utils_3.0-1 scater_1.26.1 graphlayouts_0.8.3 renv_1.0.2
[166] plotly_4.10.0 xtable_1.8-4 jsonlite_1.8.3 nloptr_2.0.3 tidygraph_1.2.2
[171] timeDate_4021.106 zeallot_0.1.0 ipred_0.9-13 R6_2.5.1 pillar_1.8.1
[176] htmltools_0.5.3 mime_0.12 glue_1.6.2 fastmap_1.1.0 minqa_1.2.5
[181] BiocParallel_1.32.6 BiocNeighbors_1.16.0 class_7.3-20 codetools_0.2-18 fgsea_1.24.0
[186] utf8_1.2.2 lattice_0.20-45 spatstat.sparse_3.0-0 ggbeeswarm_0.6.0 leiden_0.4.3
[191] gtools_3.9.3 zip_2.2.2 openxlsx_4.2.5.1 DoubletFinder_2.0.3 limma_3.54.2
[196] survival_3.3-1 rmarkdown_2.17 munsell_0.5.0 GetoptLong_1.0.5 GenomeInfoDbData_1.2.9
[201] iterators_1.0.14 haven_2.5.1 reshape2_1.4.4 gtable_0.3.1 spatstat.core_2.4-4
```
derrik-gratz commented 11 months ago

I subset the data by cell type to allow a simpler design. The issue still occurs with the following model on one subset of 1,061 cells:

zlm(~ cdr +
                 condition +
                 timepoint +
                 condition:timepoint + 
                 (1|hash.ID),
              obj.sce_sub,
              method = 'glmer',
              ebayes = FALSE,
              exprs_values = 'logcounts',
              fitArgsD = list(nAGQ=0),
              parallel = TRUE,
              silent = TRUE)
amcdavid commented 11 months ago

Thanks. This actually looks like a bug in lmer/glmer: it managed to fit the model but was somehow left in an undefined state. Unless or until lme4 fixes this, I can add another check to try to trap it.

derrik-gratz commented 11 months ago

I tried subsetting to the problem cells and running without fitArgsD = list(nAGQ=0), and nothing changed. I'm not expecting a response to this; I'm just documenting other information that might be of interest.

derrik-gratz commented 11 months ago

I pinned down a possible cause: I wasn't properly filtering out genes below a minimum expression threshold. The problem gene in this case was expressed in only 3(!) cells. When I remove that gene, the model runs fine.
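The thread does not establish why a gene expressed in only 3 cells breaks the fit. One plausible factor, offered as an assumption rather than a diagnosis, is that some condition × timepoint groups then contain no expressing cells at all. The discrete (logistic) part of MAST's hurdle model would then face complete separation, a classic cause of diverging or non-finite estimates in logistic regression. The base-R sketch below flags genes with at least one such empty group. The toy matrix, gene names, and factor levels are all made up for illustration.

```r
# Toy logcounts: 3 genes x 8 cells; geneC mimics the problem gene (3 expressing cells).
expr <- rbind(
  geneA = c(1.0, 0.8, 0.0, 1.5, 2.0, 0.3, 1.1, 0.9),
  geneB = c(0.0, 0.4, 0.7, 0.0, 1.2, 0.0, 0.6, 0.5),
  geneC = c(0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 1.4, 0.8)
)

# Hypothetical design factors for the 8 cells (stand-ins for colData columns).
condition <- factor(rep(c("ctrl", "treat"), each = 4))
timepoint <- factor(rep(c("t1", "t2"), times = 4))
group <- interaction(condition, timepoint, drop = TRUE)

# Count expressing cells (> 0) per gene within each condition x timepoint group.
# apply() returns groups x genes, so transpose to genes x groups.
detected_by_group <- t(apply(expr > 0, 1, function(is_on) tapply(is_on, group, sum)))

# Genes that have at least one group with zero expressing cells.
sparse_genes <- rownames(detected_by_group)[apply(detected_by_group == 0, 1, any)]

print(detected_by_group)
print(sparse_genes)
```

Here only geneC is flagged, because none of its 3 expressing cells fall in the ctrl.t2 group.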

derrik-gratz commented 11 months ago

Even the more complex model (not subset by cell type) ran successfully after I removed genes expressed in fewer than 100 cells. So this issue may only arise if the preprocessing steps aren't followed correctly. I'm fine with this being closed, unless you want to leave it open as a prompt for a patch on your end.
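The filtering step described above can be sketched in base R by counting, for each gene, the cells with expression > 0 and keeping genes at or above a cutoff. The toy matrix is made up. Because it has only 6 cells, it uses min_cells = 3 rather than the 100 used on the real data. Treating "expressed" as > 0 is an assumption.

```r
# Toy logcounts: 4 genes x 6 cells (stand-in for assay(obj.sce, "logcounts")).
expr <- rbind(
  geneA = c(1.2, 0.5, 2.0, 0.0, 1.1, 0.9),
  geneB = c(0.0, 0.0, 1.3, 0.0, 0.0, 0.0),
  geneC = c(0.4, 0.0, 0.8, 1.5, 0.0, 0.6),
  geneD = c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
)

# Minimum number of expressing cells a gene needs in order to be kept.
# The thread used 100 on the full dataset; 3 is a toy value for 6 cells.
min_cells <- 3

# Number of cells with nonzero expression for each gene.
n_expressing <- rowSums(expr > 0)

# Keep genes meeting the threshold; drop = FALSE keeps a matrix even if one gene survives.
keep <- n_expressing >= min_cells
expr_filtered <- expr[keep, , drop = FALSE]

print(n_expressing)
print(rownames(expr_filtered))
```

On the real object, the same logical vector, computed from assay(obj.sce, "logcounts"), can subset rows with obj.sce[keep, ] before calling zlm. MAST's freq() returns the proportion of expressing cells per gene and could be thresholded instead. If the data are later split by cell type, the counts should be recomputed on each subset, because a gene that passes on the full dataset can fall far below the cutoff within one cell type.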