RGLab / MAST

Tools and methods for analysis of single cell assay data in R
226 stars 57 forks source link

Residual hook causes NAs #191

Open JThomasWatson opened 2 weeks ago

JThomasWatson commented 2 weeks ago

If you have a question about how to use MAST, or how to interpret the output please submit a question to the bioconductor support site so that others can benefit from the discussion. Please do not use github issues.

Otherwise Please briefly describe your problem and what output you expect.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

Brief description of the problem

I'd like to use MAST with sample ID as a random effect. When I include anything in the hook parameter as in section 4.2 of the MAIT tutorial, the ZlmFit object becomes full of NAs and no residuals are accessible. In the summary table created by msummary <- summary(zlmResidDE, doLRT = lrt_name, parallel = TRUE), all fold change values are NaN while everything else is NA. Below is the code I use to attempt a zlm model with residual hook.

zlm_group <- zlm(~ group + cdr_centered + age_centered + pool + sex + (1 | sample_id), 
                       sca, ebayes = FALSE, method = "glmer", hook = combined_residuals_hook,
                       parallel = TRUE, fitArgsD = list(nAGQ = 0))

lrt_name <- paste0("group", ident.2)
msummary <- summary(zlm_group, doLRT = lrt_name, parallel = TRUE)

And the produced ZlmFit object looks like this:

image

Where most values are NA, and the hookOut slot is NULL for each gene.

When I don't include hook, the ZlmFit object is populated with data, and the summary table builds properly, but no residuals are stored. The code used to build the model without residual hook is:

zlm_group <- zlm(~ group + cdr_centered + age_centered + pool + sex + (1 | sample_id), 
                       sca, ebayes = FALSE, method = "glmer",
                       parallel = TRUE, fitArgsD = list(nAGQ = 0))

lrt_name <- paste0("group", ident.2)
msummary <- summary(zlm_group, doLRT = lrt_name, parallel = TRUE)

Viewing the ZlmFit object created by that code shows this:

image

I attempted to run the residual hook function line by line to better understand what's happening, and I noticed warnings after the commands class(x@fitC) <- c("glm","lm") and class(x@fitD) <- c("bayesglm","glm","lm") that said these objects would no longer be S4 objects. I wondered if that class change was breaking a subsequent operation that expected an S4 object, so I attempted to run a modified version by cloning this repository, building and loading the package locally, and redefining combined_residuals_hook() without those two class change commands, but that doesn't appear to have solved it, as including the hook parameter still introduces NAs and doesn't store residuals. Below is my output from sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS:   /hpc/software/spack_v20d1/spack/opt/spack/linux-rhel7-x86_64/gcc-12.3.0/r-4.4.0-aaqsqjfnzw5p5c4twtddj4sxqi23pyfw/rlib/R/lib/libRblas.so 
LAPACK: /hpc/software/spack_v20d1/spack/opt/spack/linux-rhel7-x86_64/gcc-12.3.0/r-4.4.0-aaqsqjfnzw5p5c4twtddj4sxqi23pyfw/rlib/R/lib/libRlapack.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] NMF_0.28                    cluster_2.1.6               rngtools_1.5.2              registry_0.5-1             
 [5] MAST_1.27.1                 SingleCellExperiment_1.24.0 gridExtra_2.3               fastDummies_1.7.4          
 [9] ppcor_1.1                   MASS_7.3-61                 DescTools_0.99.57           knitr_1.48                 
[13] aricode_1.0.3               e1071_1.7-16                car_3.1-3                   carData_3.0-5              
[17] viridis_0.6.5               viridisLite_0.4.2           ggdark_0.2.1                ggsci_3.2.0                
[21] cowplot_1.1.3               ggthemes_5.1.0              plotly_4.10.4               rsvd_1.0.5                 
[25] ggrepel_0.9.6               ComplexHeatmap_2.21.1       SeuratData_0.2.2.9001       sparseMatrixStats_1.16.0   
[29] edgeR_4.2.1                 limma_3.60.6                DESeq2_1.44.0               SummarizedExperiment_1.34.0
[33] Biobase_2.64.0              MatrixGenerics_1.16.0       matrixStats_1.4.1           GenomicRanges_1.56.1       
[37] GenomeInfoDb_1.40.1         IRanges_2.38.1              S4Vectors_0.42.1            BiocGenerics_0.50.0        
[41] DElegate_1.2.1              UpSetR_1.4.0                doMC_1.3.8                  iterators_1.0.14           
[45] foreach_1.5.2               rlist_0.4.6.2               Seurat_5.1.0                SeuratObject_5.0.2         
[49] sp_2.1-4                    lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1              
[53] dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                
[57] tibble_3.2.1                ggplot2_3.5.1               tidyverse_2.0.0             data.table_1.16.0          
[61] plyr_1.8.9                 

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0   httr_1.4.7              RColorBrewer_1.1-3      doParallel_1.0.17       tools_4.4.0            
  [6] sctransform_0.4.1       utf8_1.2.4              R6_2.5.1                lazyeval_0.2.2          uwot_0.2.2             
 [11] GetoptLong_1.0.5        withr_3.0.1             prettyunits_1.2.0       progressr_0.14.0        cli_3.6.3              
 [16] spatstat.explore_3.3-2  labeling_0.4.3          mvtnorm_1.3-1           spatstat.data_3.1-2     proxy_0.4-27           
 [21] ggridges_0.5.6          pbapply_1.7-2           parallelly_1.38.0       readxl_1.4.3            rstudioapi_0.16.0      
 [26] generics_0.1.3          shape_1.4.6.1           ica_1.0-3               spatstat.random_3.3-2   Matrix_1.7-0           
 [31] fansi_1.0.6             abind_1.4-8             lifecycle_1.0.4         SparseArray_1.4.8       Rtsne_0.17             
 [36] promises_1.3.0          crayon_1.5.3            miniUI_0.1.1.1          lattice_0.22-6          pillar_1.9.0           
 [41] rjson_0.2.23            boot_1.3-31             gld_2.6.6               future.apply_1.11.2     codetools_0.2-20       
 [46] leiden_0.4.3.1          glue_1.8.0              spatstat.univar_3.0-1   vctrs_0.6.5             png_0.1-8              
 [51] spam_2.11-0             cellranger_1.1.0        gtable_0.3.5            xfun_0.48               S4Arrays_1.4.1         
 [56] mime_0.12               survival_3.7-0          statmod_1.5.0           fitdistrplus_1.2-1      ROCR_1.0-11            
 [61] nlme_3.1-166            progress_1.2.3          RcppAnnoy_0.0.22        irlba_2.3.5.1           KernSmooth_2.23-24     
 [66] colorspace_2.1-1        Exact_3.3               tidyselect_1.2.1        compiler_4.4.0          expm_1.0-0             
 [71] DelayedArray_0.30.1     scales_1.3.0            lmtest_0.9-40           rappdirs_0.3.3          digest_0.6.37          
 [76] goftest_1.2-3           spatstat.utils_3.1-0    minqa_1.2.8             XVector_0.44.0          htmltools_0.5.8.1      
 [81] pkgconfig_2.0.3         lme4_1.1-35.5           fastmap_1.2.0           rlang_1.1.4             GlobalOptions_0.1.2    
 [86] htmlwidgets_1.6.4       UCSC.utils_1.0.0        shiny_1.9.1             farver_2.1.2            zoo_1.8-12             
 [91] jsonlite_1.8.9          BiocParallel_1.38.0     magrittr_2.0.3          Formula_1.2-5           GenomeInfoDbData_1.2.12
 [96] dotCall64_1.2           patchwork_1.3.0         munsell_0.5.1           Rcpp_1.0.13             reticulate_1.39.0      
[101] stringi_1.8.4           rootSolve_1.8.2.4       zlibbioc_1.50.0         listenv_0.9.1           lmom_3.2               
[106] deldir_2.0-4            splines_4.4.0           tensor_1.5              hms_1.1.3               circlize_0.4.16        
[111] locfit_1.5-9.10         igraph_2.0.3            spatstat.geom_3.3-3     RcppHNSW_0.6.0          reshape2_1.4.4         
[116] BiocManager_1.30.25     nloptr_2.1.1            tzdb_0.4.0              httpuv_1.6.15           RANN_2.6.2             
[121] polyclip_1.10-7         future_1.34.0           clue_0.3-65             scattermore_1.2         gridBase_0.4-7         
[126] xtable_1.8-4            RSpectra_0.16-2         later_1.3.2             class_7.3-22            timechange_0.3.0       
[131] globals_0.16.3
JThomasWatson commented 4 hours ago

@gfinak @amcdavid Have you seen this behavior before?