GabrielHoffman / dreamlet

Perform differential expression analysis on multi-sample single cell datasets using linear mixed models
https://gabrielhoffman.github.io/dreamlet
20 stars 4 forks source link

Model Coefficients not estimable problem #19

Closed gjones3339 closed 2 months ago

gjones3339 commented 4 months ago

Hello, thank you for all your work in providing these great analysis tools.

I am trying to run dreamlet on a merged seurat object which has undergone cca integration. I have aggregated to pseudobulk, splitting by 3 pseudotime bins and am attempting to processAssays/examine variance:

form <-  ~ Condition + Donor + nCount_RNA + log1pcounts + mitoPercent + nFeature_RNA 
res.proc <- processAssays(pb, form, min.cells = 5,
                                      min.count = 5,
                                      min.samples = 4,
                                      min.prop = 0.4, 
                                      verbose = TRUE
                          )
details(res.proc)

However, I get errors saying certain coefficients not estimable, and its always the last two coefficients in my model form, regardless of what they are. For example: form <- ~ Condition + Donor + nCount_RNA + log1pcounts + mitoPercent + nFeature_RNA will show:

 Pseudo1...Coefficients not estimable: mitoPercent nFeature_RNA 
Warning: Partial NA coefficients for 15792 probe(s)
Warning in voomWithDreamWeights(y[keep, ], formula, data, weights = precWeights,  :
  The experimental design has no replication. Setting weights to 1.
0.41 secs

OR form <- ~ Condition + Donor + mitoPercent + nFeature_RNA + nCount_RNA + log1pcounts will give the error:

 Pseudo1...Coefficients not estimable: nCount_RNA + log1pcounts
Warning: Partial NA coefficients for 15792 probe(s)
Warning in voomWithDreamWeights(y[keep, ], formula, data, weights = precWeights,  :
  The experimental design has no replication. Setting weights to 1.
0.41 secs

I am confused as to whether its simply not listing more things? or if there is some actual error with my model form? Regardless -- whenever I get these warnings it won't allow me to plot variance for any of these covariates.

Thanks in advance for your help

GabrielHoffman commented 4 months ago

Can you send ‘sessionInfo()’

gjones3339 commented 4 months ago

oops, sorry about that:

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] monocle3_1.3.7              scater_1.32.0               ExperimentHub_2.12.0        AnnotationHub_3.12.0       
 [5] BiocFileCache_2.12.0        dbplyr_2.5.0                muscat_1.18.0               scuttle_1.14.0             
 [9] Matrix_1.7-0                scDblFinder_1.18.0          anndata_0.7.5.6             cowplot_1.1.3              
[13] harmony_1.2.0               Rcpp_1.0.12                 gridExtra_2.3               lubridate_1.9.3            
[17] forcats_1.0.0               purrr_1.0.2                 tidyr_1.3.1                 tibble_3.2.1               
[21] tidyverse_2.0.0             magrittr_2.0.3              RColorBrewer_1.1-3          outliers_0.15              
[25] stringr_1.5.1               dplyr_1.1.4                 readr_2.1.5                 SeuratWrappers_0.3.5       
[29] patchwork_1.2.0             Seurat_5.1.0                SeuratObject_5.0.2          sp_2.1-4                   
[33] dreamlet_1.2.0              SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0             
[37] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0         IRanges_2.38.0              S4Vectors_0.42.0           
[41] BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.3.0           variancePartition_1.34.0   
[45] BiocParallel_1.38.0         limma_3.60.0                ggplot2_3.5.1              

loaded via a namespace (and not attached):
  [1] igraph_2.0.3              graph_1.82.0              diffusionMap_1.2.0        ica_1.0-3                
  [5] plotly_4.10.4             devtools_2.4.5            zlibbioc_1.50.0           tidyselect_1.2.1         
  [9] bit_4.0.5                 doParallel_1.0.17         clue_0.3-65               lattice_0.22-6           
 [13] SQUAREM_2021.1            rjson_0.2.21              blob_1.2.4                urlchecker_1.0.1         
 [17] S4Arrays_1.4.0            pbkrtest_0.5.2            parallel_4.4.0            png_0.1-8                
 [21] cli_3.6.2                 goftest_1.2-3             BiocIO_1.14.0             mixsqp_0.3-54            
 [25] bluster_1.14.0            EnrichmentBrowser_2.34.1  BiocNeighbors_1.22.0      uwot_0.2.2               
 [29] curl_5.2.1                mime_0.12                 evaluate_0.23             leiden_0.4.3.1           
 [33] ComplexHeatmap_2.20.0     stringi_1.8.4             backports_1.5.0           lmerTest_3.1-3           
 [37] XML_3.99-0.16.1           httpuv_1.6.15             AnnotationDbi_1.66.0      rappdirs_0.3.3           
 [41] splines_4.4.0             mclust_6.1.1              sctransform_0.4.1         ggbeeswarm_0.7.2         
 [45] sessioninfo_1.2.2         DBI_1.2.2                 withr_3.0.0               corpcor_1.6.10           
 [49] xgboost_1.7.7.1           lmtest_0.9-40             GSEABase_1.66.0           TSCAN_2.0.0              
 [53] rtracklayer_1.64.0        BiocManager_1.30.23       htmlwidgets_1.6.4         fs_1.6.4                 
 [57] ggrepel_0.9.5             labeling_0.4.3            fANCOVA_0.6-1             SparseArray_1.4.3        
 [61] DESeq2_1.44.0             RcppZiggurat_0.1.6        annotate_1.82.0           truncnorm_1.0-9          
 [65] reticulate_1.37.0         zoo_1.8-12                XVector_0.44.0            knitr_1.47               
 [69] UCSC.utils_1.0.0          RhpcBLASctl_0.23-42       timechange_0.3.0          foreach_1.5.2            
 [73] fansi_1.0.6               caTools_1.18.2            grid_4.4.0                data.table_1.15.4        
 [77] R.oo_1.26.0               rmeta_3.0                 RSpectra_0.16-1           irlba_2.3.5.1            
 [81] ggrastr_1.0.2             fastDummies_1.7.3         ellipsis_0.3.2            lazyeval_0.2.2           
 [85] yaml_2.3.8                survival_3.6-4            scattermore_1.2           BiocVersion_3.19.1       
 [89] crayon_1.5.2              RcppAnnoy_0.0.22          progressr_0.14.0          later_1.3.2              
 [93] Rgraphviz_2.48.0          ggridges_0.5.6            codetools_0.2-20          GlobalOptions_0.1.2      
 [97] profvis_0.3.8             aod_1.3.3                 KEGGREST_1.44.0           Rtsne_0.17               
[101] shape_1.4.6.1             fastICA_1.2-4             estimability_1.5.1        Rsamtools_2.20.0         
[105] filelock_1.0.3            pkgconfig_2.0.3           KEGGgraph_1.64.0          TMB_1.9.11               
[109] mathjaxr_1.6-0            EnvStats_2.8.1            GenomicAlignments_1.40.0  scatterplot3d_0.3-44     
[113] spatstat.sparse_3.0-3     viridisLite_0.4.2         xtable_1.8-4              zenith_1.6.0             
[117] plyr_1.8.9                ashr_2.2-63               httr_1.4.7                rbibutils_2.2.16         
[121] tools_4.4.0               globals_0.16.3            pkgbuild_1.4.4            Rfast_2.1.0              
[125] beeswarm_0.4.0            broom_1.0.6               nlme_3.1-164              assertthat_0.2.1         
[129] lme4_1.1-35.3             digest_0.6.35             numDeriv_2016.8-1.1       farver_2.1.2             
[133] tzdb_0.4.0                remaCor_0.0.18            reshape2_1.4.4            viridis_0.6.5            
[137] glue_1.7.0                cachem_1.1.0              polyclip_1.10-6           generics_0.1.3           
[141] Biostrings_2.72.0         mvtnorm_1.2-5             parallelly_1.37.1         pkgload_1.3.4            
[145] statmod_1.5.0             RcppHNSW_0.6.0            ScaledMatrix_1.12.0       minqa_1.2.7              
[149] pbapply_1.7-2             spam_2.10-0               vroom_1.6.5               dqrng_0.4.1              
[153] utf8_1.2.4                invgamma_1.1              gtools_3.9.5              shiny_1.8.1.1            
[157] GenomeInfoDbData_1.2.12   glmmTMB_1.1.9             R.utils_2.12.3            RCurl_1.98-1.14          
[161] memoise_2.0.1             rmarkdown_2.27            mashr_0.2.79              scales_1.3.0             
[165] R.methodsS3_1.8.2         future_1.33.2             RANN_2.6.1                spatstat.data_3.0-4      
[169] rstudioapi_0.16.0         cluster_2.1.6             msigdbr_7.5.1             spatstat.utils_3.0-4     
[173] hms_1.1.3                 fitdistrplus_1.1-11       munsell_0.5.1             colorspace_2.1-0         
[177] rlang_1.1.3               DelayedMatrixStats_1.26.0 sparseMatrixStats_1.16.0  dotCall64_1.1-1          
[181] circlize_0.4.16           mgcv_1.9-1                xfun_0.44                 coda_0.19-4.1            
[185] metafor_4.6-0             remotes_2.5.0             iterators_1.0.14          emmeans_1.10.2           
[189] abind_1.4-5               bitops_1.0-7              Rdpack_2.6                promises_1.3.0           
[193] RSQLite_2.3.7             DelayedArray_0.30.1       proxy_0.4-27              compiler_4.4.0           
[197] prettyunits_1.2.0         boot_1.3-30               metadat_1.2-0             beachmat_2.20.0          
[201] listenv_0.9.1             edgeR_4.2.0               BiocSingular_1.20.0       tensor_1.5               
[205] usethis_2.2.3             MASS_7.3-60.2             progress_1.2.3            babelgene_22.9           
[209] spatstat.random_3.2-3     R6_2.5.1                  fastmap_1.2.0             vipor_0.4.7              
[213] ROCR_1.0-11               rsvd_1.0.5                gtable_0.3.5              KernSmooth_2.23-22       
[217] miniUI_0.1.1.1            deldir_2.0-4              htmltools_0.5.8.1         RcppParallel_5.1.7       
[221] bit64_4.0.5               spatstat.explore_3.2-7    lifecycle_1.0.4           blme_1.0-5               
[225] nloptr_2.0.3              restfulr_0.0.15           vctrs_0.6.5               spatstat.geom_3.2-9      
[229] scran_1.32.0              future.apply_1.11.2       pillar_1.9.0              gplots_3.1.3.1           
[233] metapod_1.12.0            locfit_1.5-9.9            combinat_0.0-8            jsonlite_1.8.8           
[237] GetoptLong_1.0.5   
GabrielHoffman commented 4 months ago

Your model is singular, which means that you have redundant variables. dreamlet can't tell which variables are the problem, so it is reporting the last variables. I think the problem is with Donor. If you have multiple samples per donor, then you need to model Donor as a random effect. If you don't have multiple samples per donor, then you shouldn't include Donor in the formula. For categorical variables, only variables where there is variation within category should be included in the formula

gjones3339 commented 4 months ago

Thank you so much for responding. I had that initially but then received a warning saying that categorical variables needed to be either all random or all fixed. We have 2 donors each with 3 different conditions. Condition is the primary variable we care about for DE, so did not want to make that a random effect. Can we make them all random for the variance partition and then switch to condition + (1|donor) for the differential expression? Or just ignore the warning?

GabrielHoffman commented 4 months ago

When running fitVarPart(), all or none categorical variables must be random effects. That is not required for processAssays() or `dreamlet().

If you get that warning for the other functions, then get the latest version where that is fixed:

devtools::install_github("DiseaseNeurogenomics/dreamlet")