yuyingxie commented 4 years ago

I just want to make sure: if we use method = "nbinom", should we keep all genes rather than a filtered set? My understanding is that when we model the counts with a negative binomial, we need the expression data for all genes. Am I correct?

Thanks for your help

HelenaLC commented 4 years ago

By default, mmDS retains only genes with a count >= min_count in > min_cells cells for differential testing, and this filtering is performed for each cluster separately. So my answer would be that it is advisable to keep all genes at first: a different subset of genes will be filtered out for each cluster "under the hood", whereas filtering genes beforehand would exclude them for all clusters.
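
To illustrate the difference, here is a minimal base-R sketch of per-cluster filtering. It is not muscat's internal code; the toy matrix, the cluster labels and the threshold values are made up for illustration, and only the rule "count >= min_count in > min_cells cells" is taken from the description above.

```r
# Toy count matrix: 4 genes x 6 cells (hypothetical data).
counts <- rbind(
  g1 = c(5, 3, 4, 0, 0, 0),
  g2 = c(0, 0, 0, 2, 6, 3),
  g3 = c(1, 2, 1, 1, 3, 2),
  g4 = c(0, 0, 1, 0, 0, 0)
)
cluster <- factor(c("A", "A", "A", "B", "B", "B"))

# Illustrative thresholds, not the package defaults.
min_count <- 1
min_cells <- 2

# For each cluster, keep genes with a count >= min_count in > min_cells cells.
keep_by_cluster <- lapply(
  split(seq_len(ncol(counts)), cluster),
  function(i) {
    n_expr <- rowSums(counts[, i, drop = FALSE] >= min_count)
    rownames(counts)[n_expr > min_cells]
  }
)

print(keep_by_cluster)
```

Each cluster ends up with its own gene set here, so removing a gene globally because it is rare overall could drop a gene that would have passed the filter in one specific cluster.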

yuyingxie commented 4 years ago

My question is: when using method = "nbinom", does the method estimate a scaling factor for each cell? If so, we need to keep at least, say, 2000 genes so that we get a reasonable estimate of the scaling factor.

plger commented 4 years ago

The method uses sizeFactors(sce) if they are present, and otherwise estimates them. So if you're worried about this, you could estimate them first on a larger set of genes using your method of choice, and then run muscat on a smaller set.
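
As a rough sketch of that idea in base R: the example below computes simple library-size factors on the full gene set and keeps them when the genes are subset afterwards. The simulated matrix and the variable names are hypothetical, and plain library-size scaling is only a stand-in for whichever estimation method you prefer. With a SingleCellExperiment, the factors would presumably be stored via sizeFactors(sce) before subsetting the rows.

```r
set.seed(42)

# Simulated counts: 200 genes x 10 cells (hypothetical data).
counts <- matrix(
  rpois(200 * 10, lambda = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:10))
)

# Library-size factors estimated on ALL genes, scaled to have mean 1.
lib_all <- colSums(counts)
sf_all <- lib_all / mean(lib_all)

# Subset to a small gene set afterwards; the per-cell factors are unchanged.
sub <- counts[1:20, ]

# For comparison: factors re-estimated on the small subset are noisier.
lib_sub <- colSums(sub)
sf_sub <- lib_sub / mean(lib_sub)

print(round(cbind(sf_all, sf_sub), 3))
```

The point of the comparison is that sf_all depends only on the cells, so it can be carried over to any gene subset, while sf_sub rests on far fewer counts per cell.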

yuyingxie commented 4 years ago

Thanks for your response. I am new to this field. How can I estimate the size factors? I googled and found that the package 'scran' has the function computeSumFactors, used as sce <- computeSumFactors(sce, clusters=clusters).

Is that what you would recommend?

yuyingxie commented 4 years ago

I ran the following code and got the error messages

mm <- mmDS(B1, method = "nbinom")

Testing 6 genes across 2611 cells in cluster “Treg”...

[1] "~(1|sample_id)+offset(ls)+group_id"
Argument 'coef' not specified; testing for “group_idHealthy”.
|======================================================================| 100%

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 6, 1, 4
In addition: Warning message:
In fitTMB(TMBStruc) : Model convergence problem; false convergence (8). See vignette('troubleshooting')

yuyingxie commented 4 years ago

I also ran this code from the tutorial:

data(sce)

# subset "B cells" cluster
sce <- sce[, sce$cluster_id == "B cells"]
sce$cluster_id <- droplevels(sce$cluster_id)

# downsample to 100 genes
gs <- sample(nrow(sce), 100)
sce <- sce[gs, ]

res <- mmDS(sce, method = "dream", n_threads = 2, verbose = FALSE)

and error:

Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 85, 1, 4

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
character(0)

other attached packages:
[1] muscat_1.3.1

loaded via a namespace (and not attached):
  [1] ggbeeswarm_0.6.0            minqa_1.2.4                 colorspace_1.4-1
  [4] rjson_0.2.20                colorRamps_2.3              ellipsis_0.3.1
  [7] circlize_0.4.10             scuttle_0.99.12             XVector_0.29.3
 [10] GenomicRanges_1.41.6        GlobalOptions_0.1.2         BiocNeighbors_1.7.0
 [13] clue_0.3-57                 rstudioapi_0.11             listenv_0.8.0
 [16] glmmTMB_1.0.2.1             stats_4.0.2                 bit64_4.0.2
 [19] AnnotationDbi_1.51.3        codetools_0.2-16            splines_4.0.2
 [22] doParallel_1.0.15           geneplotter_1.67.0          scater_1.17.4
 [25] nloptr_1.2.2.2              pbkrtest_0.4-8.6            annotate_1.67.0
 [28] base_4.0.2                  cluster_2.1.0               png_0.1-7
 [31] sctransform_0.2.1           compiler_4.0.2              Matrix_1.2-18
 [34] limma_3.45.10               BiocSingular_1.5.0          prettyunits_1.1.1
 [37] tools_4.0.2                 lmerTest_3.1-2              rsvd_1.0.3
 [40] gtable_0.3.0                glue_1.4.1                  GenomeInfoDbData_1.2.3
 [43] reshape2_1.4.4              dplyr_1.0.1                 grDevices_4.0.2
 [46] Rcpp_1.0.5                  Biobase_2.49.0              vctrs_0.3.2
 [49] gdata_2.18.0                nlme_3.1-148                iterators_1.0.12
 [52] DelayedMatrixStats_1.11.1   stringr_1.4.0               globals_0.12.5
 [55] lme4_1.1-23                 lifecycle_0.2.0             irlba_2.3.3
 [58] gtools_3.8.2                statmod_1.4.34              XML_3.99-0.5
 [61] future_1.18.0               edgeR_3.31.4                zlibbioc_1.35.0
 [64] MASS_7.3-51.6               scales_1.1.1                graphics_4.0.2
 [67] hms_0.5.3                   parallel_4.0.2              SummarizedExperiment_1.19.6
 [70] TMB_1.7.18                  RColorBrewer_1.1-2          utils_4.0.2
 [73] SingleCellExperiment_1.11.6 ComplexHeatmap_2.5.5        memoise_1.1.0
 [76] gridExtra_2.3               ggplot2_3.3.2               datasets_4.0.2
 [79] stringi_1.4.6               RSQLite_2.2.0               genefilter_1.71.0
 [82] S4Vectors_0.27.12           foreach_1.5.0               blme_1.0-4
 [85] caTools_1.18.0              BiocGenerics_0.35.4         boot_1.3-25
 [88] BiocParallel_1.23.2         shape_1.4.4                 GenomeInfoDb_1.25.10
 [91] rlang_0.4.7                 pkgconfig_2.0.3             matrixStats_0.56.0
 [94] bitops_1.0-6                lattice_0.20-41             purrr_0.3.4
 [97] bit_4.0.4                   tidyselect_1.1.0            variancePartition_1.19.15
[100] plyr_1.8.6                  magrittr_1.5                DESeq2_1.29.8
[103] R6_2.4.1                    gplots_3.0.4                IRanges_2.23.10
[106] generics_0.0.2              DelayedArray_0.15.7         DBI_1.1.0
[109] pillar_1.4.6                survival_3.2-3              RCurl_1.98-1.2
[112] tibble_3.0.3                future.apply_1.6.0          crayon_1.3.4
[115] KernSmooth_2.23-17          viridis_0.5.1               GetoptLong_1.0.2
[118] progress_1.2.2              locfit_1.5-9.4              grid_4.0.2
[121] data.table_1.13.0           blob_1.2.1                  methods_4.0.2
[124] digest_0.6.25               xtable_1.8-4                numDeriv_2016.8-1.1
[127] stats4_4.0.2                munsell_0.5.0               beeswarm_0.2.3
[130] viridisLite_0.3.0           vipor_0.4.5


plger commented 4 years ago

This seems to be a glmmTMB issue on some platforms, see https://github.com/glmmTMB/glmmTMB/issues/615

HelenaLC / muscat

running mmDS with method = "nbinom" #38
