Closed yuyingxie closed 4 years ago
By default, mmDS retains only genes with a count >= min_count in > min_cells cells for differential testing, and this filtering is performed for each cluster separately. So my answer would be that it is advisable to keep all genes at first: different subsets of genes will be filtered out for each cluster "under the hood", whereas filtering genes beforehand would exclude them for all clusters.
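To make the per-cluster filtering concrete, here is a minimal base-R sketch on a made-up count matrix. It is not muscat's actual implementation; the toy data, the cluster labels, and the threshold values are assumptions chosen for illustration, and only the rule itself (count >= min_count in > min_cells cells, applied per cluster) is taken from the description above.

```r
# Toy counts: 4 genes x 8 cells (hypothetical data, for illustration only).
counts <- rbind(
  gene1 = rep(3, 8),                    # expressed in every cell
  gene2 = c(2, 2, 2, 2, 0, 0, 0, 0),    # expressed only in cluster A
  gene3 = c(0, 0, 0, 0, 1, 4, 2, 1),    # expressed only in cluster B
  gene4 = c(1, 0, 0, 0, 0, 0, 0, 1)     # too sparse in both clusters
)
cluster <- rep(c("A", "B"), each = 4)   # cells 1-4 in A, cells 5-8 in B

# Assumed thresholds, named after the mmDS arguments discussed above.
min_count <- 1
min_cells <- 2

# Apply the filter separately within each cluster's cells.
kept <- lapply(split(seq_len(ncol(counts)), cluster), function(idx) {
  y <- counts[, idx, drop = FALSE]
  rownames(y)[rowSums(y >= min_count) > min_cells]
})

print(kept)
```

In this toy case gene2 is tested only in cluster A and gene3 only in cluster B, which is why removing genes globally before the analysis is more restrictive than letting each cluster be filtered on its own.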
My question is: when using method = "nbinom", does the method estimate a scaling factor for each cell? If so, we need to keep at least, say, 2000 genes so that we get a reasonable estimate of the scaling factor.
The method uses sizeFactors(sce) if they are present, and otherwise estimates them. So if you're worried about this, you could estimate them first on a larger set of genes using your method of choice, and then run muscat on a smaller set.
Thanks for your response. I am new to this field. How can I estimate the size factors? I googled and found that the 'scran' package has the function sce <- computeSumFactors(sce, clusters=clusters).
Is that what you would recommend?
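The workflow suggested above (estimate size factors on the full gene set, then test a smaller set) could be sketched as follows. This is only an illustration under assumptions: it uses the scran function mentioned in the question, which the thread does not confirm as the recommended choice; `run_nbinom_on_subset` and its `genes` argument are hypothetical names; and `sce` is assumed to be a SingleCellExperiment with raw counts and the cluster_id, sample_id and group_id columns that muscat expects.

```r
library(SingleCellExperiment)
library(scran)
library(muscat)

# Hypothetical helper: `sce` holds all genes, `genes` is the subset to test.
run_nbinom_on_subset <- function(sce, genes) {
  # Estimate per-cell size factors on the full gene set first.
  # Using cluster_id as the `clusters` argument is an assumption.
  sce <- computeSumFactors(sce, clusters = sce$cluster_id)
  stopifnot(!is.null(sizeFactors(sce)))

  # Subsetting rows keeps the per-cell size factors, so mmDS can use them
  # instead of re-estimating them on the reduced gene set.
  mmDS(sce[genes, ], method = "nbinom")
}
```

A call would look like `res <- run_nbinom_on_subset(sce, genes = my_genes)`, where `my_genes` is whatever gene selection you have in mind.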
I ran the following code and got the error messages
mm <- mmDS(B1, method = "nbinom")
Testing 6 genes across 2611 cells in cluster “Treg”...
[1] "~(1|sample_id)+offset(ls)+group_id"
Argument 'coef' not specified; testing for “group_idHealthy”.
|======================================================================| 100%
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 6, 1, 4
In addition: Warning message:
In fitTMB(TMBStruc) : Model convergence problem; false convergence (8). See vignette('troubleshooting')
I ran this code from the tutorial:
data(sce)
sce <- sce[, sce$cluster_id == "B cells"]
sce$cluster_id <- droplevels(sce$cluster_id)
gs <- sample(nrow(sce), 100)
sce <- sce[gs, ]
res <- mmDS(sce, method = "dream", n_threads = 2, verbose = FALSE)
and got this error:
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in .Call("FreeADFunObject", ptr, PACKAGE = DLL) : "FreeADFunObject" not available for .Call() for package "glmmTMB"
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 85, 1, 4
session()
This seems to be a glmmTMB issue on some platforms, see https://github.com/glmmTMB/glmmTMB/issues/615
Just want to make sure: if we use method = "nbinom", should we keep all the genes instead of a filtered set? My understanding is that when we model the counts with a negative binomial, we need all the gene expression data. Am I correct?
Thanks for your help