HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

error in vst sctransform #98

Closed: ktyssowski closed this issue 2 years ago

ktyssowski commented 2 years ago

When I run the vst method from sctransform, I get the following error:


mm_vst_east <- mmDS(sce_east, method = "vst", vst = "sctransform", cov = c('reps'))

Calculating cell attributes from input UMI matrix: log_umi

Error in sample.int(length(x), size, replace, prob): NA in probability vector
Traceback:

1. mmDS(sce_east, method = "vst", vst = "sctransform", cov = c("reps"))
2. eval(vst_call)
3. eval(vst_call)
4. .vst_sctransform(x, verbose)
5. sctransform::vst(counts(x), min_cells = 0, verbosity = verbose)
6. sample(x = genes_step1, size = n_genes, prob = sampling_prob)
7. sample.int(length(x), size, replace, prob)

When I run sctransform directly, it works fine (e.g., vst(counts(sce_east), min_cells = 1) works), BUT when I set min_cells = 0, I get the same error. I'm running sctransform version 0.3.3. Any ideas on what is going on?
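Frames 6 and 7 of the traceback show that the error itself is raised by base R's `sample()`, which rejects a probability vector containing a non-finite value. The minimal sketch below reproduces only that message. The gene names and probabilities are made up, and it does not claim to show how sctransform actually computes `sampling_prob`.

```r
# Hypothetical gene names and sampling weights; the NA stands in for a
# non-finite weight (e.g. one derived from an undetected gene).
genes_step1 <- c("geneA", "geneB", "geneC")
sampling_prob <- c(0.2, NA, 0.8)

# sample() forwards prob to sample.int(), which refuses NA/NaN/Inf entries
msg <- tryCatch(
  sample(x = genes_step1, size = 2, prob = sampling_prob),
  error = conditionMessage
)
print(msg)
# [1] "NA in probability vector"
```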

HelenaLC commented 2 years ago (Feb 25, 2022)

Yup, I'd say this is to be expected. If you have a look at ?vst, min_cells determines that only "genes that have been detected in at least this many cells" are used (default is 5). I couldn't reproduce the exact same error, but simply setting a single gene to all-0 (i.e., making it undetected across all cells) already gives me an error:

library(sctransform)
data("pbmc", package = "sctransform")
a <- b <- pbmc

# make a random gene undetected in 'b'
b[sample(nrow(pbmc), 1), ] <- 0

# this works (all genes detected)
A <- vst(a, 
    n_genes = NULL,
    min_cells = 0, 
    verbosity = 0)

# this doesn't (one undetected gene)
B <- vst(b, 
    n_genes = NULL,
    min_cells = 0, 
    verbosity = 0)
> Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method
  for function 't': missing value where TRUE/FALSE needed

...ideally, I'd prefer vst to handle this itself (e.g., min_cells = 0 shouldn't even be permitted if it dooms the method to fail). Then again, I don't think this is directly a muscat issue: all-0 genes cause problems for many methods and are often filtered out under the hood...
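To make "all-0 genes cause issues with many methods" concrete: many per-gene summaries take the log of, or divide by, a gene's mean, and for an undetected gene that mean is 0. The toy sketch below uses an illustrative matrix and two illustrative statistics; they are not claimed to be what sctransform computes internally.

```r
# Toy counts: 3 genes (rows) x 4 cells (columns); gene2 is undetected everywhere
m <- rbind(
  gene1 = c(1, 3, 0, 2),
  gene2 = c(0, 0, 0, 0),
  gene3 = c(5, 2, 4, 1)
)

gene_mean  <- rowMeans(m)
log_mean   <- log10(gene_mean)               # log10(0) = -Inf for gene2
dispersion <- apply(m, 1, var) / gene_mean   # 0 / 0 = NaN for gene2

# Non-finite values like these can propagate into downstream weights
print(is.finite(log_mean))
print(is.finite(dispersion))
```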

ktyssowski commented 2 years ago

Ahh, OK, so before running muscat, I should filter out the all-0 genes? Should that be OK for all methods?
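A minimal sketch of dropping all-0 genes before calling mmDS(), following the ?vst definition of min_cells ("detected in at least this many cells"). It works on a plain count matrix; the helper name `keep_detected` and the default threshold of 1 cell are assumptions, not part of the muscat or sctransform API. With a SingleCellExperiment, the resulting logical vector could subset rows in the usual way, e.g. `sce[keep_detected(counts(sce)), ]`, assuming the count matrix supports `>` and `rowSums()` as base and Matrix sparse matrices do.

```r
# Hypothetical helper: TRUE for genes (rows) with a count > 0
# in at least `min_cells` cells (columns)
keep_detected <- function(counts, min_cells = 1) {
  n_detected <- rowSums(counts > 0)
  n_detected >= min_cells
}

# Toy counts: gene2 is all-0, gene3 is detected in a single cell
m <- rbind(
  gene1 = c(1, 3, 0, 2),
  gene2 = c(0, 0, 0, 0),
  gene3 = c(5, 0, 0, 0)
)

keep <- keep_detected(m)
filtered <- m[keep, , drop = FALSE]
print(rownames(filtered))
# [1] "gene1" "gene3"

# With min_cells = 0 the test n_detected >= 0 is always TRUE,
# so the all-0 gene is kept (the situation in the traceback)
print(all(keep_detected(m, min_cells = 0)))
# [1] TRUE
```

Raising `min_cells` applies a stricter filter in the same way.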