GabrielHoffman / dreamlet

Perform differential expression analysis on multi-sample single cell datasets using linear mixed models
https://gabrielhoffman.github.io/dreamlet
20 stars 4 forks source link

run_mash fails when multiple coefficients provided to coefList #18

Closed kennethabarr closed 4 months ago

kennethabarr commented 5 months ago

run_mash appears to be set up to run on multiple coefficients, but throws an error when multiple are provided. Specifically the line

if (!coefList %in% coefNames(fit)) {
    stop("coef not found in coefNames(fit): ", coefList)
 }

Perhaps this should read

for (coef in coefList) {
    if (!coef %in% coefNames(fit)) {
      stop("coef not found in coefNames(fit): ", coef)
    }
}
GabrielHoffman commented 5 months ago

Can you provide a minimal reproducible example?

kennethabarr commented 5 months ago

It will take some time to put together a minimal reproducible example with a valid single cell object, but the lines if (! coefList %in% coefNames(fit)) and lvls <- c(vapply(assayNames(fit), function(x) paste(x, coefList, sep = "."), character(0))) will fail if coefList has length greater than zero.

As a very minimal example lets assume an output of assayNames(fit) and coefNames(fit) and substitute into the problematic lines that give me errors. assaynames <- c("assayA","assayB") coefnames <- c("coefA","coefB") then run_mash(fit, c("coefA","coefB")) would throw errors at the following lines

> if (! c("coefA","coefB") %in% coefnames) stop("error")
Error in if (!c("coefA", "coefB") %in% coefnames) stop("error") : 
  the condition has length > 1

and

> lvls <- c(vapply(assaynames, function(x) paste(x, c("coefA","coefB"), sep = "."), character(0)))
Error in vapply(assaynames, function(x) paste(x, c("coefA",  : 
  values must be length 0,
 but FUN(X[[1]]) result is length 2

I can get back to you with a better example if the problem isn't obvious here.

GabrielHoffman commented 5 months ago

Fixed in v1.1.21.

See


library(muscat)
library(mashr)
library(dreamlet)
library(SingleCellExperiment)

data(example_sce)

# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce[1:100, ],
  assay = "counts",
  cluster_id = "cluster_id",
  sample_id = "sample_id",
  verbose = FALSE)

# voom-style normalization
res.proc <- processAssays(pb, ~group_id)

# Differential expression analysis within each assay
res.dl <- dreamlet(res.proc, ~group_id)

# run MASH model
# Just as a technical example, 
# doesn't make biological sense to test this
res_mash <- run_mash(res.dl, coefNames(res.dl))