HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools
166 stars 33 forks source link

pbDS group_id NULL #25

Closed cnk113 closed 4 years ago

cnk113 commented 4 years ago

Hello,

I've been trying to run pbDS, but I keep getting NULL error since my pb data doesn't have the variable. For some reason aggregateData doesn't set any group_id... I've tried to manually set the group_id from my sce object I get Error in if (levels[1] == "(Intercept)") { : argument is of length zero. Any ideas on how to move forward?

Thanks

plger commented 4 years ago

muscat should transfer this data from the SCE to the PB, but it can do so only if all the cells of a sample_id (or more precisely, the 2nd by argument of aggregateData) have the same group_id. Perhaps a first thing to check is if there's any mismatch between those variables, with something like table(sce$group_id, sce$sample_id).

domi84 commented 4 years ago

Hello, I am running in parallel the tutorial with the example dataset and my dataset. With the example dataset Pseudobulk method works. With my dataset I have the same issue (I think) posted here. To keep it simple/fast I am using just one subpopulation (named cluster 4) two conditions (Healthy vs Naive Disease)

res <- pbDS(pb)
res
4..

Cluster(s) "4" skipped due to an insufficient number of cells in at least 2 samples per group.

$table
    $Naive =

$data

$method
    'edgeR'
$design

  | Healthy | Naive
-- | -- | --
HC01 | 1 | -0.7071068
HC02 | 1 | -0.7071068
HC03 | 1 | -0.7071068
HC04 | 1 | -0.7071068
SA01 | 1 | 0.7071068
SA02 | 1 | 0.7071068
SA03 | 1 | 0.7071068
SA04 | 1 | 0.7071068
$contrast

  Naive
-- | --
Healthy | 0
Naive |1

$coef
    NULL

And this is:

table(sce$group_id, sce$sample_id)
          HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
  Healthy     65    189     99     44     0     0     0     0
  Naive        0      0      0      0    47   124   197   223

If I understood properly the issue is that I don't have the same sample in Healthy and Naive disease, right? But I cannot have that. Is there another way?

Thanks

P.S. In the tutorial library(limma) is missing before contrast <- makeContrasts("stim-ctrl", levels = mm)

plger commented 4 years ago

Hi, No having the same samples in both groups is certainly not required... Did you subset or reorder the pb object? Could you print metadata(pb) ?

domi84 commented 4 years ago

I have a Seurat object subpopulation, I converted it in SingleCellExperiment https://satijalab.org/seurat/v3.0/conversion_vignette.html and then I started the tutorial.

Did you subset or reorder the pb object?

No

Could you print metadata(pb) ?

$experiment_info
  sample_id group_id n_cells
1    HC01  Healthy      65
2    HC02  Healthy     189
3    HC03  Healthy      99
4    HC04  Healthy      44
5     SA01    Naive      47
6     SA02    Naive     124
7     SA03    Naive     197
8     SA04    Naive     223

$agg_pars
$agg_pars$assay
[1] "counts"

$agg_pars$by
[1] "cluster_id" "sample_id" 

$agg_pars$fun
[1] "sum"

$agg_pars$scale
[1] FALSE

$n_cells
          sample_id
cluster_id HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
         4     65    189     99     44    47   124   197   223
domi84 commented 4 years ago

It worked with: res <- pbDS(pb,method = "DESeq2")

but I am receiving Cluster(s) "4" skipped due to an insufficient number of cells in at least 2 samples per group. with any other methods.

HelenaLC commented 4 years ago

The error follows from the following lines

n_cells <- metadata(pb)$n_cells
lapply(kids, function (k) {
    rmv <- n_cells[k, ] < min_cells
    y <- assays(pb)[[k]][, !rmv]
    d <- design[colnames(y), ]
    if (!is.null(d) & any(colSums(d) < 2)) 
        return(NULL)
    ...
})
skipped <- vapply(res, is.null, logical(1))
if (any(skipped) & verbose)
    message("Cluster(s) ", paste(dQuote(kids[skipped]), collapse = ", "), 
        " skipped due to an insufficient number of cells", 
        " in at least 2 samples per group.")

My guess is there could be a bug in n_cells[k, ] is the cluster IDs are numeric rather than characters. Not 100% sure though, this is just a thought.

  1. From the output of metadata(pb)$n_cells it looks like there only a single cluster, i.e. cluster 4. Is that correct?
  2. Could you try something like sce$cluster_id <- paste0("cluster", sce$cluster_id) and tell me if that works? If so, I will fix this on our end!
domi84 commented 4 years ago

Thanks for your reply.

From the output of metadata(pb)$n_cells it looks like there only a single cluster, i.e. cluster 4. Is that correct?

Yes, I am keeping it simple, so I used only one subpopulation subset (cluster 4).

Could you try something like sce$cluster_id <- paste0("cluster", sce$cluster_id) and tell me if that works? If so, I will fix this on our end!

Tried, does not work:

>metadata(pb)

$experiment_info
  sample_id group_id n_cells
1    HC01  Healthy      65
2    HC02  Healthy     189
3    HC03  Healthy      99
4    HC04  Healthy      44
5     SA01    Naive      47
6     SA02    Naive     124
7     SA03    Naive     197
8     SA04    Naive     223

$agg_pars
$agg_pars$assay
[1] "counts"

$agg_pars$by
[1] "cluster_id" "sample_id" 

$agg_pars$fun
[1] "sum"

$agg_pars$scale
[1] FALSE

$n_cells
          sample_id
cluster_id HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
  cluster4     65    189     99     44    47   124   197   223

>res <- pbDS(pb)

cluster4..
Cluster(s) "cluster4" skipped due to an insufficient number of cells in at least 2 samples per group.

And works with res <- pbDS(pb, method = "DESeq2")

HelenaLC commented 4 years ago

Hm. I'm a bit clueless at the moment.

  1. Could you try updating the package via devtools::install_github("HelenaLC/muscat")? I have just pushed substantial updates to GitHub. They will be available through Bioconductor with the next build, but that could take a few day.
  2. If all else fails, I'd be happy to look into it if you're willing to share an .rds object containing a subset of the data via email (e.g. something like saveRDS(sce[sce$cluster_id == "cluster4", sample(seq_len(nrow), 1e3)], "debug_dataset.rds") and send it to me)
domi84 commented 4 years ago

Hi, sorry for the late reply, I was busy with other stuff. Now it's working!! I tried to start from the beginning creating a SingleCellExperiemnt object instead to convert it, but I ended up with the same error. I was making the rds file to send it to you, I wanted just replace "Naive" with "Disease" because was more clear, and I realized it was working. So I tried to figure out what was going on. If I do this:

> typeof(sce$group)
'integer'
> sce$group <- as.character(sce$group)
> typeof(sce$group)
'character'

Works! If you go to the first post I wrote, there was something weird with the design matrix

this was before: model.matrix(~ 1 + ei$group_id)

  (Intercept) ei$group_id.L
-- 1 -0.7071068
-- 1 -0.7071068
-- 1 -0.7071068
-- 1 -0.7071068
-- 1 0.7071068
-- 1 0.7071068
-- 1 0.7071068
-- 1 0.7071068
and this after:   (Intercept) ei$group_idNaive
-- 1 0
-- 1 0
-- 1 0
-- 1 0
-- 1 1
-- 1 1
-- 1 1
-- 1 1

Not sure why exactly wasn't working.

P.S: I tried to install the new version with the command you wrote, but it tells me: ERROR: this R is version 3.6.3, package 'muscat' requires R >= 4.1

HelenaLC commented 4 years ago

Thanks! Interesting, so it seems this might be model.matrix reacting differently to integers vs. characters/factors rather than a muscat issue per se. I'll see to making sure this doesn't happen inside pbDS.

Re 'muscat' requires R >= 4.1: This is fixed now and the package should be installable.