Closed cnk113 closed 4 years ago
muscat should transfer this data from the SCE to the PB, but it can do so only if all the cells of a sample_id
(or more precisely, the 2nd by
argument of aggregateData) have the same group_id
. Perhaps a first thing to check is if there's any mismatch between those variables, with something like table(sce$group_id, sce$sample_id)
.
Hello, I am running in parallel the tutorial with the example dataset and my dataset. With the example dataset Pseudobulk method works. With my dataset I have the same issue (I think) posted here. To keep it simple/fast I am using just one subpopulation (named cluster 4) two conditions (Healthy vs Naive Disease)
res <- pbDS(pb)
res
4..
Cluster(s) "4" skipped due to an insufficient number of cells in at least 2 samples per group.
$table
$Naive =
$data
$method
'edgeR'
$design
| Healthy | Naive
-- | -- | --
HC01 | 1 | -0.7071068
HC02 | 1 | -0.7071068
HC03 | 1 | -0.7071068
HC04 | 1 | -0.7071068
SA01 | 1 | 0.7071068
SA02 | 1 | 0.7071068
SA03 | 1 | 0.7071068
SA04 | 1 | 0.7071068
$contrast
Naive
-- | --
Healthy | 0
Naive |1
$coef
NULL
And this is:
table(sce$group_id, sce$sample_id)
HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
Healthy 65 189 99 44 0 0 0 0
Naive 0 0 0 0 47 124 197 223
If I understood properly the issue is that I don't have the same sample in Healthy and Naive disease, right? But I cannot have that. Is there another way?
Thanks
P.S.
In the tutorial library(limma)
is missing before
contrast <- makeContrasts("stim-ctrl", levels = mm)
Hi,
No having the same samples in both groups is certainly not required...
Did you subset or reorder the pb
object?
Could you print metadata(pb)
?
I have a Seurat object subpopulation, I converted it in SingleCellExperiment https://satijalab.org/seurat/v3.0/conversion_vignette.html and then I started the tutorial.
Did you subset or reorder the pb object?
No
Could you print metadata(pb) ?
$experiment_info
sample_id group_id n_cells
1 HC01 Healthy 65
2 HC02 Healthy 189
3 HC03 Healthy 99
4 HC04 Healthy 44
5 SA01 Naive 47
6 SA02 Naive 124
7 SA03 Naive 197
8 SA04 Naive 223
$agg_pars
$agg_pars$assay
[1] "counts"
$agg_pars$by
[1] "cluster_id" "sample_id"
$agg_pars$fun
[1] "sum"
$agg_pars$scale
[1] FALSE
$n_cells
sample_id
cluster_id HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
4 65 189 99 44 47 124 197 223
It worked with:
res <- pbDS(pb,method = "DESeq2")
but I am receiving
Cluster(s) "4" skipped due to an insufficient number of cells in at least 2 samples per group.
with any other methods.
The error follows from the following lines
n_cells <- metadata(pb)$n_cells
lapply(kids, function (k) {
rmv <- n_cells[k, ] < min_cells
y <- assays(pb)[[k]][, !rmv]
d <- design[colnames(y), ]
if (!is.null(d) & any(colSums(d) < 2))
return(NULL)
...
})
skipped <- vapply(res, is.null, logical(1))
if (any(skipped) & verbose)
message("Cluster(s) ", paste(dQuote(kids[skipped]), collapse = ", "),
" skipped due to an insufficient number of cells",
" in at least 2 samples per group.")
My guess is there could be a bug in n_cells[k, ]
is the cluster IDs are numeric rather than characters. Not 100% sure though, this is just a thought.
metadata(pb)$n_cells
it looks like there only a single cluster, i.e. cluster 4. Is that correct? sce$cluster_id <- paste0("cluster", sce$cluster_id)
and tell me if that works? If so, I will fix this on our end!Thanks for your reply.
From the output of metadata(pb)$n_cells it looks like there only a single cluster, i.e. cluster 4. Is that correct?
Yes, I am keeping it simple, so I used only one subpopulation subset (cluster 4).
Could you try something like sce$cluster_id <- paste0("cluster", sce$cluster_id) and tell me if that works? If so, I will fix this on our end!
Tried, does not work:
>metadata(pb)
$experiment_info
sample_id group_id n_cells
1 HC01 Healthy 65
2 HC02 Healthy 189
3 HC03 Healthy 99
4 HC04 Healthy 44
5 SA01 Naive 47
6 SA02 Naive 124
7 SA03 Naive 197
8 SA04 Naive 223
$agg_pars
$agg_pars$assay
[1] "counts"
$agg_pars$by
[1] "cluster_id" "sample_id"
$agg_pars$fun
[1] "sum"
$agg_pars$scale
[1] FALSE
$n_cells
sample_id
cluster_id HC01 HC02 HC03 HC04 SA01 SA02 SA03 SA04
cluster4 65 189 99 44 47 124 197 223
>res <- pbDS(pb)
cluster4..
Cluster(s) "cluster4" skipped due to an insufficient number of cells in at least 2 samples per group.
And works with res <- pbDS(pb, method = "DESeq2")
Hm. I'm a bit clueless at the moment.
devtools::install_github("HelenaLC/muscat")
? I have just pushed substantial updates to GitHub. They will be available through Bioconductor with the next build, but that could take a few day.saveRDS(sce[sce$cluster_id == "cluster4", sample(seq_len(nrow), 1e3)], "debug_dataset.rds")
and send it to me)Hi, sorry for the late reply, I was busy with other stuff. Now it's working!! I tried to start from the beginning creating a SingleCellExperiemnt object instead to convert it, but I ended up with the same error. I was making the rds file to send it to you, I wanted just replace "Naive" with "Disease" because was more clear, and I realized it was working. So I tried to figure out what was going on. If I do this:
> typeof(sce$group)
'integer'
> sce$group <- as.character(sce$group)
> typeof(sce$group)
'character'
Works! If you go to the first post I wrote, there was something weird with the design matrix
this was before: model.matrix(~ 1 + ei$group_id)
(Intercept) | ei$group_id.L | |
---|---|---|
-- | 1 | -0.7071068 |
-- | 1 | -0.7071068 |
-- | 1 | -0.7071068 |
-- | 1 | -0.7071068 |
-- | 1 | 0.7071068 |
-- | 1 | 0.7071068 |
-- | 1 | 0.7071068 |
-- | 1 | 0.7071068 |
and this after: | (Intercept) | ei$group_idNaive |
---|---|---|
-- | 1 | 0 |
-- | 1 | 0 |
-- | 1 | 0 |
-- | 1 | 0 |
-- | 1 | 1 |
-- | 1 | 1 |
-- | 1 | 1 |
-- | 1 | 1 |
Not sure why exactly wasn't working.
P.S:
I tried to install the new version with the command you wrote, but it tells me:
ERROR: this R is version 3.6.3, package 'muscat' requires R >= 4.1
Thanks! Interesting, so it seems this might be model.matrix
reacting differently to integers vs. characters/factors rather than a muscat
issue per se. I'll see to making sure this doesn't happen inside pbDS
.
Re 'muscat' requires R >= 4.1
:
This is fixed now and the package should be installable.
Hello,
I've been trying to run pbDS, but I keep getting NULL error since my pb data doesn't have the variable. For some reason aggregateData doesn't set any group_id... I've tried to manually set the group_id from my sce object I get Error in if (levels[1] == "(Intercept)") { : argument is of length zero. Any ideas on how to move forward?
Thanks