Closed carlosf79 closed 4 years ago
Could you tell me what class(logcounts(sce))
gives you?
Dear Helena, thanks for the reply, this is the output:
> class(logcounts(sce)) [1] "dgCMatrix" attr(,"package") [1] "Matrix"
Okay, I just tried running through the vignette to see what might be causing this-
however, I cannot find the above command anywhere here http://bioconductor.org/packages/release/bioc/vignettes/muscat/inst/doc/analysis.html
Where exactly is this from?
Dear Helena, I was following this (not the package vignette, sorry for the mixup):
Okay, I'm assuming you're following this script with your own data? The only thing I can image is that one/many of fs
and/or cs_by_k
are empty- i.e., Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE])
is trying to subset 0 rows/columns.
One way to test this & prevent against it would be to adjust the code as follows:
ms_by_k <- lapply(fs, function(gs) {
if (length(gs) == 0) return(NULL)
vapply(cs_by_k, function(cs) {
if (length(cs) == 0) return(NULL)
Matrix::rowMeans(logcounts(sce)[gs, cs, drop = FALSE])
}, numeric(length(gs))
})
Otherwise, I'm note sure what could be the issue.
Dear Helena,
I have tried your code, however that did not help.
I have tried a few things, and actually excluding the lapply part made the calculation to work:
ms_by_cluster <- vapply(cs_by_k, function(i){Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE])}, numeric(length(gs)))
This will need to be converted to a matrix before proceeding
mat <- as.matrix(ms_by_cluster)
Doing the vapply
already calculated values for all genes of interest for each cluster, and with the resulting matrix I could reproduce the heatmap for my data.
One note that may be of interest: in the previous step of the workflow, there is a step to select genes and labels, i.e.:
fs <- lapply(fs, sapply, function(g) grep(paste0(g, "$"), rownames(sce), value = TRUE)) gs <- gsub(".*\\.", "", unlist(fs)) ns <- vapply(fs, length, numeric(1)) ks <- rep.int(names(fs), ns) labs <- sprintf("%s(%s)", gs, ks)
However, something to notice is that the way gs is created could lead to problems in case a gene of interest has genes with similar names. e.g. one of my genes is PLP1, and in the gs object I retrieve also APLP1 and PLP12. This makes the sprintf command to fail. I solved this by manually checking the gs object.
Thanks for the great package, I am going through all the steps of both workflow and vignette, and it is truly a very helpful and insightful analysis.
Great that you could figure it out. Yes, the code was written for this specific analysis, so it is quite possible it will not work as is with any given dataset. That's what packages are for :)
If you are interested, you can try using muscat
's internal helper to compute aggregate data by one or two factors via, for example, muscat:::.agg(sce, assay = "logcounts", by = c("cluster_id", "sample_id", fun = "mean"))
- this is what is also called internally to compute pseudobulk sum of counts in aggregateData
.
Great, thanks a lot Helena!
Hello,
when running the "compute cluster-marker means" from the vignette with this command:
ms_by_cluster <- lapply(fs, function(gs) vapply(cs_by_k, function(i) Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE]), numeric(length(gs))))
I get the following error:
Error in logcounts(sce)[gs, i, drop = FALSE] : invalid or not-yet-implemented 'Matrix' subsetting
The sce object is a Seurat object converted to see using commands in the vignette, and my Matrix package is the latest version (1.2-18).
I could not find away around this, do you have any suggestion?
Thanks lot for your help, kind regards, Carlo