Compute cluster markers gives Matrix subsetting error

carlosf79 commented 4 years ago

Hello,

when running the "compute cluster-marker means" from the vignette with this command: ms_by_cluster <- lapply(fs, function(gs) vapply(cs_by_k, function(i) Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE]), numeric(length(gs))))

I get the following error: Error in logcounts(sce)[gs, i, drop = FALSE] : invalid or not-yet-implemented 'Matrix' subsetting

The sce object is a Seurat object converted to see using commands in the vignette, and my Matrix package is the latest version (1.2-18).

I could not find away around this, do you have any suggestion?

Thanks lot for your help, kind regards, Carlo

HelenaLC commented 4 years ago

Could you tell me what class(logcounts(sce)) gives you?

carlosf79 commented 4 years ago

Dear Helena, thanks for the reply, this is the output:

> class(logcounts(sce)) [1] "dgCMatrix" attr(,"package") [1] "Matrix"

HelenaLC commented 4 years ago

Okay, I just tried running through the vignette to see what might be causing this-
however, I cannot find the above command anywhere here http://bioconductor.org/packages/release/bioc/vignettes/muscat/inst/doc/analysis.html Where exactly is this from?

carlosf79 commented 4 years ago

Dear Helena, I was following this (not the package vignette, sorry for the mixup):

https://htmlpreview.github.io/?https://raw.githubusercontent.com/HelenaLC/muscat-comparison/master/LPS/docs/2-annotation.html#heatmap_of_mean_marker-exprs_by_cluster

HelenaLC commented 4 years ago

Okay, I'm assuming you're following this script with your own data? The only thing I can image is that one/many of fs and/or cs_by_k are empty- i.e., Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE]) is trying to subset 0 rows/columns. One way to test this & prevent against it would be to adjust the code as follows:

ms_by_k <- lapply(fs, function(gs) {
    if (length(gs) == 0) return(NULL)
    vapply(cs_by_k, function(cs) {
        if (length(cs) == 0) return(NULL)
        Matrix::rowMeans(logcounts(sce)[gs, cs, drop = FALSE])
    }, numeric(length(gs))
})

Otherwise, I'm note sure what could be the issue.

carlosf79 commented 4 years ago

Dear Helena,

I have tried your code, however that did not help. I have tried a few things, and actually excluding the lapply part made the calculation to work: ms_by_cluster <- vapply(cs_by_k, function(i){Matrix::rowMeans(logcounts(sce)[gs, i, drop = FALSE])}, numeric(length(gs))) This will need to be converted to a matrix before proceeding mat <- as.matrix(ms_by_cluster)

Doing the vapply already calculated values for all genes of interest for each cluster, and with the resulting matrix I could reproduce the heatmap for my data.

One note that may be of interest: in the previous step of the workflow, there is a step to select genes and labels, i.e.: fs <- lapply(fs, sapply, function(g) grep(paste0(g, "$"), rownames(sce), value = TRUE)) gs <- gsub(".*\\.", "", unlist(fs)) ns <- vapply(fs, length, numeric(1)) ks <- rep.int(names(fs), ns) labs <- sprintf("%s(%s)", gs, ks) However, something to notice is that the way gs is created could lead to problems in case a gene of interest has genes with similar names. e.g. one of my genes is PLP1, and in the gs object I retrieve also APLP1 and PLP12. This makes the sprintf command to fail. I solved this by manually checking the gs object.

Thanks for the great package, I am going through all the steps of both workflow and vignette, and it is truly a very helpful and insightful analysis.

HelenaLC commented 4 years ago

Great that you could figure it out. Yes, the code was written for this specific analysis, so it is quite possible it will not work as is with any given dataset. That's what packages are for :) If you are interested, you can try using muscat's internal helper to compute aggregate data by one or two factors via, for example, muscat:::.agg(sce, assay = "logcounts", by = c("cluster_id", "sample_id", fun = "mean"))- this is what is also called internally to compute pseudobulk sum of counts in aggregateData.

carlosf79 commented 4 years ago

Great, thanks a lot Helena!

HelenaLC / muscat

Compute cluster markers gives Matrix subsetting error #31