Receiving 'ref.clust' not in 'clusters' for cluster label that is part of predefined clusters

xlancelottx commented 2 years ago

Hi scran team,

When calculating size factors with a predefined reference cluster, I keep getting an error that my reference cluster is not part of the clusters, even though I made sure it is part of the input_clusters.

sce <- SingleCellExperiment(list(counts=dat_mat))
sce

Relevant output: class: SingleCellExperiment dim: 24737 35680

length(input_clusters)
input_clusters

Relevant output: length: 35680 Levels: '0''1''2''3''4''5''6''7''8''9''10''11''12''13''14''15''16''17''18''ref'

size_factors = calculateSumFactors(sce, clusters=input_clusters,
                                   ref.clust='ref', max.cluster.size=5000,
                                   min.mean=0.1)

Error message: Error in .rescale_clusters(clust.profile, ref.col = ref.clust, min.mean = min.mean): 'ref.clust' not in 'clusters' Traceback:

calculateSumFactors(sce, clusters = input_clusters, ref.clust = "ref", . max.cluster.size = 5000, min.mean = 0.1)
pooledSizeFactors(...)
pooledSizeFactors(...)
.local(x, ...)
.calculate_pooled_factors(assay(x, i = assay.type), ...)
.rescale_clusters(clust.profile, ref.col = ref.clust, min.mean = min.mean)
stop("'ref.clust' not in 'clusters'")

Used versions R version 4.1.3, scuttle_1.4.0, scran_1.22.1

Also checked, that no cluster is > 5000 cells, so not "-1" tag should be added to any cluster labels. Out of interest, would it then be necessary to add the "-1" extension to the ref-cluster name?

Did you experience this before? Any idea where this could come from?

Thanks and best regards!

LTLA commented 2 years ago

No idea; never seen this before. If you can make a minimal reproducible example, I'm happy to take a look.

xlancelottx commented 2 years ago

Sure, used one of the data sets you use in this scran tutorial

# loading libraries
library(scRNAseq)
library(scuttle)
library(scran)

# loading data and minimal preprocessing
sce <- GrunPancreasData()
qcstats <- perCellQCMetrics(sce)
qcfilter <- quickPerCellQC(qcstats, percent_subsets="altexps_ERCC_percent")
sce <- sce[,!qcfilter$discard]

# clustering
clusters <- quickCluster(sce)

# checking which labels are available
levels(clusters)
# Returns: '1''2''3''4''5''6''7'

# calculating size factors 
sce <- computeSumFactors(sce, clusters=clusters, ref.clust = '2')

# Throws the error above

Additional note: Based on source code in scuttle pooledSizeFactors.R line 415 , ref.clust needs to be a character, so that it is considered. And just after trying to understand some of the code, could it be that the names of clust.profile get lost?

xlancelottx commented 2 years ago

Update:

Tried out a quick and dirty fix using assignInNamespace(x=".calculate_pooled_factors", value=fixed_function, ns="scuttle") and defined the ref cluster as the index in an earlier object, that still has names (indices). As far as I understood, that also happens when no ref.clust is defined. Seems to work by adding a statement to define the ref cluster index instead of the name and feeding that to .rescale_clusters. So changed:

if (is.null(ref.clust)) {
        non.zeroes <- vapply(clust.profile, FUN = function(x) sum(x > 
            0), FUN.VALUE = 0L)
        ref.clust <- which.max(non.zeroes)
        print(paste("No ref.clust defined, automatically chosen ref cluster index:", ref.clust))
    }

to

    if (is.null(ref.clust)) {
        non.zeroes <- vapply(clust.profile, FUN = function(x) sum(x > 
            0), FUN.VALUE = 0L)
        ref.clust <- which.max(non.zeroes)
        print(paste("No ref.clust defined, automatically chosen ref cluster index:", ref.clust))
    } else {
        ref.clust <- which(names(indices)==ref.clust)
        print(paste("ref.clust defined, using ref cluster index:", ref.clust))
        if (length(ref.clust)==0L) { 
            stop("'ref.clust' not in 'clusters', check if any cluster is larger than max.cluster.size and append ref.clust name with '-1'")
            }
    }

That one also returns a somewhat informative error if any cluster is larger than max.cluster.size, since then the '-1' tag needs to be added to the ref.clust

Haven't checked the above super carefully, so would be great, if you could double-check, that this approach is correct. But hope it helps for fixing it (probably much more elegant in a different way) :)

LTLA commented 2 years ago

Should be fixed by the latest commit in scuttle. The immediate workaround is to set ref.clust to the integer code corresponding to ref.

xlancelottx commented 2 years ago

That's great, thanks a lot!

MarioniLab / scran

Receiving 'ref.clust' not in 'clusters' for cluster label that is part of predefined clusters #99