MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html
39 stars 23 forks source link

Implement bootstrapCluster for clusters generated in Seurat #75

Open lucygarner opened 3 years ago

lucygarner commented 3 years ago

Hi,

I would like to establish the stability of the clusters that I have identified using Seurat (FindNeighbors and FindClusters). I was wondering how difficult it would be for me to reimplement the bootstrapCluster approach for this? Would the best way for this be to just write out a function that performs Seurat-like clustering on log normalised counts and then input this into bootstrapCluster? Would I include the PCA function in there as well?

Many thanks, Lucy

LTLA commented 3 years ago

If you write a function that accepts a log-normalized expression matrix (or a PC matrix, if transposed=TRUE), then you should just be able to give it to FUN and the rest will "just work". Assuming that your function doesn't have bugs, that is.

Note that scran::bootstrapCluster is moving to bluster::bootstrapStability in the upcoming release.

lucygarner commented 3 years ago

Thank you, I have written the following to extract the PC matrix from my Seurat object (named object) and run clustering. Does this seem reasonable?

data.use <- Embeddings(object = object[["pca"]])
data.use <- data.use[, 1:30]    # number of PCs to use

seurat_clustering <- function(PC_matrix) {
    neighbor.graphs <- FindNeighbors(object = PC_matrix,
                                     k.param = 20,
                                     compute.SNN = TRUE,
                                     prune.SNN = 1/15,
                                     nn.method = "rann",
                                     annoy.metric = "euclidean",
                                     nn.eps = 0,
                                     verbose = TRUE,
                                     force.recalc = FALSE)

    clustering.results <- FindClusters(object = neighbor.graphs$snn,
                                       modularity.fxn = 1,
                                       initial.membership = NULL,
                                       node.sizes = NULL,
                                       resolution = 0.5,
                                       method = "matrix",
                                       algorithm = 1,
                                       n.start = 10,
                                       n.iter = 10,
                                       random.seed = 0,
                                       group.singletons = TRUE,
                                       temp.file.location = NULL,
                                       edge.file.name = NULL,
                                       verbose = TRUE)

    return(clustering.results$res.0.5)
}

bootstrap_clusters <-  bootstrapCluster(data.use, FUN = seurat_clustering, transposed = TRUE)

How is bluster::bootstrapStability technically different to scran::bootstrapCluster and would you recommend switching over straight away?

Many thanks, Lucy

LTLA commented 3 years ago

Seems fine to me.

As for the difference: there is none, other than the name change and movement to a new package. Just moved some functions out of scran because the package was getting a bit too cluttered. The new package and function should be available when BioC 3.12 comes out tomorrow.

lucygarner commented 3 years ago

Great thanks. I tried the following, however I am getting strange results. I'm slightly confused as it seemed to be working yesterday!

bootstrap_clusters <- bluster::bootstrapStability(x = data.use, FUN = seurat_clustering, transposed = TRUE)

data.use is a matrix with cell IDs as row names and PCs and column names.

image

image

Any thoughts as to what might be happening here?

Best wishes, Lucy

LTLA commented 3 years ago

Sorry, I just noticed this. Looks like you didn't set a seed, so that's probably why you got different results. I'm not quite sure why you got NA's for some of the cluster pairs. It would seem like pairwiseRand() is at fault but I can't see how that could happen.

I should add that your matrix already looks pretty strange. It is pretty surprising to have low diagonal entries along with high off-diagonal entries. If you can give me a reproducible example, I can try to debug it.

LTLA commented 3 years ago

I think I got it:

pairwiseRand(factor(sample(3, 100, replace=TRUE), levels=1:8), sample(3, 100, replace=TRUE))
##             1          2            3   4   5   6   7   8
## 1 0.005751586 0.03516088  0.002100879 NaN NaN NaN NaN NaN
## 2          NA 0.01338677 -0.037901716 NaN NaN NaN NaN NaN
## 3          NA         NA -0.018066332 NaN NaN NaN NaN NaN
## 4          NA         NA           NA NaN NaN NaN NaN NaN
## 5          NA         NA           NA  NA NaN NaN NaN NaN
## 6          NA         NA           NA  NA  NA NaN NaN NaN
## 7          NA         NA           NA  NA  NA  NA NaN NaN
## 8          NA         NA           NA  NA  NA  NA  NA NaN

Your clustering function may be returning unused levels - see how 11 is a level but is not used in the factor - causing the final matrix to have NAs for all affected levels. Well, technically they're NaNs, I don't know if your screenshot's matrix view distinguishes between the two. I can't think of any other way of generating pure NAs - looking at the code, it shouldn't even be possible.

If this indeed the case, I'm guessing that Seurat is being a bit too smart for its own good when creating the cluster assignment. pairwiseRand() will simply respect these unused levels, even though they are rather silly in this case.