Closed Yunuuuu closed 1 year ago
Instead of making a new function, why not just add an extra argument (e.g., restrict.to=
or something) to quickSubCluster
? This would avoid the duplication of all the other code.
Hi @LTLA, I intended to add an argument, but the returned value by quickSubCluster
now is a atomic vector or a List
object based on the argument "simplify", and I think the value returned by findSubCluster
should be a factor like clusterCells
or a SingleCellExperiment
object with the sub-clustering label added to the original object, I don't know if this is a good choice?
I don't really see the problem here. The user can just factor
the output to get a factor, if that's what they want.
It's a bit trickier to decide what to do when simplify=FALSE
, but my intuition would be to do something like:
all.sce <- list()
all.clusters <- as.character(groups)
by.group <- split(seq_along(groups), groups)
if (!is.null(restricted)) { # maybe check that 'restricted' is a subset of 'names(by.group)'
by.group <- by.group[as.character(restricted)]
}
for (i in names(by.group)) {
y <- x[,by.group[[i]]]
if (normalize) {
y <- logNormCounts(y, exprs_values=assay.type)
}
y <- prepFUN(y)
if (ncol(y) >= min.ncells) {
clusters <- clusterFUN(y)
clusters <- sprintf(format, i, clusters)
} else {
clusters <- rep(i, ncol(y))
}
y$subcluster <- clusters
all.sce[[i]] <- y
all.clusters[by.group[[i]]] <- clusters
}
Basically, the reported SEs are restricted to those in restricted
, but the full-length clustering vector is still returned.
Is there any reason why this wouldn't work for your use-case?
This is fine, I just wanted to keep the order of the original cell. but I don't know whether it is intensive necessary to keep the original order? I prefer to keeping the order since the data is too large to keep all the SCE object (time consuming) and I always save the cluster label data to analyze for another clustering round. Only in this way, I can use colLabel(my_sce) <- readRDS("the path of my saved cluster labels")
directly in a new R session.
I'm not entirely sure what you mean by "keep the order". Does my_sce
contain all the cells? A subset of cells?
I would suggest that you modify your PR so that quickSubCluster
becomes the relevant function, as suggested above. Then it may be easier for me to understand the difference between the current output format and your desired format.
Sorry for the delayed respond. Yes, my_sce
contain all the cells. After double checking the code in https://github.com/MarioniLab/scran/pull/105#issuecomment-1340783277
, I found this is what I want, the order of all.clusters
is the same with the column order in my_sce
. I'll modify the quickSubCluster
code and re-pull request. Thanks for your help @LTLA
I finally modified .quick_sub_cluster
as follows:
.quick_sub_cluster <- function(x, groups, restricted = NULL, normalize = TRUE,
prepFUN = NULL, min.ncells = 50, clusterFUN = NULL, BLUSPARAM = NNGraphParam(),
format = "%s.%s", assay.type = "counts", simplify = FALSE) {
if (normalize) {
alt.assay <- "logcounts"
} else {
alt.assay <- assay.type
}
if (is.null(prepFUN)) {
prepFUN <- function(x) {
# Putting this check here to avoid skipping
# user-specified modifications.
if (ncol(x) < 2L) {
return(x)
}
# For consistency with quickCluster().
dec <- modelGeneVar(x, assay.type = alt.assay)
top <- getTopHVGs(dec, n = 500, prop = 0.1)
denoisePCA(x, dec, subset.row = top, assay.type = alt.assay)
}
}
if (is.null(clusterFUN)) {
clusterFUN <- function(x) {
clusterRows(reducedDim(x, "PCA"), BLUSPARAM)
}
}
# just save the levels of the group factor and we can insert the new levels
# in the position of the current operated group.
subcluster.levels <- levels(groups)
all.clusters <- as.character(groups)
if (is.null(subcluster.levels)) {
subcluster.levels <- unique(all.clusters)
}
by.group <- split(seq_along(all.clusters), all.clusters)
if (!is.null(restricted)) {
if (!all(restricted %in% names(by.group))) {
stop("all groups specified in `restricted` should be in the groups")
}
by.group <- by.group[as.character(restricted)]
}
all.sce <- vector("list", length(by.group))
names(all.sce) <- names(by.group)
for (i in names(by.group)) {
y <- x[, by.group[[i]]]
if (normalize) {
y <- logNormCounts(y, exprs_values = assay.type)
}
y <- prepFUN(y)
if (ncol(y) >= min.ncells) {
clusters <- clusterFUN(y)
clusters <- sprintf(format, i, clusters)
} else {
clusters <- rep(i, ncol(y))
}
y$subcluster <- clusters
all.sce[[i]] <- y
all.clusters[by.group[[i]]] <- clusters
# insert the new cluster labels into the original position
operated.cluter.idx <- which(i == subcluster.levels)
subcluster.levels <- append(
subcluster.levels,
sort(unique(clusters)),
after = operated.cluter.idx
)
subcluster.levels <- subcluster.levels[-operated.cluter.idx]
}
all.clusters <- factor(all.clusters, subcluster.levels)
if (simplify) {
all.clusters
} else {
output <- as(all.sce, "List")
metadata(output) <- list(index = by.group, subcluster = all.clusters)
output
}
}
And I also modified the documents for the param
and return
section
#' @param restricted the groups to be sub-clustered, should be a subset of
#' \code{groups}. If \code{NULL}, will use all groups.
#' If \code{simplify=TRUE}, the factor of subcluster identities is returned
#' following the format defined in \code{format}. (Unless the number of cells in
#' the parent cluster is less than \code{min.cells} or the parent cluster is not
#' specified in \code{restricted}, in which case the parent cluster's name is
#' used directly.). the order of the factor is always matched with the column
#' order of \code{x}, so we can save the subcluster results easily if \code{x}
#' is a \code{SingleCellExperiment} by \code{x$subcluster <-
#' subcluster_results}.
Can this be used? If so, I'll commit it
Thanks for the effort. In the end, I just ended up implementing it myself to save everyone some time. You can try it out on the latest master
, or by installing 1.27.1 whenever it comes out.
For future reference, here's some review comments on your code:
group
, which is not true in general.Thanks a lot for reviewing the code, the latest commit has provide a simple way to do this, it has already been good enough.
Now,
quickSubCluster
will always sub-cluster all the groups, this commit enables the sub-clustering in a or more specified groups.