Open jma1991 opened 4 years ago
However, I've found this approach to be slower than using a for-loop with pre-allocation (e.g. similar to the code already in the splitAltExps function):
Well, yes, that's because you're looping over every element of var rather than its unique levels.
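To make the point about unique levels concrete, here is a minimal sketch. It uses a plain matrix as a stand-in for the SCE, and the grouping vector var and its values are made up for illustration; this is not the code from the original post.

```r
# Plain matrix standing in for an SCE; `var` is a hypothetical
# per-column grouping vector (assumption, not from the thread).
set.seed(1)
x <- matrix(rnorm(60), nrow = 6, ncol = 10)
var <- rep(c("a", "b"), times = 5)

# Column indices grouped by level: one entry per unique level,
# not one entry per column.
idx <- split(seq_along(var), var)

# Pre-allocate the output list, then subset once per level.
out <- vector("list", length(idx))
names(out) <- names(idx)
for (i in seq_along(idx)) {
    out[[i]] <- x[, idx[[i]], drop = FALSE]
}

# Number of columns that ended up in each piece.
vapply(out, ncol, integer(1))
```

The loop runs once per level, so the number of subsetting calls depends on the number of groups rather than the number of cells.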
If there is a need for these methods I can submit a pull-request?
Possibly, but this would likely go to the SummarizedExperiment repository rather than this one. Any such methods should benefit all SE subclasses; there isn't any reason that they would be useful only for SCEs.
Tagging @mtmorgan: does this functionality already exist in SE? S4Vectors::split() kind of works, but it's hard to remember that it splits by row instead of column in an SE. (Also, I just noticed SCE doesn't implement extractROWS properly: need to fix.)
bc220cab41b7112347dda5e094ebb2a9c987fb23 fixes the split() issue, so a hypothetical splitByRow() would be as easy as:
split(sce, rowData(sce)$variable)
Any update on this? Seurat has the SplitObject function. But actually I'm asking because I'm writing a method to split a SpatialFeatureExperiment object by geometry, so that, for instance, cells in different pieces of tissue can be split into different SFE objects; I want to keep the style consistent with any existing split function in SCE and SpatialExperiment that splits by columns rather than rows.
No, it seems I clobbered my own PR (linked above) and also no one cared about it.
Perhaps consider making a PR to the SummarizedExperiment repo with something like:
# Completely untested!
setGeneric("splitByCol", function(x, f, ...) standardGeneric("splitByCol"))
setMethod("splitByCol", "SummarizedExperiment", function(x, f, ...) {
    f <- as.factor(f)
    by.levels <- split(seq_along(f), f)
    for (i in seq_along(by.levels)) {
        by.levels[[i]] <- x[, by.levels[[i]], drop=FALSE]
    }
    by.levels
})
The [ method should handle everything already. I don't have the time/will to do it myself, but it seems useful enough that a PR would warrant some consideration.
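As a rough illustration of the same idea, the sketch below splits a toy SummarizedExperiment by a column annotation using only the existing [ method. The counts matrix and the group column are made-up assumptions for the example, and it requires the SummarizedExperiment package from Bioconductor.

```r
library(SummarizedExperiment)

# Toy object; the assay values and the `group` column are made up
# purely for illustration.
counts <- matrix(
    seq_len(12), nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(group = c("a", "b", "a", "b"))
)

# One column subset per level of the factor, relying on the
# SummarizedExperiment `[` method to subset assays and colData together.
f <- as.factor(colData(se)$group)
pieces <- lapply(split(seq_along(f), f), function(j) se[, j, drop = FALSE])

names(pieces)
dim(pieces$a)
```

Because each piece is produced by [, the column annotations travel with the subset, which is why no extra bookkeeping is needed.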
I renamed the split function for SFE to splitByCol and added a generic for it in the SFE package to avoid confusion when split would split by row for SummarizedExperiment. I may do a PR to SummarizedExperiment later but I don't have the time before the Bioc2024 conference.
Is there scope to define splitColData and splitRowData methods for the SingleCellExperiment class?
I am working with a rather large SingleCellExperiment object and I often find myself needing to split the object into a list of smaller objects for pre-processing based on either the column or row data.
This can obviously be done with the following:
However, I've found this approach to be slower than using a for-loop with pre-allocation (e.g. similar to the code already in the splitAltExps function):
If there is a need for these methods I can submit a pull-request? If not, it would be super helpful if you could advise what is the most robust and efficient method for splitting SCE objects. Thank you.