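train <- function(..., top=100) {
    # Each argument is one reference data set; pool them column-wise.
    everything <- list(...)
    combined <- do.call(cbind, everything)
    # Record which data set each cell came from, to use as a blocking factor.
    block <- rep(seq_along(everything), vapply(everything, ncol, FUN.VALUE=0L))
    labels <- unlist(lapply(everything, FUN=function(x) x$label)) # assumes each data set has a 'label' field.
    # Toy examples with very few cells cause Irlba issues, hence the try().
    # If clustering fails, clust stays NULL and size factors are computed without clusters.
    clust <- NULL
    try(clust <- scran::quickCluster(combined, min.mean=0.1, BSPARAM=BiocSingular::IrlbaParam(), assay.type=1))
    combined <- scran::computeSumFactors(combined, clust=clust, assay.type=1)
    combined <- scater::normalize(combined, exprs_values=1)
    # Upregulated genes for each label, blocking on the data set of origin.
    de <- scran::findMarkers(combined, direction="up", block=block, cluster=labels)
    # Keep the top-ranked genes per label and convert them to a marker matrix.
    markers <- lapply(de, FUN=function(x) head(rownames(x), top))
    cellassign::marker_list_to_mat(markers)
}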
The aim would be to auto-generate a decent marker_gene_info from one or more reference data sets in .... This would let people use cellassign as an end-to-end classification tool, i.e., train and test without mandatory human intervention. Right now, people have to read through the vignette, put together a DE analysis, etc. That's not too hard for battle-hardened bioinformaticians, but it's still annoying to do. Of course, people can still fiddle with the marker lists if they want to; if they don't, train() is there.
I haven't tested this (beyond checking that it runs) so YMMV.
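To make the last step of train() concrete, here is a minimal base-R sketch of turning a per-label marker list into a binary gene-by-label matrix, which is roughly the shape marker_gene_info is expected to have. The helper name markers_to_matrix and the toy gene lists are made up for illustration; this is a stand-in, not cellassign's own marker_list_to_mat(), which may differ in details such as extra columns.

```r
# Toy stand-in for the 'markers' list built inside train():
# one character vector of top marker genes per label (made-up example).
markers <- list(
    Tcell = c("CD3E", "CD2"),
    Bcell = c("CD79A", "MS4A1")
)

# Hypothetical helper (not part of cellassign): build a binary
# gene-by-label matrix with 1 where a gene is a marker for a label.
markers_to_matrix <- function(markers) {
    genes <- sort(unique(unlist(markers)))
    hits <- vapply(markers, function(m) as.integer(genes %in% m),
                   FUN.VALUE=integer(length(genes)))
    # matrix() keeps the result two-dimensional even with a single gene.
    matrix(hits, nrow=length(genes), dimnames=list(genes, names(markers)))
}

mat <- markers_to_matrix(markers)
print(mat)
```

Each row is a gene and each column a label, so anyone who wants to fiddle with the markers by hand can edit the list before the conversion rather than the matrix after it.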