Irrationone / cellassign

Automated, probabilistic assignment of cell types in scRNA-seq data

Consider a training function to generate "marker_gene_info" #32

Open LTLA opened 5 years ago

LTLA commented 5 years ago

Something like:

train <- function(..., top=100) {
    # Each argument is a reference SingleCellExperiment; all must share the same rows (genes).
    everything <- list(...)
    combined <- do.call(cbind, everything)
    block <- rep(seq_along(everything), vapply(everything, ncol, FUN.VALUE=0L))
    # Assumes each data set has a 'label' column in its colData.
    labels <- unlist(lapply(everything, FUN=function(x) x$label))

    # Toy examples with very few cells cause irlba issues, hence the try().
    # If clustering fails, 'clust' stays NULL and size factors are computed without clusters.
    clust <- NULL
    try(clust <- scran::quickCluster(combined, min.mean=0.1, BSPARAM=BiocSingular::IrlbaParam(), assay.type=1))
    combined <- scran::computeSumFactors(combined, clusters=clust, assay.type=1)
    # normalize() is the scater method for SingleCellExperiment objects.
    combined <- normalize(combined, exprs_values=1)

    # Upregulated genes per label, blocking on the data set of origin.
    de <- scran::findMarkers(combined, labels, direction="up", block=block)
    markers <- lapply(de, FUN=function(x) head(rownames(x), top))
    cellassign::marker_list_to_mat(markers)
}

The aim would be to auto-generate a decent marker_gene_info from one or more reference data sets supplied through the ... argument. This would let people use cellassign as an end-to-end classification tool, i.e., train and test without mandatory human intervention. Right now, people have to read through the vignette, put together a DE analysis, and so on. That is not too hard for battle-hardened bioinformaticians, but it is still annoying to do. Of course, people can still fiddle with the marker lists if they want to; if they don't, train() is there.
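For anyone unsure what train() would hand back: marker_gene_info is a binary gene-by-cell-type matrix, which is what marker_list_to_mat() builds from the per-label marker lists. Below is a rough base R sketch of that conversion on toy data. The helper markers_to_binary() and the toy gene and cell type names are made up for illustration; this is not cellassign's implementation, and it skips extras such as the optional "other" column.

```r
# Hypothetical helper (not part of cellassign): turn a named list of marker
# genes into a binary gene-by-cell-type matrix.
markers_to_binary <- function(markers) {
    genes <- sort(unique(unlist(markers, use.names=FALSE)))
    # One 0/1 column per cell type: 1 if the gene is a marker for that type.
    mat <- vapply(markers, FUN=function(m) as.integer(genes %in% m),
        FUN.VALUE=integer(length(genes)))
    # vapply() returns a plain vector when there is only one gene,
    # so rebuild the matrix shape explicitly.
    matrix(mat, nrow=length(genes), ncol=length(markers),
        dimnames=list(genes, names(markers)))
}

# Toy marker lists, standing in for the top genes per label from findMarkers().
toy <- list(Tcell=c("CD3E", "CD2"), Bcell=c("CD79A", "MS4A1", "CD2"))
rho <- markers_to_binary(toy)
print(rho)
```

A gene can be a marker for more than one label (CD2 above), in which case its row has a 1 in each of those columns.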

I haven't tested this (beyond checking that it runs) so YMMV.