Closed by DarioS 4 months ago
It was developed even before my time. So, Fluidigm.
It's probably not going to get any faster, because:
BuettnerESCData and LengESCData for mouse and human, respectively), subset them down to cell cycle genes as described in the OSCA book (e.g., filtered on GO:0007049) and use that in your favorite efficient single-cell classification algorithm, e.g., SingleR. This is effectively what cyclone() does anyway. I like this approach because, if nothing else, it makes people think about the validity of using old ESC data to classify their data, rather than sweeping these concerns into the black box of a prebuilt classifier/signature/whatever.
Interesting. My goal is to label cells by <cell type, cell state> and then associate proportions to chemotherapy response.
<cancer, cycling>
<cancer, not cycling>
<fibroblast mCAF, cycling>
<fibroblast mCAF, not cycling>
<fibroblast iCAF, cycling>
<fibroblast iCAF, not cycling>
I shall do single cell scoring using a different reference. My colleague uses Seurat's cc.genes.updated.2019. Fingers crossed.
FWIW if you follow the trail of references, I think you will find that Seurat's classifier is based on HeLa data, with some indirect contribution from HEK data. HeLa is a pretty wild system IIRC, barely human at all; though it is pretty popular as a model "organism" for studying cell cycle regulation and mechanisms, so maybe it is still relevant for this purpose. Guess we'll never know, I've never seen any experimental validation of the cell cycle scores.
If you just want cycling/non-cycling, explicit phase assignment is overkill. In fact, I bet this won't even give you proper "non-cycling"; most methods won't have a G0 state in their training data, and I'd be surprised if G1 and G0 were transcriptionally identical. Rather, you may prefer some form of subclustering on each cell type, possibly using only the cell cycle genes, and then manually annotating each subcluster as "cycling" or not. This is the general approach we used for mass cytometry, though the beautiful separation between cycling/non-cycling cells was due to the IdU marker.
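The subclustering idea above could be sketched roughly as follows. This is a minimal NumPy illustration, not anything taken from scran or libscran: the helper names (`two_means`, `subcluster_by_type`), the choice of a plain 2-means and the synthetic data are all assumptions made for the example. Real data may need more than two subclusters and a better clustering method.

```python
import numpy as np

def two_means(points, n_iter=50):
    """Plain Lloyd's algorithm with k=2 on a cells x genes matrix.

    Hypothetical helper: the two centers start at the cells with the lowest
    and highest total expression, so cluster 1 tends to be the 'high' one.
    """
    points = np.asarray(points, dtype=float)
    totals = points.sum(axis=1)
    centers = points[[totals.argmin(), totals.argmax()]].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every cell to both centers.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(2):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels, centers

def subcluster_by_type(expr, cell_types):
    """Subcluster each cell type separately.

    expr is assumed to be cells x genes, already restricted to cell cycle genes.
    Returns {cell type: (cell indices, subcluster labels, subcluster centers)}.
    """
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    out = {}
    for ct in np.unique(cell_types):
        idx = np.flatnonzero(cell_types == ct)
        labels, centers = two_means(expr[idx])
        out[str(ct)] = (idx, labels, centers)
    return out

# Synthetic data (an assumption for illustration): cycling cells have
# higher expression of all 20 "cell cycle genes".
rng = np.random.default_rng(0)
n_genes = 20
cell_types = np.array(["cancer"] * 40 + ["fibroblast"] * 30)
is_cycling = np.array([True] * 15 + [False] * 25 + [True] * 10 + [False] * 20)
expr = rng.normal(1.0, 0.3, size=(70, n_genes)) + 2.0 * is_cycling[:, None]

result = subcluster_by_type(expr, cell_types)
```

The manual step is then to look at each subcluster's center (or its marker genes) and decide whether to call it "cycling"; the code deliberately does not make that call for you.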
Or, to get fancier: do a PCA on all cycling genes within the (sub)population of interest, reconstruct the rank-1 matrix, take the column sums to obtain a "cell cycle activity", and then test for differences in the distribution along this axis between conditions. You can check out the ScoreFeatureSet function in libscran for more details.
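To make the rank-1 idea concrete, here is a minimal NumPy sketch under stated assumptions. It is not the libscran implementation, which may differ in centering, scaling and weighting. The function name `rank1_activity` and the synthetic data are made up for the example. The input is assumed to be a genes x cells log-expression matrix already restricted to the cycling genes and to the (sub)population of interest.

```python
import numpy as np

def rank1_activity(expr):
    """Per-cell activity score from a rank-1 approximation.

    expr: genes x cells matrix (cell cycle genes only). Hypothetical helper.
    """
    expr = np.asarray(expr, dtype=float)
    # Center each gene across cells, as PCA would.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # The leading singular triplet gives the best rank-1 approximation.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    rank1 = s[0] * np.outer(u[:, 0], vt[0])
    # Column sums of the reconstruction: one score per cell.
    return rank1.sum(axis=0)

# Synthetic data (an assumption for illustration): 20 genes, 60 cells,
# the first 30 cells express every gene at a higher level.
rng = np.random.default_rng(0)
is_cycling = np.array([True] * 30 + [False] * 30)
expr = rng.normal(1.0, 0.3, size=(20, 60)) + 2.0 * is_cycling[None, :]

scores = rank1_activity(expr)
```

Because the genes are centered first, the scores are relative to the population mean, so only differences between cells or conditions are meaningful. The sign ambiguity of the singular vectors cancels out in the outer product. Comparing the score distributions between conditions (e.g., with a rank-based test) is left out here.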
I notice the running time is long for typical-sized data sets. Will that be addressed in libscran? It would be good to modernise, if not.
It took almost four hours. Perhaps it was originally designed for Smart-seq data.