Feasibility of Running on Separate Samples/Cell Groups

I am working with scRNA-seq data that consists of various cell groups from a total of 9 samples. The dataset contains over 1 million cells. When I run StemFinder on the entire dataset, the calculation takes a very long time.

In my experience, the calculation time of CCAT and CytoTRACE increases linearly with the number of cells, and it seems possible to run those tools in parallel on separate samples or cell groups. This does not obviously carry over to StemFinder because of how its algorithm works. As I understand it, StemFinder uses the heterogeneity in cell cycle gene expression to estimate the potency of individual cells, and it calculates the stemness of each cell from its neighboring cells using a KNN algorithm.

My questions are:
1. Is it possible to run StemFinder separately for each person or for a specific cell group?
2. If so, would it be possible to combine the results obtained from separate runs for comparison?

If there are any guidelines or recommendations for dividing the dataset and comparing the results, I would greatly appreciate it if you could share them.

Thank you for your comment,

Yes, stemFinder can be run on a subset of the original dataset.
To run on a large dataset with many cells, we recommend the following:

1. Analyze the original dataset as usual, performing QC, filtering, and log normalization.
2. Divide the dataset into smaller subsets as desired using subset() in Seurat. Find HVGs, then run PCA and KNN on each subset separately.
3. Run stemFinder on each subset individually, using that subset's own k value and KNN matrix.
4. Add the raw stemFinder scores ('stemFinder_raw') back to the metadata of the original (non-subsetted) dataset.
5. If desired, use the last line of code in the run_stemFinder() function to invert the scores.
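
The steps above could be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the package's documented workflow. The helper names (score_subset, combine_scores, run_by_group), the metadata column passed as split_by, the choice of k as the square root of the subset size, and the number of PCs are all hypothetical. The argument names passed to run_stemFinder() (k, nn, thresh, markers) are an assumption about its interface and should be checked against the installed version. Seurat calls are namespace-qualified so that the functions can be defined without attaching the package.

```r
# Steps 2-3: preprocess one subset and score it with its own k and KNN graph.
score_subset <- function(sub, markers, n_pcs = 30) {
  # Assumption: k scales with subset size; use whatever rule you normally use.
  k <- round(sqrt(ncol(sub)))
  sub <- Seurat::FindVariableFeatures(sub, verbose = FALSE)
  # Assumption: scaling the HVGs plus the cell cycle markers is sufficient.
  feats <- union(Seurat::VariableFeatures(sub), intersect(markers, rownames(sub)))
  sub <- Seurat::ScaleData(sub, features = feats, verbose = FALSE)
  sub <- Seurat::RunPCA(sub, verbose = FALSE)
  sub <- Seurat::FindNeighbors(sub, dims = 1:n_pcs, k.param = k, verbose = FALSE)
  knn <- sub@graphs$RNA_nn  # assumes the default "RNA" assay
  # Assumed argument names; verify against your stemFinder version.
  sub <- stemFinder::run_stemFinder(sub, k = k, nn = knn, thresh = 0,
                                    markers = markers)
  # Return raw scores as a vector named by cell barcode.
  stats::setNames(sub$stemFinder_raw, colnames(sub))
}

# Step 4 (pure base R): place per-subset scores into one vector ordered like
# the original object's cells. Cells missing from every subset stay NA.
combine_scores <- function(all_cells, score_list) {
  out <- stats::setNames(rep(NA_real_, length(all_cells)), all_cells)
  for (scores in score_list) {
    keep <- intersect(names(scores), all_cells)
    out[keep] <- scores[keep]
  }
  out
}

# Steps 2-4 together: split by a metadata column, score each subset, and
# write the raw scores back to the original (non-subsetted) object.
run_by_group <- function(obj, split_by, markers) {
  subsets <- Seurat::SplitObject(obj, split.by = split_by)
  score_list <- lapply(subsets, score_subset, markers = markers)
  merged <- combine_scores(colnames(obj), score_list)
  Seurat::AddMetaData(obj, metadata = merged, col.name = "stemFinder_raw")
}

# Toy check of the merge step only (no Seurat needed).
demo <- combine_scores(c("c1", "c2", "c3"),
                       list(a = c(c1 = 0.2), b = c(c3 = 0.9)))
```

With a real object, a call such as run_by_group(obj, split_by = "sample", markers = cell_cycle_genes) would apply this, where "sample" and cell_cycle_genes stand in for your own metadata column and marker list. Because each subset is scored against its own neighbors, matching scores back by cell barcode rather than by position avoids misalignment when subsets are processed in a different order.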

CahanLab / stemfinder
