stemFinder vignette

Kathleen Noller 07/11/2024


Single-cell estimation of differentiation time from scRNA-seq data


Load query data - Bone marrow from Tabula Muris

Query data should be a Seurat object containing a log-normalized, scaled single-cell gene expression matrix

Query data must have two metadata columns:

-Phenotype: character vector of cell type annotations
-Ground_truth: numeric vector of ascending ground truth values denoting the extent of differentiation

Note: the example data has already been filtered, normalized, and scaled.
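Before running anything, it can help to confirm that the two required metadata columns are present and correctly typed. The following is a minimal sketch in base R, run on a toy data frame standing in for `adata@meta.data`; the helper name `check_query_metadata` is hypothetical and not part of stemFinder.

```r
# Toy metadata standing in for adata@meta.data (values are made up)
meta <- data.frame(
  Phenotype = c("HSC", "Progenitor", "Monocyte"),
  Ground_truth = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a stemFinder function): stops with an
# informative message if a required column is missing or mistyped
check_query_metadata <- function(meta) {
  required <- c("Phenotype", "Ground_truth")
  missing_cols <- setdiff(required, colnames(meta))
  if (length(missing_cols) > 0) {
    stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!is.character(meta$Phenotype)) stop("Phenotype must be a character vector")
  if (!is.numeric(meta$Ground_truth)) stop("Ground_truth must be numeric")
  invisible(TRUE)
}

metadata_ok <- check_query_metadata(meta)
```

With a real query object, the same check could be applied to `adata@meta.data`.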

Download query data: Tabula Muris bone marrow, 10X platform

library(Seurat)
library(stemFinder)
library(ggplot2)

adata = readRDS("MurineBoneMarrow10X_GSE109774.rds")

Prepare inputs to stemFinder

# Select input cell cycle gene list
## Standard input to stemFinder: G2M and S cell cycle genes
## G2M and S gene lists are provided for mouse, human, and C. elegans
cell_cycle_genes = intersect(c(s_genes_mouse, g2m_genes_mouse), rownames(adata)) #keep only genes present in the query data
VariableFeatures(adata) = setdiff(VariableFeatures(adata), cell_cycle_genes) #exclude cell cycle genes from the highly variable features

adata <- RunPCA(adata, verbose = FALSE)
p1 <- ElbowPlot(adata, ndims = 50)
p1 #inspect the elbow plot to choose the number of PCs
pcs = 32 #number of PCs selected from the elbow plot for this dataset

#Build the K-nearest-neighbor graph
k = round(sqrt(ncol(adata))) #default value of k: square root of the number of cells
adata = FindNeighbors(adata, dims = 1:pcs, k.param = k, verbose = FALSE)
knn = adata@graphs$RNA_nn #KNN matrix (graph name assumes the assay is called "RNA")
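The two bookkeeping steps above (removing cell cycle genes from the variable features, and choosing the default k) can be sketched on toy inputs without loading Seurat. The gene vectors below are small illustrative subsets, not the full lists shipped with stemFinder, and the cell count is made up.

```r
# Toy inputs (illustrative only)
features <- c("Mcm2", "Gata1", "Top2a", "Cd34", "Kit")  # genes in the query data
s_genes <- c("Mcm2", "Pcna")                            # subset of S-phase genes
g2m_genes <- c("Top2a", "Cdk1")                         # subset of G2M genes
variable_features <- c("Gata1", "Top2a", "Cd34")

# Keep only the cell cycle genes that are present in the query data
cell_cycle_genes <- intersect(c(s_genes, g2m_genes), features)

# Drop cell cycle genes from the highly variable features
variable_features <- setdiff(variable_features, cell_cycle_genes)

# Default k: rounded square root of the number of cells
n_cells <- 3000
k <- round(sqrt(n_cells))  # 55 for 3000 cells
```

Here `cell_cycle_genes` ends up as `"Mcm2"` and `"Top2a"`, and `variable_features` as `"Gata1"` and `"Cd34"`.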

Run stemFinder


adata: Seurat object containing log-normalized, scaled gene expression data (features x cells)
k: number of nearest neighbors
nn: KNN matrix (cells x cells)
thresh: threshold for binarizing gene expression data (default = 0)
markers: character vector of cell cycle genes present in query data
method: string denoting which method of computing gene expression heterogeneity to use (default: 'gini'; alternatives: 'stdev' and 'variance')

adata = run_stemFinder(adata, k = k, nn = knn, thresh = 0, markers = cell_cycle_genes, method = 'gini')
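For readers unfamiliar with the default `method = 'gini'`, the sketch below computes a textbook Gini index for a numeric vector in base R. It only illustrates the general statistic; how stemFinder applies it internally (for example, to which values and across which neighbors) is not described in this vignette, and the function name `gini_index` is hypothetical.

```r
# Textbook Gini index: 0 means perfectly even values,
# values approaching 1 mean a few entries hold most of the total
gini_index <- function(x) {
  x <- sort(x)                          # ascending order
  n <- length(x)
  total <- sum(x)
  if (n == 0 || total == 0) return(0)   # convention chosen here for empty or all-zero input
  2 * sum(seq_len(n) * x) / (n * total) - (n + 1) / n
}

even_expr <- gini_index(c(1, 1, 1, 1))    # 0: all values equal
skewed_expr <- gini_index(c(0, 0, 0, 1))  # 0.75: one entry holds everything
```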

The following two columns are added to the metadata:

-Raw stemFinder score (“stemFinder_raw”)
-stemFinder score with directionality corresponding to pseudotime / ground truth (“stemFinder”)

Check against previously-computed stemFinder results on this dataset

sF_scores = read.csv("bmmc_sF_results.csv", row.names = 1)
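One way to carry out the check is to align the saved and freshly computed scores by cell name and look at the largest absolute difference. The sketch below uses toy data frames; the column name `stemFinder` in the saved file is an assumption, since the contents of `bmmc_sF_results.csv` are not shown here.

```r
# Toy stand-ins: freshly computed scores and previously saved scores,
# both indexed by cell name (values are made up)
new_scores <- data.frame(stemFinder = c(0.2, 0.5, 0.9),
                         row.names = c("cell_a", "cell_b", "cell_c"))
saved_scores <- data.frame(stemFinder = c(0.9, 0.2, 0.5),
                           row.names = c("cell_c", "cell_a", "cell_b"))

# Align on shared cell names so that row order does not matter
shared_cells <- intersect(rownames(new_scores), rownames(saved_scores))
max_abs_diff <- max(abs(new_scores[shared_cells, "stemFinder"] -
                        saved_scores[shared_cells, "stemFinder"]))
```

With real data, `new_scores` would be `adata@meta.data` and `saved_scores` would be `sF_scores`; a small `max_abs_diff` suggests the run reproduces the saved results.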

Quantify stemFinder performance relative to ground truth

# Compute stemFinder performance metrics (stemFinder only, no competitor)
list_all = compute_performance_single(adata, competitor = FALSE)
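The metrics returned by `compute_performance_single` are not listed in this vignette. As a hedged illustration of one common way to compare a score against ordered ground truth, the sketch below computes a Spearman rank correlation on toy vectors in base R; it is not claimed to match the package's own output.

```r
# Toy vectors (made up): ordered ground truth stages and a per-cell score
ground_truth <- c(1, 1, 2, 2, 3, 3)
score <- c(0.10, 0.20, 0.40, 0.30, 0.80, 0.90)

# Rank-based correlation: positive when the score increases with ground truth
rho <- cor(score, ground_truth, method = "spearman")
```

With a real object, the same call could be applied to `adata$stemFinder` and `adata$Ground_truth`.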

Optional: compare stemFinder performance to another method

CytoTRACE and CCAT scores for BMMC query data

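#Load pre-computed competitor scores
comp_scores = read.csv("bmmc_competitor_results.csv", row.names = 1)
adata = AddMetaData(adata, metadata = comp_scores) #assumes rows are cell names and the file includes a ccat_invert column
adata$competitor = adata$ccat_invert #rename desired competitor column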

#Quantify performance
list_all_withcomp = compute_performance_single(adata, competitor = TRUE, comp.inverted = TRUE)

Visualize stemFinder and competitor results

Feature plot

p2 <- FeaturePlot(adata, features = c('Ground_truth','stemFinder','competitor'), cols = c('blue','red'), ncol = 3)
p2

Box plot

p3 <- ggplot(adata@meta.data, aes(x = Ground_truth, y = stemFinder)) + geom_point() + geom_boxplot(aes(group = Ground_truth, color = Ground_truth)) + theme_bw() + ggtitle("stemFinder score vs. Ground truth") + ylab("stemFinder score") + xlab("Ground truth")
p3