AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Add per gene level benchmarking notebook #88

Closed: allyhawkins closed this 3 years ago

allyhawkins commented 3 years ago

Related to #82, I have moved the benchmarking analysis of per-gene metrics into its own notebook. It uses the same samples that were imported into a SingleCellExperiment in #82 and applies scater::addPerFeatureQC to compute, for each pre-processing tool, the mean expression of each gene and the percentage of cells in which each gene is detected.
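For readers less familiar with scater, the two per-gene metrics can be sketched in plain Python. This is an illustrative re-implementation, not scater's code, and it assumes a hypothetical sparse layout (`counts[cell][gene]` = UMI count, missing entries meaning zero). It restricts to the barcodes a tool shares with cellranger and reports `mean` and `detected` (percentage of cells with a non-zero count), the same two quantities addPerFeatureQC adds to the rowData of a SingleCellExperiment.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: counts[cell_barcode][gene_id] = UMI count.
# Genes missing from a cell's dict are treated as zero.
Counts = Dict[str, Dict[str, float]]


def shared_cells(a: Counts, b: Counts) -> List[str]:
    """Barcodes present in both tools' outputs, sorted for reproducibility."""
    return sorted(set(a) & set(b))


def per_feature_qc(counts: Counts, cells: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Per-gene 'mean' count and 'detected' (% of cells with count > 0).

    Unlike scater, genes never observed in the chosen cells are simply
    omitted here rather than reported with zeros.
    """
    cells = list(cells)
    n = len(cells)
    if n == 0:
        raise ValueError("need at least one cell")
    totals: Dict[str, float] = {}
    nonzero: Dict[str, int] = {}
    for cell in cells:
        for gene, value in counts[cell].items():
            totals[gene] = totals.get(gene, 0.0) + value
            if value > 0:
                nonzero[gene] = nonzero.get(gene, 0) + 1
    return {
        gene: {"mean": totals[gene] / n,
               "detected": 100.0 * nonzero.get(gene, 0) / n}
        for gene in totals
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    cellranger = {"AAAC": {"GeneA": 3, "GeneB": 0}, "AAAG": {"GeneA": 1},
                  "TTTT": {"GeneB": 2}}
    alevin = {"AAAC": {"GeneA": 2, "GeneB": 1}, "AAAG": {"GeneA": 1, "GeneB": 1}}
    shared = shared_cells(cellranger, alevin)  # ["AAAC", "AAAG"]
    qc = per_feature_qc(cellranger, shared)
    print(qc["GeneA"])  # {'mean': 2.0, 'detected': 100.0}
```

Restricting to shared barcodes first keeps the per-gene summaries comparable, since a tool that calls extra cells would otherwise shift both the mean and the detection rate.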

We are using 4 single-cell RNA-seq samples and 2 single-nucleus RNA-seq samples, each processed with 4 different tools (cellranger, alevin, alevin-fry, and kallisto), with the single-nucleus samples run against both a cDNA index and a pre-mRNA index. In this notebook, I restricted each comparison to the cells shared between a given tool and cellranger and looked at the correlation between that tool and cellranger in both mean gene expression and the percentage of cells in which each gene is detected. I also looked at the percentage of genes that overlap between each tool and cellranger, and compared the mean expression of the overlapping genes with that of the non-overlapping genes.
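The comparison against cellranger can be sketched the same way. The choices below are assumptions, not taken from the notebook: a gene counts as detected by a tool when its metric is > 0, Spearman's rho is computed only over genes detected by both tools, and the overlap percentage is relative to the tool's detected genes. The helper names are hypothetical; in R the correlation step would typically be `cor(x, y, method = "spearman")`.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def compare_to_reference(tool: Dict[str, float],
                         reference: Dict[str, float]) -> Dict[str, float]:
    """Compare one per-gene metric (gene -> value over shared cells) to cellranger.

    Assumption: a gene is 'detected' by a tool when its metric is > 0.
    """
    tool_genes = {g for g, v in tool.items() if v > 0}
    ref_genes = {g for g, v in reference.items() if v > 0}
    if not tool_genes:
        raise ValueError("tool detected no genes")
    overlapping = sorted(tool_genes & ref_genes)
    tool_only = sorted(tool_genes - ref_genes)

    def mean_of(genes: List[str]) -> float:
        # Average of the tool's metric over the given genes (NaN if none).
        return sum(tool[g] for g in genes) / len(genes) if genes else float("nan")

    return {
        # Correlation is computed only over genes detected by both tools.
        "spearman": spearman([tool[g] for g in overlapping],
                             [reference[g] for g in overlapping]),
        "pct_overlap": 100.0 * len(overlapping) / len(tool_genes),
        "mean_overlapping": mean_of(overlapping),
        "mean_tool_only": mean_of(tool_only),
    }
```

The same function can be called with per-gene means or with per-gene detection percentages; because counts are non-negative, a gene has a mean > 0 exactly when it is detected in at least one cell, so the overlapping gene set is identical for both metrics.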

I'm sure there are more things we can do, such as looking specifically at the overlap of the most variable or most highly expressed genes, but I wanted to start here because it does appear that most tools have fairly good overlap with cellranger and high Spearman correlation coefficients (> 0.9).
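The highly-expressed-gene follow-up mentioned above could look something like the hypothetical helper below: take the top n genes by mean expression for each tool and report the fraction of cellranger's top n that the other tool also ranks in its top n. Passing a per-gene variance instead of the mean would give a rough check for variable genes, although highly variable gene selection usually models the mean-variance trend rather than ranking raw variance.

```python
from typing import Dict, List


def top_genes(metric: Dict[str, float], n: int) -> List[str]:
    """The n genes with the highest metric; ties broken alphabetically for stability."""
    ranked = sorted(metric.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


def top_n_overlap(tool: Dict[str, float], reference: Dict[str, float], n: int) -> float:
    """Fraction of the reference's top-n genes that also appear in the tool's top n."""
    if n <= 0:
        raise ValueError("n must be positive")
    ref_top = set(top_genes(reference, n))
    if not ref_top:
        raise ValueError("reference has no genes")
    tool_top = set(top_genes(tool, n))
    return len(ref_top & tool_top) / len(ref_top)
```

Because the result depends on n, it is probably worth reporting it for a few cutoffs rather than a single one.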