BradBalderson / Cytocipher

Methods for analysing single-cell RNA-seq data, particularly for checking whether tentative clusters of cells differ significantly from one another in their gene expression.
GNU General Public License v3.0

Cytocipher - detection of significantly different cell populations in scRNA-seq

Installation

!pip install cytocipher
import cytocipher as cc

See online documentation here.

Expected Input

An AnnData object, data, that has been processed similarly to the scanpy standard workflow to produce log-cpm normalised data with tentative cluster labels (e.g. from Leiden clustering). A high Leiden resolution works best, so that the data is heavily over-clustered; Cytocipher then merges the clusters that are not significantly different.
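As a rough illustration of the kind of preprocessing described above, the sketch below uses standard scanpy calls to go from raw counts to log-cpm values with over-clustered Leiden labels. The helper name prepare_for_cytocipher, the resolution of 3.0 and the n_top_genes value are assumptions made for this example, not settings recommended by Cytocipher, and the sketch assumes data.X holds raw counts.

```python
def prepare_for_cytocipher(adata, resolution=3.0, n_top_genes=2000):
    """Hypothetical helper: log-cpm normalise raw counts and over-cluster.

    Assumes adata.X contains raw counts. The default values are
    illustrative assumptions, not Cytocipher recommendations.
    """
    # Imported here so the helper can be defined without scanpy installed.
    import scanpy as sc

    # Scale each cell to counts-per-million, then take log(1 + x): "log-cpm".
    sc.pp.normalize_total(adata, target_sum=1e6)
    sc.pp.log1p(adata)

    # Flag highly variable genes without subsetting, so every gene
    # remains available for downstream marker detection.
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)

    # Dimensionality reduction and neighbour graph for clustering.
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)

    # A deliberately high resolution, to produce many small clusters.
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden")
    return adata
```

Calling data = prepare_for_cytocipher(data) would leave the tentative labels in data.obs['leiden'], which is the key passed to the functions below.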

Code Scoring Minimal Example

The functions below run marker gene identification and code scoring, then visualise the resulting cell-by-cluster enrichment scores.

cc.tl.get_markers(data, 'leiden')
cc.tl.code_enrich(data, 'leiden')
cc.pl.enrich_heatmap(data, 'leiden')

In a Jupyter notebook, you can view the documentation for a function using, for example:

?cc.tl.get_markers

Cluster Merging Minimal Example

The code below runs the cluster merging and plots a heatmap of the per-cell enrichment scores for each of the new merged clusters.

cc.tl.merge_clusters(data, 'leiden')
cc.pl.enrich_heatmap(data, 'leiden_merged')
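To make the idea of merging concrete, here is a minimal conceptual sketch, not Cytocipher's actual implementation. It assumes each cluster pair has two p-values (one per direction of comparison), treats a pair as mergeable only when both are at or above a cutoff, and joins mergeable pairs transitively with a union-find. The function name merge_labels, the cutoff of 0.05 and the toy p-values are all hypothetical.

```python
def merge_labels(labels, pair_pvals, cutoff=0.05):
    """Map each cluster label to a merged label (illustrative sketch only).

    pair_pvals maps (label_a, label_b) to a tuple of two p-values, one per
    direction of comparison. The cutoff is an assumed value.
    """
    # Each label starts as the root of its own group.
    parent = {label: label for label in labels}

    def find(label):
        # Walk up to the group root, shortening the path along the way.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for (a, b), (p_ab, p_ba) in pair_pvals.items():
        # Merge only when neither comparison is significant.
        if p_ab >= cutoff and p_ba >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the smaller label as the group's representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {label: find(label) for label in labels}


# Toy p-values, invented purely for illustration.
toy_pvals = {
    ("A", "B"): (0.90, 0.80),   # neither direction significant: merge
    ("B", "C"): (0.001, 0.40),  # one direction significant: keep apart
    ("C", "D"): (0.30, 0.20),   # neither direction significant: merge
}
merged = merge_labels(["A", "B", "C", "D"], toy_pvals)
print(merged)
```

In this toy case A and B end up sharing a label, as do C and D, while the two groups stay separate because the B versus C comparison is significant in one direction.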

To inspect the scores being compared for a given pair of clusters, the following plots the enrichment scores as violin plots and prints the p-values obtained by comparing them:

cc.pl.sig_cluster_diagnostics(data, 'leiden', plot_pair=('3', '9'))

Input pair ('3', '9')
p=0.9132771265170103 (3 cells; 3 scores) vs (9 cells; 3 scores)
p=0.8128313109661132 (3 cells; 9 scores) vs (9 cells; 9 scores)

To get a sense of the upper and lower bounds of what is considered a significant cluster, calling the function with default parameters plots the same kind of violins for the cluster pairs at the upper and lower bounds of significant versus non-significant pairs:

cc.pl.sig_cluster_diagnostics(data, 'leiden')

See the pancreas tutorial for more examples of Cytocipher functionality, including visual bias checks, Sankey diagrams to visualise cluster merging, volcano plots, and more!

Issues

Please feel free to post an issue on GitHub if there is a problem, and I'll help you out ASAP.