becavin-lab / checkatlas

One liner tool to check the quality of your single-cell atlases.
https://checkatlas.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add knee plots #21

Closed: grst closed this issue 1 year ago

grst commented 2 years ago

Hi @drbecavin,

This package looks great! We are currently considering adding it to the nf-core scrnaseq workflow (see https://github.com/nf-core/scrnaseq/issues/80).

One feature I'd love to see is knee plots for QC metrics. I find them superior to the current violin plots for finding inflection points, and they would also be easy to render for many samples simultaneously.

In particular, I think the following plots would be useful:

- cell rank vs. total counts
- cell rank vs. detected genes
- cell rank vs. mitochondrial fraction

Here's an example from the cellranger report:

[Image: knee plot from the cellranger report]

Here's another example from a custom Python script I usually use for single-cell QC with scanpy:

[Image: y-axis = cell rank; n_genes_by_counts = number of detected genes; red lines indicate the cutoffs I chose]

Unlike the violin plots, the knee plots could easily be combined into a single, interactive MultiQC figure. This helps identify low-quality outliers when working with many single-cell samples. Here's an example of such a plot from the nf-core/rnaseq MultiQC report:

[Image: multi-sample plot from the nf-core/rnaseq MultiQC report]
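A minimal sketch of why knee plots combine well across samples: each sample reduces to a curve of QC values sorted by rank, so many samples can be stacked into one long table and drawn as overlapping lines. The function name `knee_table`, the column names and the counts are hypothetical; this is not how checkatlas or MultiQC store their data.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def knee_table(metrics: dict[str, np.ndarray], metric_name: str) -> pd.DataFrame:
    """Stack per-sample QC values into one long table of (sample, rank, value).

    Each sample's values are sorted in descending order so that rank 1 is the
    cell with the highest value; every sample then becomes one line on a
    shared knee plot.
    """
    frames = []
    for sample, values in metrics.items():
        sorted_values = np.sort(np.asarray(values))[::-1]  # highest value first
        frames.append(
            pd.DataFrame(
                {
                    "sample": sample,
                    "rank": np.arange(1, len(sorted_values) + 1),
                    metric_name: sorted_values,
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# Hypothetical per-cell total counts for two samples
table = knee_table(
    {"sample_A": np.array([120, 5000, 830]), "sample_B": np.array([40, 2100])},
    metric_name="total_counts",
)
print(table)
```

Because every sample shares the same `rank` axis, a plotting tool only needs to group by `sample` to draw one line per sample, which is what makes the combined view readable where stacked violin plots are not.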

drbecavin commented 2 years ago

Thank you for your interest in checkatlas! I was already planning to export the QC metrics as tables so that MultiQC can plot all samples on the same plot; I intended to do that first thing Monday morning. You are right that superposed violin plots would not be readable, so I will work directly with knee plots. I'll let you know!

Regarding the variables:

cell rank vs. total counts
cell rank vs. detected genes
cell rank vs. mitochondrial fraction

How can we get the cell rank from the cellranger h5 file? checkatlas only processes these files, along with Scanpy and Seurat files.

If you want a complete integration of cellranger output into MultiQC, you should look at this PR: https://github.com/ewels/MultiQC/pull/1689. Cellranger will be integrated in the new version of MultiQC.

grst commented 2 years ago

How can we get cell rank from the cellranger h5 file

I mean the rank is essentially just the numeric index of the cell when sorted by counts/genes/fraction. Here's a solution with pandas: https://github.com/icbi-lab/luca/blob/6f2ea7203272b6d6f7e415f7abf0d247c8368ed4/modules/local/scqc/qc_plots.py#L76-L77
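As a sketch of the same idea (not the code behind the link above), pandas' `Series.rank` assigns each cell its rank directly, without reordering the table. The DataFrame stands in for per-cell metadata such as scanpy's `adata.obs`; its values and the `total_counts_rank` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical per-cell QC table, e.g. adata.obs after QC metrics are computed
obs = pd.DataFrame(
    {"total_counts": [300, 12000, 4500, 80]},
    index=["cell1", "cell2", "cell3", "cell4"],
)

# Rank 1 = cell with the most counts; method="first" breaks ties by order of appearance
obs["total_counts_rank"] = (
    obs["total_counts"].rank(ascending=False, method="first").astype(int)
)
print(obs.sort_values("total_counts_rank"))
```

The same call works for detected genes or mitochondrial fraction; only the column changes.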

If you want a complete integration of cellranger output into MultiQC, you should look at this PR: https://github.com/ewels/MultiQC/pull/1689. Cellranger will be integrated in the new version of MultiQC.

The nf-core scrnaseq pipeline supports four different aligners: alevin, kallisto, starsolo and cellranger. We have now implemented it so that a single h5ad object is generated regardless of the aligner. We would therefore be very interested in something that works off the h5ad file, independent of the aligner.
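To illustrate why an h5ad-based approach is aligner-independent: all three knee-plot metrics can be derived from the cells x genes count matrix alone. The sketch below assumes a dense NumPy matrix (in practice it might come from `anndata.read_h5ad(path).X`, converted with `.toarray()` if sparse) and assumes mitochondrial genes are named with an `MT-` prefix; the function `qc_metrics` is hypothetical. scanpy's `sc.pp.calculate_qc_metrics` computes comparable per-cell columns such as `total_counts` and `n_genes_by_counts`.

```python
import numpy as np


def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix.

    Returns total counts, number of detected genes (count > 0) and the
    fraction of counts from genes whose name starts with `mito_prefix`
    (case-insensitive, so mouse "mt-" genes also match).
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array(
        [name.upper().startswith(mito_prefix.upper()) for name in gene_names]
    )
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Empty barcodes get a mitochondrial fraction of 0 instead of NaN
    mito_fraction = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    return {
        "total_counts": total,
        "n_genes_by_counts": n_genes,
        "mito_fraction": mito_fraction,
    }


# Hypothetical 3 cells x 4 genes matrix; the last two genes are mitochondrial
metrics = qc_metrics(
    np.array([[10, 0, 5, 5], [0, 0, 0, 0], [1, 2, 3, 4]]),
    gene_names=["GeneA", "GeneB", "MT-CO1", "MT-ND1"],
)
print(metrics)
```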

I'll definitely take a look at the cellranger multiqc module, though, which we could consider adding in addition for more alignment-focused metrics.

drbecavin commented 2 years ago

I have just pushed the modification that adds knee plots to the checkatlas report. You need checkatlas version 0.0.13 for that, and (for the moment) my fork of MultiQC: https://github.com/becavin-lab/MultiQC

You should get something like this:

[Screenshot, 2022-07-19 at 11:12:57]

I am open to any suggestions for improvement.

grst commented 2 years ago

Fantastic!

IMO, a log scale on the x-axis (and optionally the y-axis) would make it a bit more readable!
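A minimal matplotlib sketch of the suggested axis scaling, using randomly generated counts for one hypothetical sample; this is not how MultiQC renders its interactive plots. With thousands of barcodes, a linear rank axis squeezes the top-ranked cells, where the knee usually sits, into a narrow strip near the origin, and a log rank axis spreads them out.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical total counts per cell for one sample
rng = np.random.default_rng(0)
total_counts = rng.lognormal(mean=7, sigma=1.2, size=2000)

sorted_counts = np.sort(total_counts)[::-1]  # highest first, so rank 1 = largest
rank = np.arange(1, sorted_counts.size + 1)

fig, ax = plt.subplots()
ax.plot(rank, sorted_counts)
ax.set_xscale("log")  # spread out the top-ranked cells around the knee
ax.set_yscale("log")  # optional log scale on the y-axis as well
ax.set_xlabel("cell rank")
ax.set_ylabel("total counts")
```

Calling `fig.savefig("knee_plot.png")` would write the figure to disk.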

drbecavin commented 2 years ago

Log scale has been added and will be available in the next release of MultiQC!