eastgenomics / trendyQC

Django app for monitoring trends in MultiQC data
MIT License

GC bias data missing from CEN runs #101

Open Yu-jinKim opened 1 week ago

Yu-jinKim commented 1 week ago

DISCLAIMER

This could affect other metrics in other assays

Description

Adriana asked me to plot some GC coverage data for a recent CEN run. While attempting to plot it, I got errors. Looking more closely at the multiqc_data.json file for a random CEN run, I found that the GC bias data is not stored under the report_saved_raw_data key, which I used to define which tools are present for each assay.
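A quick way to check where a given tool's data ends up in a multiqc_data.json file is to scan each top-level key for entry names containing a search term. The snippet below is a minimal sketch, not TrendyQC code: the helper names `load_report` and `find_sections` are hypothetical, and the entry names in the toy report are illustrative only. It assumes the top-level keys of interest hold dictionaries keyed by entry name.

```python
import json


def load_report(path):
    """Load a multiqc_data.json file into a dict."""
    with open(path) as handle:
        return json.load(handle)


def find_sections(report, needle):
    """Return {top_level_key: [entry names containing needle]}.

    Only top-level values that are dicts are inspected; matching is
    case-insensitive and done on the entry names, not on the data itself.
    """
    needle = needle.lower()
    hits = {}
    for top_key, value in report.items():
        if not isinstance(value, dict):
            continue
        matches = sorted(name for name in value if needle in name.lower())
        if matches:
            hits[top_key] = matches
    return hits


if __name__ == "__main__":
    # Toy report: the entry names below are made up for illustration.
    toy_report = {
        "report_saved_raw_data": {"multiqc_picard_HsMetrics": {}},
        "report_plot_data": {"picard_gcbias_plot": {}},
        "config_version": "0.0",
    }
    print(find_sections(toy_report, "gcbias"))
```

For a real run, `find_sections(load_report("multiqc_data.json"), "gc")` would list every top-level key that has an entry whose name mentions "gc", which makes it easy to see when a metric is absent from report_saved_raw_data.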

Potential fixes

Instead GC bias is stored in the report_plot_data key. A few options can be considered to fix this:

Yu-jinKim commented 2 days ago

Basically, MultiQC doesn't save the per-sample data because the code goes into https://github.com/MultiQC/MultiQC/blob/f0e29ea3922506383c431888282a58c1af7f98af/multiqc/modules/picard/GcBiasMetrics.py#L55 because of the header in the per-sample GC bias file, which prevents summary_data_by_sample from getting the data it needs. As a consequence, writing the data is never triggered.

Yu-jinKim commented 2 days ago

MultiQC v1.14 did not have complete support for Sentieon:

This caused the GCBias Summary file to not be captured by MultiQC.

MultiQC v1.18 remedied this: https://github.com/MultiQC/MultiQC/releases/tag/v1.18

Yu-jinKim commented 1 day ago

I tried the latest available Docker image to see whether it solved the issue. This required the following modifications:

After all that, I still have issues, as the samples are not correctly recognized by the Sentieon tools: https://platform.dnanexus.com/panx/projects/GkyfKF8481PpyVvXQ3FGkFq4/data/?scope=project&id.values=file-Gp220Gj4501Ypzx60f6J7Ggq I do capture some Sentieon files, but not all, so I suspect some MultiQC config shenanigans.

Yu-jinKim commented 5 hours ago

Decision taken on 03/07/24: For v1 of TrendyQC data, go ahead with the missing GC bias data. This will be re-evaluated for a subsequent version.