eastgenomics / trendyQC

Django app for monitoring trends in MultiQC data
MIT License

Het-hom data not captured #114

Status: Open. Yu-jinKim opened this issue 1 month ago

Yu-jinKim commented 1 month ago

002_240524_A01303_0393_AH7YWCDRX5_CEN doesn't have multiqc_het-hom_analysis in report_saved_raw_data as expected. Instead, it has multiqc_het-hom_table. I have no idea why: I couldn't find any indication of a change in the MultiQC config for CEN.

Yu-jinKim commented 1 month ago

240524_393:

dx cat file-Gk8vPyj4ppB5B49Yvxg0b3bY | jq ".report_saved_raw_data | keys"
[
  "multiqc_bcl2fastq_bylane",
  "multiqc_bcl2fastq_bysample",
  "multiqc_fastqc",
  "multiqc_general_stats",
  "multiqc_happy_indel_data",
  "multiqc_happy_snp_data",
  "multiqc_het-hom_table",
  "multiqc_picard_HsMetrics",
  "multiqc_samtools_flagstat",
  "multiqc_sentieon_AlignmentSummaryMetrics",
  "multiqc_sentieon_insertSize",
  "multiqc_somalier_table",
  "multiqc_verifybamid",
  "picard_histogram",
  "picard_histogram_1",
  "picard_histogram_2"
]

240415_352:

dx cat file-GjPBQKQ42gy47gVxqpQvz5zg | jq '.report_saved_raw_data | keys'
[
  "multiqc_bcl2fastq_bylane",
  "multiqc_bcl2fastq_bysample",
  "multiqc_fastqc",
  "multiqc_general_stats",
  "multiqc_happy_indel_data",
  "multiqc_happy_snp_data",
  "multiqc_het-hom_analysis",
  "multiqc_picard_HsMetrics",
  "multiqc_samtools_flagstat",
  "multiqc_sentieon_AlignmentSummaryMetrics",
  "multiqc_sentieon_insertSize",
  "multiqc_somalier_sex_check",
  "multiqc_verifybamid"
]

The Somalier key was renamed as well, from multiqc_somalier_sex_check to multiqc_somalier_table.
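A set difference makes the changes between the two key lists explicit. Below is a minimal Python sketch. The keys are copied verbatim from the two jq outputs above, and the variable names are only illustrative.

```python
# Keys of report_saved_raw_data for each run, copied from the jq output above.
keys_240524_393 = {
    "multiqc_bcl2fastq_bylane",
    "multiqc_bcl2fastq_bysample",
    "multiqc_fastqc",
    "multiqc_general_stats",
    "multiqc_happy_indel_data",
    "multiqc_happy_snp_data",
    "multiqc_het-hom_table",
    "multiqc_picard_HsMetrics",
    "multiqc_samtools_flagstat",
    "multiqc_sentieon_AlignmentSummaryMetrics",
    "multiqc_sentieon_insertSize",
    "multiqc_somalier_table",
    "multiqc_verifybamid",
    "picard_histogram",
    "picard_histogram_1",
    "picard_histogram_2",
}
keys_240415_352 = {
    "multiqc_bcl2fastq_bylane",
    "multiqc_bcl2fastq_bysample",
    "multiqc_fastqc",
    "multiqc_general_stats",
    "multiqc_happy_indel_data",
    "multiqc_happy_snp_data",
    "multiqc_het-hom_analysis",
    "multiqc_picard_HsMetrics",
    "multiqc_samtools_flagstat",
    "multiqc_sentieon_AlignmentSummaryMetrics",
    "multiqc_sentieon_insertSize",
    "multiqc_somalier_sex_check",
    "multiqc_verifybamid",
}

# Keys present in one run but not the other (renamed or newly added sections).
only_in_393 = sorted(keys_240524_393 - keys_240415_352)
only_in_352 = sorted(keys_240415_352 - keys_240524_393)

print("Only in 240524_393:", only_in_393)
print("Only in 240415_352:", only_in_352)
```

Besides the het-hom and Somalier renames, this also shows three picard_histogram keys that appear only in the newer report.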

Yu-jinKim commented 1 month ago

Comparing the config files used in those jobs:

diff -sy --suppress-common-lines <(dx cat file-Gj81GKQ46f9Kgb0fPB56Jzkk) <(dx cat file-GV55qpj47f086YFQb47B5Gg4)
        info: "detects sample contamination."             |         info: "detects sample contamination"
    picard_hsmetrics_table: # Specify the order in which tabl <
        BAIT_DESIGN_EFFICIENCY: 780               <
        FOLD_80_BASE_PENALTY: 790                 <
        FOLD_ENRICHMENT: 800                      <
        HET_SNP_Q: 810                        <
        HET_SNP_SENSITIVITY: 820                  <
        MAX_TARGET_COVERAGE: 830                  <
        MEAN_BAIT_COVERAGE: 840                   <
        MEAN_TARGET_COVERAGE: 850                 <
        MEDIAN_TARGET_COVERAGE: 860               <
        NEAR_BAIT_BASES: 870                      <
        OFF_BAIT_BASES: 880                   <
        ON_BAIT_BASES: 890                    <
        ON_BAIT_VS_SELECTED: 900                  <
        ON_TARGET_BASES: 910                      <
        PCT_USABLE_BASES_ON_BAIT: 920                 <
        PCT_USABLE_BASES_ON_TARGET: 930               <
        PF_BASES_ALIGNED: 940                     <
        PF_READS: 950                         <
        PF_UNIQUE_READS: 960                      <
        PF_UQ_BASES_ALIGNED: 970                  <
        PF_UQ_READS_ALIGNED: 980                  <
        ZERO_CVG_TARGETS_PCT: 990                 <
        PCT_SELECTED_BASES: 1000                  <
        - 20                        # for conditional table f |         - 20                           # for conditional tabl
                                  <
                                  <
custom_table_header_config:                   <
    picard_hsmetrics_table:                   <
        FOLD_80_BASE_PENALTY:                     <
            format: "{:,.3f}"

The keys in report_saved_raw_data are named here: https://github.com/MultiQC/MultiQC/blob/v1.14/multiqc/modules/custom_content/custom_content.py#L367, and it looks like MultiQC uses pconfig.id to name the key. However, the two config files have identical pconfig.id values, which doesn't make sense given that the keys differ between the two JSON files.

Yu-jinKim commented 1 month ago

Previously, we were using MultiQC v1.11 (file-GF3PxgQ433Gqv1Q029Gjzjfv). The Docker image for MultiQC v1.14 was created on March 27th, 2024 in 001_Reference (file-Gj22pvQ433Gk8Y6JJXy5J73Y).

v1.11 used the section name to name the output: https://github.com/MultiQC/MultiQC/blob/v1.11/multiqc/modules/custom_content/custom_content.py#L345 and https://github.com/MultiQC/MultiQC/blob/v1.11/multiqc/modules/custom_content/custom_content.py#L363

v1.14 uses pconfig.id to name the output instead: https://github.com/MultiQC/MultiQC/blob/v1.14/multiqc/modules/custom_content/custom_content.py#L367

Hence the discrepancy.
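One way for trendyQC to handle reports from both MultiQC versions is to look up each section under every key it has been saved as. Below is a minimal sketch, which assumes the parser receives the MultiQC JSON as a dict. KEY_ALIASES and get_raw_data_section are hypothetical names, not existing trendyQC code. The alias pairs are only the two renames seen in this issue, listed as the key from run 240415_352 first and then the key from run 240524_393.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical alias table (not part of trendyQC). For each tool, the key seen
# in run 240415_352 comes first, followed by the key seen in run 240524_393.
KEY_ALIASES: Dict[str, Tuple[str, ...]] = {
    "het-hom": ("multiqc_het-hom_analysis", "multiqc_het-hom_table"),
    "somalier": ("multiqc_somalier_sex_check", "multiqc_somalier_table"),
}


def get_raw_data_section(report: Dict[str, Any], tool: str) -> Optional[Dict[str, Any]]:
    """Return the raw data saved for `tool`, whichever known key it is under.

    `report` is the parsed MultiQC JSON. Returns None when none of the
    aliases for `tool` is present in report_saved_raw_data.
    """
    raw_data = report.get("report_saved_raw_data", {})
    # Check aliases in a fixed order so the result is deterministic.
    for key in KEY_ALIASES[tool]:
        if key in raw_data:
            return raw_data[key]
    return None
```

If a report ever contained both keys, checking the aliases in a fixed order would keep the lookup predictable. New renames would only need an extra entry in the alias table.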