MultiQC / MultiQC

Aggregate results from bioinformatics analyses across many samples into a single report.
http://multiqc.info
GNU General Public License v3.0

DRAGEN changing name is not working #1865

Closed NicoleGruenheit closed 1 year ago

NicoleGruenheit commented 1 year ago

Description of bug

I'm trying to override the default filename pattern for dragen/wgs_contig_mean_cov using a config file:

sp:
  dragen/wgs_contig_mean_cov:
    fn: 'contig_mean_test.csv'

The error message is:

The last file found was: ./contig_mean_test.csv

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/multiqc/multiqc.py", line 654, in run
    output = mod()
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/dragen.py", line 89, in __init__
    samples_found |= self.add_coverage_per_contig()
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/coverage_per_contig.py", line 17, in add_coverage_per_contig
    s_name, perchrom_data_by_phenotype = parse_wgs_contig_mean_cov(f)
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/coverage_per_contig.py", line 167, in parse_wgs_contig_mean_cov
    sample, phenotype = m.group(1), m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

I think the problem is that the filename pattern is hard-coded around line 167 of coverage_per_contig.py:

m = re.search(r"(.*).wgs_contig_mean_cov_?(tumor|normal)?.csv", f["fn"])

The file works if the filename ends with .wgs_contig_mean_cov.csv
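
To make the failure concrete, here is a minimal standalone sketch that applies the pattern quoted above to a few file names. The helper name match_name and the "sample1" file names are made up for illustration; only the pattern string and contig_mean_test.csv come from this report.

```python
import re

# Pattern copied verbatim from the line quoted above.
PATTERN = r"(.*).wgs_contig_mean_cov_?(tumor|normal)?.csv"


def match_name(fn):
    """Return (sample, phenotype) if the file name matches, else None.

    Hypothetical helper for this illustration; it is not part of MultiQC.
    """
    m = re.search(PATTERN, fn)
    if m is None:
        # This is the case that raises AttributeError in the traceback,
        # because the module calls m.group(...) without checking for None.
        return None
    return m.group(1), m.group(2)


for name in (
    "contig_mean_test.csv",                   # renamed file from this report
    "sample1.wgs_contig_mean_cov.csv",        # default-style name (made up)
    "sample1.wgs_contig_mean_cov_tumor.csv",  # paired-calling style name (made up)
):
    print(name, "->", match_name(name))
```

The renamed file contains no "wgs_contig_mean_cov" substring, so re.search returns None and the subsequent m.group(1) call fails.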

Cheers, Nicole

File that triggers the error

contig_mean_test.csv

MultiQC Error log

docker run -u $UID_USER:$GID_BFX --rm -it -v `pwd`:`pwd` -w `pwd` ewels/multiqc:dev multiqc . -i S000021_S1163Nr32_ignore -f --config config1.txt

[2023-02-09 13:05:30,806] multiqc                                            [DEBUG  ]  This is MultiQC v1.15.dev0
[2023-02-09 13:05:30,810] multiqc                                            [DEBUG  ]  Loading config settings from: config1.txt
[2023-02-09 13:05:30,810] multiqc                                            [DEBUG  ]  New config: {'fn_ignore_files': ['*.time_metrics.csv', '*.trimmer_metrics.csv'], 'fn_ignore_dirs': ['S000021_S1163Nr32*']}
[2023-02-09 13:05:30,810] multiqc                                            [DEBUG  ]  Added to filename patterns: [{'dragen/wgs_contig_mean_cov': {'fn': 'contig_mean_test.csv'}}]
[2023-02-09 13:05:30,810] multiqc                                            [DEBUG  ]  Command used: /usr/local/bin/multiqc . -i S000021_S1163Nr32_ignore -f --config config1.txt
[2023-02-09 13:05:30,963] multiqc                                            [DEBUG  ]  Latest MultiQC version is v1.14
[2023-02-09 13:05:30,963] multiqc                                            [DEBUG  ]  Working dir : /data/persistent/projects/Kundenprojekte/S1163/S1163_0/DNA_TwistCorePlus_Hg96_GRCh38/dragen_exome_test
[2023-02-09 13:05:30,963] multiqc                                            [DEBUG  ]  Template    : default
[2023-02-09 13:05:30,963] multiqc                                            [DEBUG  ]  Running Python 3.11.1 (main, Feb  4 2023, 11:23:15) [GCC 10.2.1 20210110]
[2023-02-09 13:05:30,963] multiqc                                            [INFO   ]  Report title: S000021_S1163Nr32_ignore
[2023-02-09 13:05:30,964] multiqc                                            [DEBUG  ]  Analysing modules: custom_content, ccs, ngsderive, purple, conpair, lima, peddy, somalier, methylQA, mosdepth, phantompeakqualtools, qualimap, preseq, hifiasm, quast, qorts, rna_seqc, rockhopper, rsem, rseqc, busco, bustools, goleft_indexcov, gffcompare, disambiguate, supernova, deeptools, sargasso, verifybamid, mirtrace, happy, mirtop, sambamba, gopeaks, homer, hops, macs2, theta2, snpeff, gatk, htseq, bcftools, featureCounts, fgbio, dragen, dragen_fastqc, dedup, pbmarkdup, damageprofiler, biobambam2, jcvi, mtnucratio, picard, vep, sentieon, prokka, qc3C, nanostat, samblaster, samtools, sexdeterrmine, eigenstratdatabasetools, bamtools, jellyfish, vcftools, longranger, stacks, varscan2, snippy, umitools, bbmap, bismark, biscuit, diamond, hicexplorer, hicup, hicpro, salmon, kallisto, slamdunk, star, hisat2, tophat, bowtie2, bowtie1, cellranger, snpsplit, odgi, pangolin, nextclade, humid, kat, leehom, adapterRemoval, bbduk, clipandmerge, cutadapt, flexbar, kaiju, kraken, malt, motus, trimmomatic, sickle, skewer, sortmerna, biobloomtools, fastq_screen, afterqc, fastp, fastqc, filtlong, prinseqplusplus, pychopper, porechop, pycoqc, minionqc, anglerfish, multivcfanalyzer, clusterflow, checkqc, bcl2fastq, bclconvert, interop, ivar, flash, seqyclean, optitype, whatshap
[2023-02-09 13:05:30,964] multiqc                                            [DEBUG  ]  Using temporary directory for creating report: /tmp/tmpp3sghq13
[2023-02-09 13:05:31,240] multiqc                                            [INFO   ]  Search path : xxxx
[2023-02-09 13:05:50,241] multiqc                                            [DEBUG  ]  Summary of files that were skipped by the search: [skipped_file_contents_search_errors: 1009] // [skipped_module_specific_max_filesize: 211] // [skipped_no_match: 54] // [skipped_filesize_limit: 4] // [skipped_directory_fn_ignore_dirs: 4] // [skipped_ignore_pattern: 2]
[2023-02-09 13:05:51,646] multiqc.plots.bargraph                             [DEBUG  ]  Using matplotlib version 3.6.3
[2023-02-09 13:05:51,652] multiqc.plots.linegraph                            [DEBUG  ]  Using matplotlib version 3.6.3
[2023-02-09 13:05:51,654] multiqc                                            [DEBUG  ]  No samples found: custom_content
[2023-02-09 13:05:51,857] multiqc                                            [DEBUG  ]  Oops! The 'dragen' MultiQC module broke...
================================================================================
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/multiqc/multiqc.py", line 654, in run
    output = mod()
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/dragen.py", line 89, in __init__
    samples_found |= self.add_coverage_per_contig()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/coverage_per_contig.py", line 17, in add_coverage_per_contig
    s_name, perchrom_data_by_phenotype = parse_wgs_contig_mean_cov(f)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/multiqc/modules/dragen/coverage_per_contig.py", line 167, in parse_wgs_contig_mean_cov
    sample, phenotype = m.group(1), m.group(2)
                        ^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
================================================================================
[2023-02-09 13:05:51,864] multiqc.plots.boxplot                              [DEBUG  ]  Using matplotlib version 3.6.3
[2023-02-09 13:05:54,585] multiqc.modules.dragen_fastqc.dragen_fastqc        [INFO   ]  Found 1 reports
[2023-02-09 13:05:54,596] multiqc                                            [INFO   ]  Compressing plot data
[2023-02-09 13:05:54,669] multiqc                                            [WARNING]  Deleting    : S000021_S1163Nr32_ignore_multiqc_report.html   (-f was specified)
[2023-02-09 13:05:54,676] multiqc                                            [WARNING]  Deleting    : S000021_S1163Nr32_ignore_multiqc_report_data   (-f was specified)
[2023-02-09 13:05:54,713] multiqc                                            [INFO   ]  Report      : S000021_S1163Nr32_ignore_multiqc_report.html
[2023-02-09 13:05:54,713] multiqc                                            [INFO   ]  Data        : S000021_S1163Nr32_ignore_multiqc_report_data
[2023-02-09 13:05:54,713] multiqc                                            [DEBUG  ]  Moving data file from '/tmp/tmpp3sghq13/multiqc_data' to '/xxxx/S000021_S1163Nr32_ignore_multiqc_report_data'
[2023-02-09 13:05:54,946] multiqc                                            [INFO   ]  MultiQC complete
[2023-02-09 13:05:54,946] multiqc                                            [WARNING]  1 flat-image plot used in the report due to large sample numbers


Just-Roma commented 1 year ago

Hi @NicoleGruenheit, not sure if you still need a solution for this problem, but if you do, then you can try the ones below.

Solutions:

My 5 cents:

vladsavelyev commented 1 year ago

It's a bit tricky to use a custom search pattern for certain Dragen modules, because they rely on the file name to distinguish the phenotype of the sample in paired somatic calling, i.e. tumor or normal is encoded in the suffix matched by (.*).wgs_contig_mean_cov_?(tumor|normal)?.csv.

It would be interesting to explore a robust solution to this, but for now, unfortunately, changing file names for Dragen won't work.
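
As one possible direction, and purely as a sketch rather than anything MultiQC implements, the parser could fall back to the bare file name when the suffix is missing, instead of raising. The function name parse_sample_and_phenotype and the fallback behaviour (phenotype left as None) are assumptions made for illustration; whether the rest of the module can handle a file with no phenotype suffix is not addressed here.

```python
import os
import re

# Stricter variant of the pattern discussed in this thread: the dots are
# escaped and the match is anchored at the end of the name. This is an
# assumption for the sketch, not the pattern MultiQC ships.
STRICT = re.compile(r"(.*)\.wgs_contig_mean_cov_?(tumor|normal)?\.csv$")


def parse_sample_and_phenotype(fn):
    """Hypothetical helper: return (sample, phenotype) without raising.

    If the Dragen suffix is present, sample and phenotype are read from it.
    Otherwise the file name minus its extension is used as the sample name
    and the phenotype is None.
    """
    m = STRICT.search(fn)
    if m:
        return m.group(1), m.group(2)
    return os.path.splitext(fn)[0], None


print(parse_sample_and_phenotype("contig_mean_test.csv"))
print(parse_sample_and_phenotype("s1.wgs_contig_mean_cov_normal.csv"))
```

With this fallback a renamed file would no longer crash the module, but the tumor/normal distinction is lost for it, which is the trade-off described above.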