bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

tools_off: gatk4 incompatible with svcaller: gatk-cnv #2847

Closed: matthdsm closed this issue 5 years ago

matthdsm commented 5 years ago

Hi,

When I run bcbio with both tools_off: gatk4 and svcaller: gatk-cnv set, the pipeline fails with the following error:

[2019-06-05T10:03Z] multiprocessing: calculate_sv_bins
[2019-06-05T10:03Z] GATK: PreprocessIntervals
[2019-06-05T10:03Z] ##### ERROR ------------------------------------------------------------------------------------------
[2019-06-05T10:03Z] ##### ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
[2019-06-05T10:03Z] ##### ERROR
[2019-06-05T10:03Z] ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
[2019-06-05T10:03Z] ##### ERROR The error message below tells you what is the problem.
[2019-06-05T10:03Z] ##### ERROR
[2019-06-05T10:03Z] ##### ERROR If the problem is an invalid argument, please check the online documentation guide
[2019-06-05T10:03Z] ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
[2019-06-05T10:03Z] ##### ERROR
[2019-06-05T10:03Z] ##### ERROR Visit our website and forum for extensive documentation and answers to
[2019-06-05T10:03Z] ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
[2019-06-05T10:03Z] ##### ERROR
[2019-06-05T10:03Z] ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
[2019-06-05T10:03Z] ##### ERROR
[2019-06-05T10:03Z] ##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: PreprocessIntervals
[2019-06-05T10:03Z] ##### ERROR ------------------------------------------------------------------------------------------
[2019-06-05T10:03Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/data/gent/vo/000/gvo00082/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/data/gent/vo/000/gvo00082/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && export PATH=/vscmnt/gent_kyukon_data/_kyukon_data_gent/vo/000/gvo00082/bcbio/anaconda/bin:"$PATH" && /vscmnt/gent_kyukon_data/_kyukon_data_gent/vo/000/gvo00082/bcbio/anaconda/bin/gatk3 -Xms500m -Xmx29484m -XX:+UseSerialGC -Djava.io.tmpdir=/scratch/gent/vo/000/gvo00082/vsc41443/bcbio/tmpwpq6yfkr -T PreprocessIntervals -R /kyukon/data/gent/vo/000/gvo00082/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa --interval-merging-rule OVERLAPPING_ONLY -O /scratch/gent/vo/000/gvo00082/vsc41443/bcbio/tmpwpq6yfkr/D1518155-target.interval_list -L /vscmnt/gent_kyukon_data/_kyukon_data_gent/vo/000/gvo00082/vsc41443/bcbio_cnv/samples_cnv-merged/work/bedprep/cleaned-RefSeqExomeAndPanels_20171003.bed --bin-length 0 --padding 250 -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment
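The failing command shows why GATK3 rejects the call: bcbio built a GATK3-style invocation (gatk3 ... -T PreprocessIntervals), and GATK3 interprets the -T value as a walker name, which it does not have. GATK4 instead takes the tool name as the first argument after the gatk wrapper (gatk PreprocessIntervals ...). The sketch below contrasts the two invocation styles. build_gatk_command is a hypothetical helper for illustration only, not bcbio's actual command builder, and it leaves out the JVM options and GATK3 read filters seen in the log.

```python
from typing import List


def build_gatk_command(tool: str, args: List[str], use_gatk4: bool) -> List[str]:
    """Hypothetical helper: assemble a GATK command line in GATK3 or GATK4 style.

    GATK3 selects the tool ("walker") with -T; GATK4 takes the tool name
    as the first argument after the `gatk` wrapper script.
    """
    if use_gatk4:
        return ["gatk", tool] + args
    return ["gatk3", "-T", tool] + args


# Arguments taken from the failing PreprocessIntervals call (paths shortened).
args = [
    "-R", "hg38.fa",
    "-L", "cleaned-RefSeqExomeAndPanels_20171003.bed",
    "--bin-length", "0",
    "--padding", "250",
    "--interval-merging-rule", "OVERLAPPING_ONLY",
    "-O", "D1518155-target.interval_list",
]

# GATK3 form: fails with "Could not find walker with name: PreprocessIntervals".
print(" ".join(build_gatk_command("PreprocessIntervals", args, use_gatk4=False)))
# GATK4 form: the tool name is a positional argument.
print(" ".join(build_gatk_command("PreprocessIntervals", args, use_gatk4=True)))
```

Only the second form can work here, because PreprocessIntervals ships with GATK4 only.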

That is kind of obvious, since GATK3 doesn't support CNV calling. However, I was under the impression that the tools_off: gatk4 option only affected short variant calling.

Do you think it's viable to make bcbio "ignore" tools_off for gatk-cnv? Requesting gatk-cnv implies using GATK4, but that doesn't mean the user wants short variants called with GATK4.
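The behavior proposed here can be sketched as a per-tool decision: tools_off: [gatk4] downgrades short variant calling to GATK3, while tools that exist only in GATK4, such as the gatk-cnv preprocessing step, keep GATK4 regardless. The dict below mirrors the algorithm section of a bcbio sample YAML; the function name gatk_version_for and the GATK4_ONLY_TOOLS set are illustrative assumptions, not bcbio's internals.

```python
from typing import Dict, List

# Illustrative set of GATK4-only CNV tools used by a gatk-cnv style workflow
# (assumption: not an exhaustive list of what bcbio actually runs).
GATK4_ONLY_TOOLS = {
    "PreprocessIntervals",
    "CollectReadCounts",
    "DenoiseReadCounts",
    "ModelSegments",
    "CallCopyRatioSegments",
}


def gatk_version_for(tool: str, algorithm: Dict[str, List[str]]) -> int:
    """Hypothetical helper: return the GATK major version to use for `tool`.

    `tools_off: [gatk4]` is honored only for tools that GATK3 also provides;
    GATK4-only tools always run with GATK4.
    """
    if tool in GATK4_ONLY_TOOLS:
        return 4
    if "gatk4" in algorithm.get("tools_off", []):
        return 3
    return 4


# Same combination as in this report, plus a GATK short variant caller.
algorithm = {
    "tools_off": ["gatk4"],
    "svcaller": ["gatk-cnv"],
    "variantcaller": ["gatk-haplotype"],
}

print(gatk_version_for("HaplotypeCaller", algorithm))      # 3
print(gatk_version_for("PreprocessIntervals", algorithm))  # 4
```

Keying the decision on the tool rather than on a single global flag keeps the user's GATK3 choice for short variants intact while letting CNV calling proceed.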

Thanks for discussing this.

Cheers M

chapmanb commented 5 years ago

Matthias, thank you for catching this problem. The latest development version now ignores tools_off for GATK CNV calling, since that setting doesn't apply there. I appreciate the detailed report and suggestion.