broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.69k stars 588 forks source link

Plotting throws the wrong error when given reference fasta instead of dict #2941

Closed droazen closed 7 years ago

droazen commented 7 years ago

@achevali commented on Wed Feb 15 2017

It should state that the wrong file was given instead of trying to use it.

    jar:file:/dsde/working/aaronc/testing/LUAD/HMM_eval/gatk-protected/build/libs/gatk-protected-all-c17c8ed-SNAPSHOT-spark_standalone.jar!/com/intel/gkl/native/libIntelGKL.so
12:11:34.894 INFO  IntelGKLUtils - Intel GKL library loaded from classpath.
[February 15, 2017 12:11:34 PM EST] org.broadinstitute.hellbender.tools.exome.plotting.PlotACNVResults  --hets /dsde/working/aaronc/testing/LUAD/wgs/pulldown/TCGA-55-6972-01A-11D-1945-08.hets.tsv --tangentNormalized /dsde/working/aaronc/testing/LUAD/wgs/tumor_pcov/TCGA-55-6972-01A-11D-1945-08-gc-corrected.tn.tsv --segments out/wes/TCGA-55-6972-01A-11D-1945-08.seg --sequenceDictionaryFile /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta --outputPrefix HMM_eval.TCGA-55-6972-01A-11D-1945-08. --output out/wgs/  --minimumContigLength 1000000 --help false --version false --verbosity INFO --QUIET false --use_jdk_deflater false
[February 15, 2017 12:11:34 PM EST] Executing as aaronc@gsa5.broadinstitute.org on Linux 2.6.32-642.11.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Version: Version:c17c8ed-SNAPSHOT
12:11:34.899 INFO  PlotACNVResults - Defaults.BUFFER_SIZE : 131072
12:11:34.899 INFO  PlotACNVResults - Defaults.COMPRESSION_LEVEL : 5
12:11:34.899 INFO  PlotACNVResults - Defaults.CREATE_INDEX : false
12:11:34.899 INFO  PlotACNVResults - Defaults.CREATE_MD5 : false
12:11:34.899 INFO  PlotACNVResults - Defaults.CUSTOM_READER_FACTORY : 
12:11:34.899 INFO  PlotACNVResults - Defaults.EBI_REFERENCE_SERVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
12:11:34.899 INFO  PlotACNVResults - Defaults.NON_ZERO_BUFFER_SIZE : 131072
12:11:34.899 INFO  PlotACNVResults - Defaults.REFERENCE_FASTA : null
12:11:34.899 INFO  PlotACNVResults - Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
12:11:34.899 INFO  PlotACNVResults - Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:11:34.899 INFO  PlotACNVResults - Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
12:11:34.899 INFO  PlotACNVResults - Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:11:34.899 INFO  PlotACNVResults - Defaults.USE_CRAM_REF_DOWNLOAD : false
12:11:34.900 INFO  PlotACNVResults - Deflater IntelDeflater
12:11:34.900 INFO  PlotACNVResults - Initializing engine
12:11:34.900 INFO  PlotACNVResults - Done initializing engine
12:11:35.009 INFO  PlotACNVResults - Shutting down engine
[February 15, 2017 12:11:35 PM EST] org.broadinstitute.hellbender.tools.exome.plotting.PlotACNVResults done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1502085120
java.lang.IllegalArgumentException: There must be at least one contig above the threshold length in the sequence dictionary.
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:673)
    at org.broadinstitute.hellbender.tools.exome.plotting.PlotACNVResults.doWork(PlotACNVResults.java:120)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:112)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
    at org.broadinstitute.hellbender.Main.instanceMain(Main.java:96)
    at org.broadinstitute.hellbender.Main.instanceMain(Main.java:103)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:116)
    at org.broadinstitute.hellbender.Main.main(Main.java:158)

@samuelklee commented on Wed Feb 22 2017

Hmm, perhaps I should not be using ReferenceUtils.loadFastaDictionary to load the dictionary, or this method needs to check that the file it is loading is indeed a valid .dict file?

@droazen, can you comment? The use case is that we are using a dictionary file to select regions and specify chromosome lengths for plotting. This dictionary file may not necessarily correspond to the reference fasta file used to generate the data being plotted (as it may have chromosomes that the user does not want to plot removed, for example), so we don't want to allow the user to pass the fasta file.


@samuelklee commented on Wed Apr 19 2017

@droazen can you chime in when you get a chance?


@droazen commented on Wed Apr 19 2017

@samuelklee ReferenceUtils.loadFastaDictionary() is fine to use for loading .dict files, even if a companion fasta file is not present. It is surprising that it doesn't throw if it is given a non-.dict input -- from a reading of the code, I suspect that it instead returns an empty sequence dictionary in that case. You could either have the caller check the return value of ReferenceUtils.loadFastaDictionary() to see if the returned dictionary is empty and throw if it is, or you could modify ReferenceUtils.loadFastaDictionary() itself to throw a UserException if the header returned from its SAMTextHeaderCodec contains no sequence dictionary.


@samuelklee commented on Wed Apr 19 2017

I think the responsibility is on the method. Filed https://github.com/broadinstitute/gatk/issues/2609.

jonn-smith commented 7 years ago

The fix as described by @droazen (i.e. checking if the returned header contains no dictionary) is actually included PR #2803.

droazen commented 7 years ago

Fixed by https://github.com/broadinstitute/gatk/pull/2803