Open: rdocking opened this issue 6 years ago
@rdocking Very strange. Did it produce any stack trace in the output? It sounds like executors may be dying and something isn't retrying correctly.
@lbergelson - I didn't see any stack trace in the output. Here are examples from similar samples:
gatk_debug_60k.txt - runs properly
gatk_debug_70k.txt - malformed output
Hi there - I'm having some problems running HaplotypeCallerSpark on RNA-Seq data. The tl;dr is that, on some occasions when HaplotypeCallerSpark runs short of memory, it still finishes successfully but writes out a VCF file without a proper header. Example command syntax is:
When I run this command on a single chromosome with -Xmx94349m, the command completes successfully, but the resulting VCF header does not contain this expected header line: (along with most of the other header lines associated with gVCF output). When I increase the memory request to 110g for the same input files, the proper VCF header is present.
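When reproducing this, one quick way to separate good runs from malformed ones is to scan each output header for gVCF-specific lines. The following is a minimal, hypothetical helper, not part of GATK or bcbio. The two marker prefixes (##GVCFBlock and ##ALT=<ID=NON_REF) are assumptions about what a typical GATK gVCF header contains; they are not the specific line missing from the report, so adjust GVCF_MARKERS to whatever line you expect.

```python
import gzip
import sys
from typing import Iterable, List


def read_header(path: str) -> List[str]:
    """Return the '#'-prefixed header lines of a VCF (plain text or gzip/bgzip)."""
    opener = gzip.open if path.endswith(".gz") else open
    header: List[str] = []
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                break  # first data record: header is over
            header.append(line.rstrip("\n"))
    return header


def missing_header_prefixes(header: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return each required prefix that no header line starts with, in order."""
    lines = list(header)
    return [p for p in required if not any(line.startswith(p) for line in lines)]


# ASSUMPTION: typical markers of a GATK gVCF header; change to the line you expect.
GVCF_MARKERS = ["##GVCFBlock", "##ALT=<ID=NON_REF"]

if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_header_prefixes(read_header(sys.argv[1]), GVCF_MARKERS)
    if missing:
        print("header looks malformed; missing:", ", ".join(missing))
        sys.exit(1)
    print("header contains all expected gVCF markers")
```

Running it over the outputs of the 94349m and 110g runs (e.g. `python check_header.py sample.g.vcf.gz`, a hypothetical file name) should flag only the malformed one, with a non-zero exit status that a pipeline step can check.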
I discovered this while running GATK within the bcbio pipeline; the original description is at: https://github.com/bcbio/bcbio-nextgen/issues/2375
On the linked issue, I have examples of GATK output from runs that produced correct and incorrect output - please let me know if there's any other information you need. Thanks!