broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License

IllegalArgumentException in HaplotypeCallerSpark #1091

Closed tomwhite closed 7 years ago

tomwhite commented 7 years ago

When running on a 160GB BAM I get:

java.lang.IllegalArgumentException: contig must be non-null and not equal to *, and start must be >= 1
     at org.broadinstitute.hellbender.utils.read.SAMRecordToGATKReadAdapter.setPosition(SAMRecordToGATKReadAdapter.java:89)
     at org.broadinstitute.hellbender.utils.clipping.ClippingOp.applyHARDCLIP_BASES(ClippingOp.java:381)
     at org.broadinstitute.hellbender.utils.clipping.ClippingOp.apply(ClippingOp.java:73)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.clipRead(ReadClipper.java:147)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.clipRead(ReadClipper.java:128)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.hardClipSoftClippedBases(ReadClipper.java:332)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.hardClipSoftClippedBases(ReadClipper.java:335)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.finalizeRegion(AssemblyBasedCallerUtils.java:84)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:238)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:478)
     at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$580(HaplotypeCallerSpark.java:203) 

At first glance this looks like a problem with unmapped reads, but the tool filters those out, so the problem is more likely in the clipping logic. It's hard to diagnose because the exception doesn't say which read caused it, and slow to reproduce because the input is so large.
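The exception message names the precondition that was violated. Restated as a rough Python sketch (this is not the GATK source; `position_is_valid` and the contig name `chr1` are hypothetical, chosen only for illustration), the check can fail in two distinct ways: an unmapped-style contig, or a start coordinate below 1, which is the case that clipping could plausibly produce.

```python
def position_is_valid(contig, start):
    """Restate the precondition quoted in the exception message.

    Hypothetical helper for illustration only; not taken from GATK.
    """
    return contig is not None and contig != "*" and start >= 1


# An unmapped-style position fails because of the contig.
print(position_is_valid("*", 100))
# A mapped contig still fails if the computed start drops below 1.
print(position_is_valid("chr1", 0))
# An ordinary mapped position passes.
print(position_is_valid("chr1", 1))
```

If unmapped reads really are filtered out upstream, the second case is the one worth chasing.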

Any thoughts @lbergelson, @droazen?

lbergelson commented 7 years ago

@tomwhite Could you upload the BAM file somewhere on Google Cloud, along with the command line you used? It's not obvious to me what is causing the error. It's painful to debug anything on a 160GB file, but I think we can do a binary search on the file and find the bad location pretty quickly, i.e. throw compute at the problem instead of human time.
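The binary search suggested here could be driven by a small script along these lines. It is only a sketch under stated assumptions: `fails` stands in for whatever reruns the tool on a subset of the input (for example, a range of intervals or chunks) and reports whether the exception recurs. Both `fails` and `find_failing_index` are hypothetical names. The sketch also assumes there is a single bad location, and that any subset containing it still triggers the failure.

```python
def find_failing_index(lo, hi, fails):
    """Return the one index in [lo, hi) for which the job fails.

    `fails(a, b)` is a caller-supplied (hypothetical) predicate that reruns
    the job on the half-open range [a, b) and returns True if it crashes.
    Assumes exactly one bad index exists in [lo, hi).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lo, mid):
            hi = mid  # the bad item is in the left half
        else:
            lo = mid  # otherwise it must be in the right half
    return lo


# Toy stand-in for rerunning the tool: index 37 plays the "bad read".
bad = 37
calls = []


def toy_fails(a, b):
    calls.append((a, b))
    return a <= bad < b


print(find_failing_index(0, 1000, toy_fails))
print(len(calls), "reruns needed")
```

Each rerun halves the search space, so the number of reruns grows only logarithmically with the number of chunks. That is what makes it reasonable to spend compute on a 160GB input rather than inspect it by hand.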

droazen commented 7 years ago

Issue moved to broadinstitute/gatk #3013 via ZenHub