genome-in-a-bottle / giab_latest_release

This repository contains information about the latest release from the Genome in a Bottle project

Latest NA12878 (v3.3.2) VCF doesn't Validate #5

Closed yfarjoun closed 6 years ago

yfarjoun commented 6 years ago

I'm getting the following error when running GATK ValidateVariants:

htsjdk.tribble.TribbleException: Line 2409097: there aren't enough columns for line chr11 (we expected 9 tokens, and saw 1 ), for input source: file:///seq/tng/tlangs/geneconcord/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:281)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:365)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:346)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:307)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:275)

v3.3.1 validates fine. What tool do you use to validate the VCF?

Thanks!
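
The exception above reports a data line with 1 tab-separated token where 9 were expected. As a rough sketch (not how htsjdk implements its check), a scan like the one below can locate such lines without running GATK. The function name `find_short_lines` and the default threshold of 9 columns are illustrative assumptions taken from the error message, not part of any tool mentioned in this thread.

```python
import gzip


def find_short_lines(path, expected=9):
    """Yield (line_number, column_count) for data lines with too few columns.

    Header lines (starting with '#') are skipped. Line numbers are 1-based
    and count every line in the file, headers included.
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed
    # (bgzip output is readable as ordinary multi-member gzip).
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue
            count = len(line.rstrip("\n").split("\t"))
            if count < expected:
                yield number, count
```

Calling `list(find_short_lines(path))` on a local copy returns every offending line number with its column count. If the gzip stream itself is cut off mid-block, Python raises `EOFError` while reading, which is itself a sign of an incomplete file.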

jzook commented 6 years ago

Thanks for the report! I've generally used bcftools to help ensure correct formatting. This looks like a potential truncation issue. Could you point me to the exact file on the FTP site that is causing this? It may also be worth downloading it again in case the first download didn't complete.
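
One quick way to test the truncation hypothesis locally is to decompress the whole file and see whether the gzip stream ends cleanly. The following is a minimal sketch using only the Python standard library (Python 3.8 or later for `gzip.BadGzipFile`). The function name `gzip_is_complete` is a hypothetical helper, not something bcftools or GATK provides.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses through to its end."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks; only errors matter here.
            while handle.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError: stream ended before the end-of-stream marker.
        # BadGzipFile / zlib.error: corrupt header, checksum, or data.
        return False
    return True
```

This only catches files cut off in the middle of a compressed member. A block-compressed file truncated exactly between members would still pass, so comparing the local file size (or a checksum, if one is published) with the copy on the server is a useful complement.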


yfarjoun commented 6 years ago

Sorry about the noise. It was a bad download.