apeltzer / VCF2Genome

A tool to create a draft genome file out of a GATK VCF file
GNU General Public License v3.0

Unable to parse header with error #6

idolawoye closed this issue 5 years ago

idolawoye commented 5 years ago

I am trying to run VCF2Genome on a VCF file that GATK produced as part of the EAGER pipeline, and it fails with this error:

```
Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: file:///home/idowu/Downloads/Bioinformatics/Daccor/Releases/test/results/RAW/11-GATKVariantFilter/SRR3584843_R1.fastq.merged.fq.mappedonly.sorted.cleaned_rmdup.sorted.real.unifiedgenotyper.filtered.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:262)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:101)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:126)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:110)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:74)
    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:117)
    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:68)
    at VCF2Genome.isVCF41(VCF2Genome.java:445)
    at VCF2Genome.<init>(VCF2Genome.java:146)
    at VCF2Genome.main(VCF2Genome.java:129)
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:260)
    ... 9 more
```

However, my VCF file looks fine and does contain the #CHROM header line, as you can see in the attachment. Could you please help? Thanks :pensive:

[SRR3584843_R1.fastq.merged.fq.mappedonly.sorted.cleaned_rmdup.sorted.real.unifiedgenotyper.filtered.vcf.gz](https://github.com/apeltzer/VCF2Genome/files/2888739/SRR3584843_R1.fastq.merged.fq.mappedonly.sorted.cleaned_rmdup.sorted.real.unifiedgenotyper.filtered.vcf.gz)

apeltzer commented 5 years ago

Hi!

This is probably an upstream problem in the htsjdk library, which cannot parse the chromosome name properly:

NC_000919.1Treponemapallidumsubsp.pallidumstr.Nicholscompletegenome_pos_[134890,151090]_length_1845

is a bit long for a single chromosome name. Can you please use a FastA reference with shortened chromosome names?
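One way to shorten the names is to rewrite the header lines of the reference before mapping. The following is only a minimal sketch in Python, not part of VCF2Genome or EAGER: the function names, the accession pattern, and the 20-character fallback limit are all assumptions made for illustration.

```python
import re

# Assumed pattern for RefSeq-style accessions such as "NC_000919.1".
ACCESSION = re.compile(r"^[A-Z]{2}_\d+\.\d+")


def short_name(header, max_len=20):
    """Return a shortened sequence name for one FASTA header line.

    Keeps a leading accession if one is found; otherwise truncates the
    first whitespace-delimited token to max_len characters.
    """
    tokens = header.lstrip(">").split()
    name = tokens[0] if tokens else ""
    match = ACCESSION.match(name)
    return match.group(0) if match else name[:max_len]


def shorten_fasta(lines, max_len=20):
    """Yield FASTA lines with every header replaced by its short name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + short_name(line, max_len) + "\n"
        else:
            yield line


def rewrite_file(src_path, dst_path, max_len=20):
    """Write a copy of src_path with shortened headers (paths are hypothetical)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(shorten_fasta(src, max_len))
```

Applied to the name above, `short_name` would return `NC_000919.1`. Truncation can make two names collide, so the result is worth checking for duplicates. Because the contig names in a VCF come from the reference, the mapping and variant calling would presumably have to be rerun against the renamed file.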

idolawoye commented 5 years ago

Hello, sorry to bother you once again. I encountered this error while using EAGERCLI to process a config file:

```
#Exception in thread "main" java.io.IOException: Error: Base calls in the vcf file are not sorted! (Note that we currently don't support multiple chromosomes, too!
    at VCF2Genome.runUGAnalysis(VCF2Genome.java:209)
    at VCF2Genome.<init>(VCF2Genome.java:148)
    at VCF2Genome.main(VCF2Genome.java:129)
# The Module VCF2Genome failed in execution at 2019-03-04T14:24:06.613. Check what happened in the logfile.
```

Kindly assist me with this error. Thanks

apeltzer commented 5 years ago

> Error: Base calls in the vcf file are not sorted! (Note that we currently don't support multiple chromosomes, too!

Do you have more than a single FastA entry in your input reference?
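A quick way to check is to list the header lines of the reference. This is a minimal sketch in Python, assuming a plain-text (uncompressed) FASTA file; the function name and the `reference.fasta` path in the usage comment are hypothetical.

```python
def fasta_entry_names(lines):
    """Return the name of every entry in a FASTA file, in file order.

    The name is taken as the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0] if tokens else "")
    return names


# Example usage (hypothetical path):
#     with open("reference.fasta") as handle:
#         names = fasta_entry_names(handle)
#     print(len(names), "entries:", names)
```

If this reports more than one entry, the reference has multiple sequences, which matches the "multiple chromosomes" note in the error message.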

idolawoye commented 5 years ago

Yes, the reference file contains the complete genome plus two plasmids.

apeltzer commented 5 years ago

Well, this isn't supported, and since I no longer actively develop this tool, I won't be adding it.

idolawoye commented 5 years ago

Thanks a lot.

apeltzer commented 5 years ago

Both sorry and you're welcome 👍