brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

vcfanno fails to parse header of bcf file #154

Open edg1983 opened 1 year ago

edg1983 commented 1 year ago

Hello,

I'm using vcfanno 0.3.3 to annotate a BCF file that has undergone several previous processing steps, including bcftools fill-tags, bcftools filter, and bcftools csq.

I'm using a TOML config file that worked fine before, so I assume the config is OK.

It seems vcfanno has problems parsing the header of the input BCF, as you can see from the error log below:

=============================================
vcfanno version 0.3.3 [built with go1.16.5]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:116: found 30 sources from 12 files
vcfanno.go:157: falling back to non-bgzip
vcfanno.go:164: error parsing VCF query file molisani_cohort.PASS.snpEff.bcf: FILTER error: ##FILTER=<ID=PASS,Description="All filters passed",IDX=0>. [line: 2]
INFO error: ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency estimate for each alternate allele",IDX=1>, []. [line: 7]
INFO error: ##INFO=<ID=AQ,Number=A,Type=Integer,Description="Allele Quality score reflecting evidence for each alternate allele (Phred scale)",IDX=2>, []. [line: 8]
INFO error: ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes",IDX=3>, []. [line: 9]
INFO error: ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=4>, []. [line: 10]
FILTER error: ##FILTER=<ID=MONOALLELIC,Description="Site represents one ALT allele in a region with multiple variants that could not be unified into non-overlapping multi-allelic sites",IDX=5>. [line: 11]
FORMAT error: ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype",IDX=6>. [line: 12]
FORMAT error: ##FORMAT=<ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele",IDX=7>. [line: 13]
FORMAT error: ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)",IDX=8>. [line: 14]
FORMAT error: ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed",IDX=9>. [line: 15]
FORMAT error: ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality",IDX=10>. [line: 16]
FORMAT error: ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype Likelihoods",IDX=11>. [line: 17]
INFO error: ##INFO=<ID=MULTIALLELIC_INDEL,Number=0,Type=Flag,Description="Variant is part of a multi-allelic variant including at least one indel",IDX=12>, []. [line: 2598]
INFO error: ##INFO=<ID=MULTIALLELIC_SNV,Number=0,Type=Flag,Description="Variant is part of a multi-allelic variant including only SNVs",IDX=13>, []. [line: 2599]
FORMAT error: ##FORMAT=<ID=AB,Number=A,Type=Float,Description="Allele balance for the ALT allele",IDX=14>. [line: 2602]
INFO error: ##INFO=<ID=F_MISSING,Number=.,Type=Float,Description="Added by +fill-tags expression F_MISSING=F_MISSING",IDX=15>, []. [line: 2603]
INFO error: ##INFO=<ID=median_GQ,Number=.,Type=Float,Description="Added by +fill-tags expression median_GQ=MEDIAN(GQ)",IDX=16>, []. [line: 2604]
INFO error: ##INFO=<ID=median_DP,Number=.,Type=Float,Description="Added by +fill-tags expression median_DP=MEDIAN(FMT/DP)",IDX=17>, []. [line: 2605]
INFO error: ##INFO=<ID=nhomalt,Number=.,Type=Float,Description="Added by +fill-tags expression nhomalt=COUNT(GT=\"AA\")",IDX=18>, []. [line: 2606]
INFO error: ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data",IDX=19>, []. [line: 2607]
INFO error: ##INFO=<ID=MAF,Number=1,Type=Float,Description="Frequency of the second most common allele",IDX=20>, []. [line: 2608]
INFO error: ##INFO=<ID=HWE,Number=A,Type=Float,Description="HWE test (PMID:15789306); 1=good, 0=bad",IDX=21>, []. [line: 2609]
INFO error: ##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type",IDX=22>, []. [line: 2610]
INFO error: ##INFO=<ID=ExcHet,Number=A,Type=Float,Description="Test excess heterozygosity; 1=good, 0=bad",IDX=23>, []. [line: 2611]
FILTER error: ##FILTER=<ID=lowAQ,Description="Set if true: AQ < 20",IDX=24>. [line: 2614]
FILTER error: ##FILTER=<ID=noHQvars,Description="Set if true: N_PASS(GQ >= 20 & FMT/DP >= 10) == 0",IDX=25>. [line: 2617]
FILTER error: ##FILTER=<ID=highMissing,Description="Set if true: F_MISSING > 0.9",IDX=26>. [line: 2619]
FILTER error: ##FILTER=<ID=lowGQ,Description="Set if true: median_GQ < 10",IDX=27>. [line: 2621]
FILTER error: ##FILTER=<ID=SNVlowDP,Description="Set if true: TYPE == \"snp\" && median_DP < 6",IDX=28>. [line: 2623]
FILTER error: ##FILTER=<ID=INDELlowDP,Description="Set if true: TYPE == \"indel\" && median_DP < 10",IDX=29>. [line: 2625]
FILTER error: ##FILTER=<ID=NoAltCalls,Description="Set if true: AC == 0",IDX=30>. [line: 2627]
INFO error: ##INFO=<ID=BCSQ,Number=.,Type=String,Description="Local consequence annotation from BCFtools/csq, see http://samtools.github.io/bcftools/howtos/csq-calling.html for details. Format: Consequence|gene|transcript|biotype|strand|amino_acid_change|dna_change",IDX=31>, []. [line: 2633]
FORMAT error: ##FORMAT=<ID=BCSQ,Number=.,Type=Integer,Description="Bitmask of indexes to INFO/BCSQ, with interleaved first/second haplotype. Use \"bcftools query -f'[%CHROM\t%POS\t%SAMPLE\t%TBCSQ\n]'\" to translate.",IDX=31>. [line: 2634]

I've attached the actual header of the file in case it helps you understand the issue.

Thanks a lot!

header.txt

brentp commented 1 year ago

Hi @edg1983, vcfanno doesn't support BCF. I should add a better error message for that, but using VCF input instead of BCF should resolve your issue for now.
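
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that tells BCF apart from VCF before the file is handed to vcfanno. It relies on the BCF magic bytes ("BCF" at the start of the stream, usually inside a BGZF/gzip container). The function name `looks_like_bcf` is hypothetical and not part of vcfanno.

```python
import gzip


def looks_like_bcf(path):
    """Return True if the file content starts with the BCF magic bytes.

    Hypothetical helper, not part of vcfanno. Handles both raw BCF and
    BCF wrapped in a gzip/BGZF container (the usual on-disk form).
    """
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head == b"BCF":
        # Uncompressed BCF
        return True
    if head[:2] == b"\x1f\x8b":
        # gzip/BGZF container: check the first decompressed bytes
        with gzip.open(path, "rb") as fh:
            return fh.read(3) == b"BCF"
    return False
```

If the check returns True, the file can typically be converted first, for example with `bcftools view -Oz -o out.vcf.gz in.bcf` (file names here are placeholders), and the resulting VCF passed to vcfanno.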

edg1983 commented 1 year ago

Oh, I didn't realize that! I got confused by the error message.

Easy fix then! ;)

Thanks!