brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Vcfanno does not typecast fields correctly when using the by_alt op #113

Closed ptn24 closed 5 years ago

ptn24 commented 5 years ago

According to https://github.com/brentp/vcfanno#typecasting-values, it should be possible to typecast a field by adding a _float suffix to its name. However, when using the by_alt op, the annotated VCF field does not have the desired type, and the _float suffix is not removed.
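For readers unfamiliar with the convention, the behavior expected from the suffix can be sketched in a few lines of Python. This is only an illustration of the idea, not vcfanno's actual code: the helper `split_type_suffix` and the `SUFFIX_TYPES` table are hypothetical names, and the `_int` entry is an assumption that is not shown anywhere in this report.

```python
# Hypothetical sketch of suffix-based typecasting; not vcfanno's implementation.
# Only "_float" appears in this report; the "_int" entry is an assumption.
SUFFIX_TYPES = {"_float": "Float", "_int": "Integer"}


def split_type_suffix(name, default_type="String"):
    """Return (clean_name, vcf_type) for a field name from the config."""
    for suffix, vcf_type in SUFFIX_TYPES.items():
        # Require at least one character before the suffix.
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], vcf_type
    # No recognized suffix: keep the name and fall back to the default type.
    return name, default_type


print(split_type_suffix("CADD_RAW_float"))  # ('CADD_RAW', 'Float')
```

Under this reading, `names = [ "CADD_RAW_float",]` should yield an INFO field called CADD_RAW with Type=Float, regardless of which op is used.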

Op: self; Field name: good; Field number: bad; Field type: float

root@b3cca58b784e:/tmp# cat conf.toml 
[[annotation]]
names = [ "CADD_RAW_float",]
file = "/tmp/annotation.tsv.gz"
columns = [ 5,]
ops = [ "self",]
root@b3cca58b784e:/tmp# vcfanno conf.toml test.vcf.gz

=============================================
vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
api.go:804: WARNING: using op 'self' when with Number='1' for '' from '/tmp/annotation.tsv.gz' can result in out-of-order values when the query is multi-allelic
api.go:805:        : this is not an issue if the query has been decomposed.
##fileformat=VCFv4.2
##contig=<ID=chr2,length=242193529,assembly=GRCh38>
##INFO=<ID=AF,Number=.,Type=Float,Description="">
##INFO=<ID=AQ,Number=.,Type=Integer,Description="">
##INFO=<ID=CADD_RAW,Number=1,Type=Float,Description="calculated by self of overlapping values in column 5 from /tmp/annotation.tsv.gz">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr2    41647   2_41647_A_G     A       G       1328.0  .       AF=1.56250e-02;AQ=1328;CADD_RAW=0.591814
vcfanno.go:241: annotated 1 variants in 0.00 seconds (2292.6 / second)

Op: by_alt; Field name: bad; Field number: good; Field type: string

root@b3cca58b784e:/tmp# cat conf.toml 
[[annotation]]
names = [ "CADD_RAW_float",]
file = "/tmp/annotation.tsv.gz"
columns = [ 5,]
ops = [ "by_alt",]
root@b3cca58b784e:/tmp# vcfanno conf.toml test.vcf.gz

=============================================
vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
##fileformat=VCFv4.2
##contig=<ID=chr2,length=242193529,assembly=GRCh38>
##INFO=<ID=AF,Number=.,Type=Float,Description="">
##INFO=<ID=AQ,Number=.,Type=Integer,Description="">
##INFO=<ID=CADD_RAW_float,Number=A,Type=String,Description="calculated by by_alt of overlapping values in column 5 from /tmp/annotation.tsv.gz">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr2    41647   2_41647_A_G     A       G       1328.0  .       AF=1.56250e-02;AQ=1328;CADD_RAW_float=0.591814
vcfanno.go:241: annotated 1 variants in 0.00 seconds (3546.0 / second)
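The difference between the two runs is easiest to see by comparing the two CADD_RAW header lines directly. The following is a minimal sketch of such a check, using the header lines copied from the output above; `parse_info` and `typecast_problems` are hypothetical helper names, and the regular expression assumes the ID, Number, Type key order that these particular lines use.

```python
import re

# Assumes the ID,Number,Type ordering seen in the headers quoted in this report.
INFO_RE = re.compile(r"^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,]+),")

# Header lines copied from the two runs above.
SELF_HEADER = (
    '##INFO=<ID=CADD_RAW,Number=1,Type=Float,Description="calculated by self '
    'of overlapping values in column 5 from /tmp/annotation.tsv.gz">'
)
BY_ALT_HEADER = (
    '##INFO=<ID=CADD_RAW_float,Number=A,Type=String,Description="calculated by '
    'by_alt of overlapping values in column 5 from /tmp/annotation.tsv.gz">'
)


def parse_info(line):
    """Extract ID, Number and Type from a ##INFO header line."""
    match = INFO_RE.match(line)
    if match is None:
        raise ValueError("not a recognized ##INFO line: " + line)
    return {"ID": match.group(1), "Number": match.group(2), "Type": match.group(3)}


def typecast_problems(line, suffix="_float", expected_type="Float"):
    """List the ways a header line deviates from the expected typecast result."""
    info = parse_info(line)
    problems = []
    if info["ID"].endswith(suffix):
        problems.append("suffix not removed from ID")
    if info["Type"] != expected_type:
        problems.append("Type is %s, expected %s" % (info["Type"], expected_type))
    return problems


for label, header in (("self", SELF_HEADER), ("by_alt", BY_ALT_HEADER)):
    print(label, typecast_problems(header))
```

The check deliberately ignores the Number field, since the Number=1 result under self is a separate concern from the typecasting problem reported here.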

It would be good if the by_alt op produced a field name without the _float suffix, with Number=A and Type=Float.


brentp commented 5 years ago

thanks for the clear report. i'll see if i can get a fix in shortly

brentp commented 5 years ago

Hi, this was an easy fix. If you want, you can try the (linux) binary attached here: vcfanno_dev.gz

I should have a release out before august.

ptn24 commented 5 years ago

Swift response. Thank you, @brentp!

brentp commented 5 years ago

this is out in the new release.

ptn24 commented 5 years ago

Thank you, @brentp. I verified the fix with vcfanno 0.3.2.

brentp commented 5 years ago

cheers. thanks for following up and for providing the great test-case.