Open naumenko-sa opened 5 years ago
I have pushed a fix for this. Would you try the attached binary? vcfanno_linux64_fix.gz
you'll need to set
fields=["POPMAX_AF"]
and then you'll probably want -permissive-overlap
or it will only get exact matches.
Thanks @brentp !
This binary works. However, even with -permissive-overlap option on, annotation happens only when POS in the query is the same as POS in the GNOMAD. When I tried to annotate an SV [158,000-166,000] with gnomad [157,000-166,000] it would not annotate.
Sergey
don't use self
.
On Mon, Mar 18, 2019 at 1:59 PM Sergey Naumenko notifications@github.com wrote:
Thanks @brentp https://github.com/brentp !
This binary works. However, even with -permissive-overlap option on, annotation happens only when POS in the query is the same as POS in the GNOMAD. When I tried to annotate an SV [158,000-166,000] with gnomad [157,000-166,000] it would not annotate.
Sergey
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/brentp/vcfanno/issues/106#issuecomment-474077554, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGy1ClEmgvHyISdD_bXMWXJqvwidZRks5vX_A7gaJpZM4b6ZzQ .
Thanks @brentp!
I've got it - when working with intervals, it is better to annotate a vcf file vs bed file rather than vs vcf. Still I have some minor questions and suggestions, if you don't mind.
My new config
[[annotation]]
file="gnomad_sv.bed.gz"
columns=[39]
names=["gnomad_sv_popmax_af"]
ops=["self"]
Probably, before using gnomad_sv for annotation we have to 'normalize' gnomad_SV database like we did for small variants. However, SVs are large, and there are rare deletions overlapping with frequent duplications in the population. What is your take on this problem? Would you suggest to split gnomad bed file into several files according to types (DEL, DUP) and annotate separately?
A test of sensitivity of vcfanno: I picked up a deletion in GNOMAD which did not overlap with anything: 1 50941 51053 gnomAD_v2_DEL_1_3
Do you think that the last overlap is too permissive and something like % required reciprocal overlap could be added here in vcfanno like in bedtools? I'd happy to contribute here, just point me out to the right place. Another suggestion might be a flag to use an additional INFO field when matching (i.e. SVTYPE in vcf and a certain column in BED).
Or maybe it is possible to allow users to custom calculation on fields before annotation happens, i.e. to define what is matching with lua as it happens with post-annotation?
Thanks! Sergey
Hi, if I understand correctly, you are describing what appear to be bugs in vcfanno--variants that should be getting annotated, but are not. Is that correct? Are you sure both files are sorted and that the file you are annotating with is indexed correctly? Can you share a small query and annotation file and conf where it does not annotate as expected?
Hi @brentp !
I see two issues. In both annotation happens. One issue is incorrect annotation, another is too permissive annotation (better to have no annotation there). To reproduce both issues you may use these files:
vcf (just one variant): test.vcf.gz
bed (just one record): gnomad.bed.gz
config:
[[annotation]]
file="gnomad.bed.gz"
columns=[39]
names=["gnomad_sv_popmax_af"]
ops=["self"]
Command: vcfanno vcfanno_bed.conf test.vcf.gz | bgzip -c > test.annotated.vcf.gz
Works the same way with and without -permissive-overlap, and vcfanno 0.3.1 and 0.3.2.
Sergey
I guess the problem is related to finding reciprocal overlap and currently most of the existing methods dont deal with it. Has this been resolved yet in vcfanno? Eagerly waiting for it.
I am having similar issue. Did someone has an update on this?
hi, I won't have time to work on this. I think the machinery is in place inside vcfanno for the annotation which can be followed up with some post-processing to require a certain amount of overlap.
Hello, Brent!
Thanks for the very useful vcfanno tool!
I'd like to start using it for annotation of structural variants (SV) in the same way how it works for small variant vcf files.
Recently GNOMAD released its SV frequencies, and I gave it a try:
no lua
1 157000 gnomAD_v2_DUP_1_8 N <DUP> 999 PASS END=166000
vcfanno crg.sv.vcfanno.conf test.vcf.gz | bgzip -c > test.annotated.vcf.gz
see: https://github.com/brentp/vcfanno
vcfanno.go:115: found 1 sources from 1 files vcfanno.go:143: using 2 worker threads to decompress query file api.go:804: WARNING: using op 'self' when with Number='1' for 'popmax_af' from 'gnomad_sv.vcf.gz' can result in out-of-order values when the query is multi-allelic api.go:805: : this is not an issue if the query has been decomposed. bix.go:450: no end: gnomad_sv.vcf.gz 1 9999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 14499 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 20649 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 20999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 39999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 40949 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 50940 N bix.go:450: no end: gnomad_sv.vcf.gz 1 62398 Nbix.go:450: no end: gnomad_sv.vcf.gz 1 66349 Nbix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 118398 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 136999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 141474 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 149999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 151999 N
bix.go:450: no end: gnomad_sv.vcf.gz 1 156999 N
vcfanno.go:241: annotated 1 variants in 0.00 seconds (545.9 / second)