brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
364 stars 56 forks source link

Using vcfanno to annotate SV calls with GNOMAD SV frequencies #106

Open naumenko-sa opened 5 years ago

naumenko-sa commented 5 years ago

Hello, Brent!

Thanks for the very useful vcfanno tool!

I'd like to start using it for annotation of structural variants (SV) in the same way how it works for small variant vcf files.

Recently GNOMAD released its SV frequencies, and I gave it a try:

see: https://github.com/brentp/vcfanno

vcfanno.go:115: found 1 sources from 1 files vcfanno.go:143: using 2 worker threads to decompress query file api.go:804: WARNING: using op 'self' when with Number='1' for 'popmax_af' from 'gnomad_sv.vcf.gz' can result in out-of-order values when the query is multi-allelic api.go:805: : this is not an issue if the query has been decomposed. bix.go:450: no end: gnomad_sv.vcf.gz 1 9999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 14499 N bix.go:450: no end: gnomad_sv.vcf.gz 1 20649 N bix.go:450: no end: gnomad_sv.vcf.gz 1 20999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 39999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 40949 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 45999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 50940 N bix.go:450: no end: gnomad_sv.vcf.gz 1 62398 N bix.go:450: no end: gnomad_sv.vcf.gz 1 66349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 93999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 118398 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 123349 N bix.go:450: no end: gnomad_sv.vcf.gz 1 136999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 141474 N bix.go:450: no end: gnomad_sv.vcf.gz 1 149999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 151999 N bix.go:450: no end: gnomad_sv.vcf.gz 1 156999 N vcfanno.go:241: annotated 1 variants in 0.00 seconds (545.9 / second)


There is a result file: test.annotated.vcf.gz, but it does not contain the gnomad_sv_popmax_af annotation.

What would be the algorithm of SV matching in vcfanno? For small variants it is just chr:pos:ref:alt, but for SVs it is different: min 50% reciprocal overlap between features (END-START) or similar.

Sergey
brentp commented 5 years ago

I have pushed a fix for this. Would you try the attached binary? vcfanno_linux64_fix.gz

brentp commented 5 years ago

you'll need to set

fields=["POPMAX_AF"]

and then you'll probably want -permissive-overlap or it will only get exact matches.

naumenko-sa commented 5 years ago

Thanks @brentp !

This binary works. However, even with -permissive-overlap option on, annotation happens only when POS in the query is the same as POS in the GNOMAD. When I tried to annotate an SV [158,000-166,000] with gnomad [157,000-166,000] it would not annotate.

Sergey

brentp commented 5 years ago

don't use self.

On Mon, Mar 18, 2019 at 1:59 PM Sergey Naumenko notifications@github.com wrote:

Thanks @brentp https://github.com/brentp !

This binary works. However, even with -permissive-overlap option on, annotation happens only when POS in the query is the same as POS in the GNOMAD. When I tried to annotate an SV [158,000-166,000] with gnomad [157,000-166,000] it would not annotate.

Sergey

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/brentp/vcfanno/issues/106#issuecomment-474077554, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGy1ClEmgvHyISdD_bXMWXJqvwidZRks5vX_A7gaJpZM4b6ZzQ .

naumenko-sa commented 5 years ago

Thanks @brentp!

I've got it - when working with intervals, it is better to annotate a vcf file vs bed file rather than vs vcf. Still I have some minor questions and suggestions, if you don't mind.

My new config

[[annotation]]
file="gnomad_sv.bed.gz"
columns=[39]
names=["gnomad_sv_popmax_af"]
ops=["self"]

Probably, before using gnomad_sv for annotation we have to 'normalize' gnomad_SV database like we did for small variants. However, SVs are large, and there are rare deletions overlapping with frequent duplications in the population. What is your take on this problem? Would you suggest to split gnomad bed file into several files according to types (DEL, DUP) and annotate separately?

A test of sensitivity of vcfanno: I picked up a deletion in GNOMAD which did not overlap with anything: 1 50941 51053 gnomAD_v2_DEL_1_3

Do you think that the last overlap is too permissive and something like % required reciprocal overlap could be added here in vcfanno like in bedtools? I'd happy to contribute here, just point me out to the right place. Another suggestion might be a flag to use an additional INFO field when matching (i.e. SVTYPE in vcf and a certain column in BED).

Or maybe it is possible to allow users to custom calculation on fields before annotation happens, i.e. to define what is matching with lua as it happens with post-annotation?

Thanks! Sergey

brentp commented 5 years ago

Hi, if I understand correctly, you are describing what appear to be bugs in vcfanno--variants that should be getting annotated, but are not. Is that correct? Are you sure both files are sorted and that the file you are annotating with is indexed correctly? Can you share a small query and annotation file and conf where it does not annotate as expected?

naumenko-sa commented 5 years ago

Hi @brentp !

I see two issues. In both annotation happens. One issue is incorrect annotation, another is too permissive annotation (better to have no annotation there). To reproduce both issues you may use these files:

Works the same way with and without -permissive-overlap, and vcfanno 0.3.1 and 0.3.2.

Sergey

ajaarma commented 5 years ago

I guess the problem is related to finding reciprocal overlap and currently most of the existing methods dont deal with it. Has this been resolved yet in vcfanno? Eagerly waiting for it.

snashraf commented 4 years ago

I am having similar issue. Did someone has an update on this?

brentp commented 4 years ago

hi, I won't have time to work on this. I think the machinery is in place inside vcfanno for the annotation which can be followed up with some post-processing to require a certain amount of overlap.