brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Can region annotation be properly extended to insertions? #149

Closed liserjrqlxue closed 2 years ago

liserjrqlxue commented 2 years ago

If you have encountered an error, please include:

=============================================

vcfanno version 0.3.2 [built with go1.14.4]

see: https://github.com/brentp/vcfanno

vcfanno.go:115: found 1 sources from 1 files

vcfanno.go:156: falling back to non-bgzip

vcfanno.go:248: annotated 3 variants in 0.00 seconds (1962.4 / second)

annotated vcf result:

1 874778 1164669 G GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA . . ALLELEID=1153717;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.874817_874864dup;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=Duplication;CLNVCSO=SO:1000035;CLNVI=Invitae:1613992;GENEINFO=SAMD11:148398;MC=SO:0001575|splice_donor_variant;ORIGIN=1;repeat=trf-1

1 874778 1108252 G GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA . . ALLELEID=1089348;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.874817CCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC[3];CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Likely_benign;CLNVC=Microsatellite;CLNVCSO=SO:0000289;CLNVI=Invitae:3807250;GENEINFO=SAMD11:148398;MC=SO:0001575|splice_donor_variant;ORIGIN=1;repeat=trf-1

1 874778 769497 GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA G . . ALLELEID=777163;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.874817_874864del;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=Deletion;CLNVCSO=SO:0000159;CLNVI=Invitae:1680099;GENEINFO=SAMD11:148398;MC=SO:0001575|splice_donor_variant;ORIGIN=1;RS=568340123;repeat=trf-1,trf;left_repeat=trf-1;right_repeat=trf-1,trf


For convenience, I use `0-based` start coordinates below.
I want to annotate the insertion (duplication) `1:874778G>GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA` with the **`trf`** tag of region `1 874778 874888`, but it fails.
I guess the position of this variant in the `vcf` is treated as `1 874777 874778` or `1 874778 874778`, neither of which overlaps `1 874778 874888`.
`-ends` and `-permissive-overlap` have no effect.
Extending the region to `1 874777 874888` does annotate an insertion at this position (`trf-1` as an example), but with the drawback that the extended region also annotates any variant overlapping `1 874777 874778`.
Is there a more appropriate way?
brentp commented 2 years ago

Hi, thanks for the complete description. Yes, the VCF insertion is treated as (in BED 0-based half-open):

1 874777 874778

and the bed interval is:

1 874778 874888

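These do not overlap! The insertion's interval ends at 874778 (exclusive), which is exactly where the BED interval starts. So no parameter change will allow these to overlap. You could, instead, expand your BED regions by $n bases.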
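
To make the coordinate arithmetic concrete, here is a minimal Python sketch of the half-open overlap test and of the suggested padding. It is not vcfanno's code. The helper names (`vcf_ref_interval`, `overlaps`, `pad`) are hypothetical, and it assumes a variant is represented by the span of its REF allele, as described above.

```python
from typing import Tuple

# (chrom, start, end) in BED-style 0-based half-open coordinates
Interval = Tuple[str, int, int]


def vcf_ref_interval(chrom: str, pos: int, ref: str) -> Interval:
    """Convert a 1-based VCF POS/REF pair to a 0-based half-open interval
    covering the REF allele (assumed representation, not vcfanno's API)."""
    start = pos - 1
    return (chrom, start, start + len(ref))


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open overlap: intervals that merely touch do not overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pad(region: Interval, n: int) -> Interval:
    """Expand a region by n bases on each side, clamping the start at 0."""
    chrom, start, end = region
    return (chrom, max(0, start - n), end + n)


# The trf region from the issue, with a 0-based start.
trf = ("1", 874778, 874888)

# The insertion: POS 874778, REF "G".
insertion = vcf_ref_interval("1", 874778, "G")

# The deletion at the same POS, whose REF spans into the region.
deletion = vcf_ref_interval(
    "1", 874778, "GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCA"
)

print(insertion)                         # ('1', 874777, 874778)
print(overlaps(insertion, trf))          # False: adjacent, not overlapping
print(overlaps(deletion, trf))           # True: REF extends past 874778
print(overlaps(insertion, pad(trf, 1)))  # True after padding by 1 base
```

Padding is symmetric in this sketch, so the region's end also grows by `n`. As noted in the question, a padded region will also annotate any other variant that falls in the added bases.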