brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Annotating Breakend Points (<BND>) #111

Open ajaarma opened 5 years ago

ajaarma commented 5 years ago

Hi, I am trying to annotate breakend (BND) structural variants called by Illumina's Manta, but vcfanno does not seem to annotate them because it does not recognize the ALT allele of BND records. The files are attached (vcf_error.zip). The zip contains:

- test.vcf: list of example query variants
- gnomad_test.bed.gz: list of example variants with their coordinates and IDs, used to annotate the query VCF
- vcfanno_bed.conf.toml: the vcfanno configuration file

The attached test.vcf contains two variants.

Variant-1:
1 261425 MantaBND:58922:1:10:0:0:0:1 A [chr4:190113797[GA 292 PASS SVTYPE=BND;MATEID=MantaBND:58922:1:10:0:0:0:0;SVINSLEN=1;SVINSSEQ=G;BND_DEPTH=106;MATE_BND_DEPTH=42;AC=1;AN=2;CSQT=1|AP006222.1|ENST00000441866.2|transcript_variant GT:FT:GQ:PL:PR:SR 0/1:PASS:292:342,0,999:37,1:65,19

Variant-2:
1 261425 MantaBND:58922:1:10:0:0:0:1 A 292 PASS END=261426;SVTYPE=BND;MATEID=MantaBND:58922:1:10:0:0:0:0;SVINSLEN=1;SVINSSEQ=G;BND_DEPTH=106;MATE_BND_DEPTH=42;AC=1;AN=2;CSQT=1|AP006222.1|ENST00000441866.2|transcript_variant GT:FT:GQ:PL:PR:SR 0/1:PASS:292:342,0,999:37,1:65,19

Both records represent the same breakend (BND) event. Variant-1 is the original, unmodified call. Variant-2 is identical except that its ALT was changed to <INS:[chr4:190113797[GA> (other symbolic alleles also work) and END=261426 was added to INFO.

I am annotating with gnomad_test.bed.gz (attached), which has exactly the same coordinates as this variant: 1 261425 261426 LP000Test

The configuration file is also attached: vcfanno_bed.conf.toml

I ran: vcfanno -p 4 -ends -permissive-overlap vcfanno_bed.conf.toml test.vcf

Result for Variant-1: not annotated with LP000Test.

Result for Variant-2: annotated with LP000Test.

It seems that vcfanno cannot determine the ALT allele and end position of BND records.

Could this be fixed in vcfanno? It would help a lot when computing internal overlap with our large cohort (n > 800 samples). Currently many of these BND calls are missed, which affects the overall interpretation.

Thanks for your help. vcfanno_error.zip

brentp commented 5 years ago

I haven't looked at the data yet, but just to clarify:

1 261425 261426 LP000Test

in BED format will not overlap:

1 261425 MantaBND:58922:1:10:0:0:0:1 A [chr4:190113797[GA 292 PASS 

because the VCF is 1-based and the BED is 0-based. Does that resolve your concerns?
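To make the off-by-one concrete, here is a minimal Python sketch assuming the standard conventions: a VCF record starts at its 1-based POS and spans len(REF) bases, while a BED line is 0-based and half-open. `vcf_to_half_open` and `overlaps` are hypothetical helpers, not vcfanno code, and the Variant-2 interval assumes END=261426 is read as an inclusive 1-based end.

```python
def vcf_to_half_open(pos: int, ref: str) -> tuple[int, int]:
    """Convert a 1-based VCF POS plus REF allele to a 0-based, half-open interval."""
    start = pos - 1
    return start, start + len(ref)


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Half-open intervals [s, e) overlap when each one starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


# Variant-1 from the issue: POS=261425, REF="A" -> (261424, 261425).
variant = vcf_to_half_open(261425, "A")

# Assumption: Variant-2's END=261426 is an inclusive 1-based end -> (261424, 261426).
variant_with_end = (261425 - 1, 261426)

# BED coordinates are already 0-based, half-open, so they are used as-is.
bed_as_posted = (261425, 261426)  # covers 1-based base 261426
bed_shifted = (261424, 261425)    # covers 1-based base 261425

print(overlaps(variant, bed_as_posted))           # False
print(overlaps(variant, bed_shifted))             # True
print(overlaps(variant_with_end, bed_as_posted))  # True
```

Under these assumptions, the posted BED line misses Variant-1 by one base, and a record extended to 261426 would overlap it.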

ajaarma commented 5 years ago

Hi Brentp,

Thanks for the response, and I agree with your point. For that reason I had already converted my BED to 1-based coordinates, giving 1 261425 261426 LP000Test, following the logic described here: https://www.biostars.org/p/84686/. Rephrased:
if (type == SNV) {start = start + 1; end = end;}
if (type == DEL) {start = start + 1; end = end;}
if (type == INS) {start = start; end = end + 1;}
I treat a BND as an insertion event.
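The per-type rules above can be transcribed literally into Python. This is a sketch of the commenter's rules only, not vcfanno behaviour; `adjust` is a hypothetical helper, and the example assumes that for the BND record both start and end begin at POS = 261425.

```python
def adjust(svtype: str, start: int, end: int) -> tuple[int, int]:
    """Literal transcription of the per-type rules quoted in the comment."""
    if svtype == "BND":
        svtype = "INS"  # the commenter relates a breakend to an insertion
    if svtype in ("SNV", "DEL"):
        return start + 1, end
    if svtype == "INS":
        return start, end + 1
    raise ValueError(f"no rule given for SVTYPE {svtype!r}")


# Assumption: start and end both begin at POS for the BND record.
print(adjust("BND", 261425, 261425))  # (261425, 261426)
```

This reproduces the BED line posted above. Because a BED file is read as 0-based and half-open, that interval covers 1-based base 261426 rather than 261425.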

But vcfanno still does not annotate all the BND events that share these coordinates in our cohort. It works if I edit the ALT to <INS:[chr4:190113797[GA> and add END=261426: then I get all the BNDs with exactly the same coordinates as in my cohort.
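The makeshift edit described above could be scripted as a per-line rewrite. This is a minimal sketch assuming a tab-delimited VCF with INFO in column 8; `rewrite_bnd` is a hypothetical helper, END=POS+1 simply mirrors Variant-2 (END=261426 for POS=261425), and the wrapped ALT is an ad-hoc encoding that strict VCF validators may reject.

```python
def rewrite_bnd(line: str) -> str:
    """Apply the commenter's workaround to one VCF line (hypothetical helper).

    For records with SVTYPE=BND: wrap ALT as <INS:...> and, if absent,
    prepend END=POS+1 to INFO. Header and non-BND lines pass through unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    info = fields[7].split(";")
    if "SVTYPE=BND" not in info:
        return line
    pos = int(fields[1])
    fields[4] = f"<INS:{fields[4]}>"
    if not any(item.startswith("END=") for item in info):
        info.insert(0, f"END={pos + 1}")  # mirrors END=261426 for POS=261425
    fields[7] = ";".join(info)
    return "\t".join(fields) + newline
```

For example, pass each line of test.vcf through `rewrite_bnd` and write the result to a new file before running vcfanno on it.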

This makeshift edit works for me, but could it be fixed in the vcfanno code?

brentp commented 5 years ago

Can you post a 1-line VCF (with header) and a 1-line BED that demonstrate the problem?
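A reproduction like the one requested could be assembled from the records already quoted in this issue. This is a minimal sketch: the header lines are a hand-written minimum, and dropping the FORMAT and sample columns (and most INFO fields) is an assumption that they are irrelevant to the overlap; `build_repro` is a hypothetical helper.

```python
# Minimal reproduction built from Variant-1 and the BED line quoted in this issue.
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1>",
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "\t".join([
        "1", "261425", "MantaBND:58922:1:10:0:0:0:1", "A", "[chr4:190113797[GA",
        "292", "PASS", "SVTYPE=BND;MATEID=MantaBND:58922:1:10:0:0:0:0",
    ]),
]
BED_FIELDS = ["1", "261425", "261426", "LP000Test"]


def build_repro() -> tuple[str, str]:
    """Return (vcf_text, bed_text) for a one-record VCF and a one-line BED."""
    return "\n".join(VCF_LINES) + "\n", "\t".join(BED_FIELDS) + "\n"
```

Write the two strings to files, then compress and index the BED with `bgzip` and `tabix -p bed` before referencing it from the vcfanno config.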