Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Downstream changes for get_end fix in VCF parser #1773

Closed nakib103 closed 2 weeks ago

nakib103 commented 1 month ago

Depends on https://github.com/Ensembl/ensembl-io/pull/171

Skipping variant:

We were skipping SV deletion-type variants if start >= end. However, start = end can be valid, for example when SVLEN=1. The check now skips the variant only if start > end or if there is no SVLEN or END information.
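To make the change concrete, here is a minimal Python sketch of the old and new skip conditions. It is not the actual VEP (Perl) code: the function names and the `has_svlen` / `has_end` flags are hypothetical, and it assumes "no SVLEN or END information" means that neither field is present.

```python
def should_skip_deletion_old(start: int, end: int) -> bool:
    """Old behaviour as described above: skip when start >= end."""
    return start >= end


def should_skip_deletion_new(start: int, end: int,
                             has_svlen: bool, has_end: bool) -> bool:
    """New behaviour as described above (hypothetical helper).

    Assumption: the variant is skipped when neither SVLEN nor END is
    available, or when start is strictly greater than end.
    """
    if not has_svlen and not has_end:
        return True
    return start > end


# A one-base deletion (SVLEN=1) has start == end.
print(should_skip_deletion_old(25585736, 25585736))               # True
print(should_skip_deletion_new(25585736, 25585736, True, False))  # False
```

With start = end and SVLEN present, the old check skips the variant while the new one keeps it.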

Unit test fix:

AnnotationSource_File_VCF.t

The type is exact and the custom line is as follows:

21  25585735    del2    TG  T   67  SEGDUP;RF

So it deletes the G at position 25585736. The correct way to represent it with SVLEN is POS=25585736, REF=T (always use the base before the polymorphism happens) and SVLEN=1 (the difference between the ref and alt sequences).
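The arithmetic behind the deleted position and the SVLEN value can be checked with a short Python sketch. The helper `describe_deletion` is hypothetical and only illustrates the reasoning for the custom line above; it is not part of VEP or ensembl-io.

```python
def describe_deletion(pos: int, ref: str, alt: str) -> tuple[int, int]:
    """Return (first deleted position, length difference) for a simple deletion.

    Hypothetical helper: counts the leading bases shared by REF and ALT,
    then reports the first deleted base and len(REF) - len(ALT).
    """
    shared = 0
    for r, a in zip(ref, alt):
        if r != a:
            break
        shared += 1
    first_deleted = pos + shared      # first base present in REF but not ALT
    svlen = len(ref) - len(alt)       # difference between ref and alt length
    return first_deleted, svlen


# Custom line from the test: POS=25585735, REF=TG, ALT=T
print(describe_deletion(25585735, "TG", "T"))  # (25585736, 1)
```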

Parser_VCF.t

The BND cases are affected by the ensembl-io change because they do not have any SVLEN or END. The end will now be end=start; before, it was end=start+ref length-1. END positions are not exactly clear for BND, but the former seems clearer.
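The two end calculations for a BND record without SVLEN or END can be compared with a small Python sketch. The function names and the example values (start 100, REF of length 2) are made up for illustration and are not taken from the test file or from ensembl-io.

```python
def bnd_end_before(start: int, ref: str) -> int:
    """Previous behaviour as described above: end = start + ref length - 1."""
    return start + len(ref) - 1


def bnd_end_after(start: int) -> int:
    """Behaviour after the ensembl-io change as described above: end = start."""
    return start


# Hypothetical BND record with start=100 and a two-base REF allele.
print(bnd_end_before(100, "GT"))  # 101
print(bnd_end_after(100))         # 100
```

For a single-base REF both formulas give the same end, so only BND records with a longer REF allele see a different value.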