Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

HGVS C dot not using right most aligned option #1741

Open namiller2015 opened 4 weeks ago

namiller2015 commented 4 weeks ago

Hello,

For the following variant in VCF format: chr19 1219320 . CACGTATATG CCCG

This represents a variant that looks like this:

REF: CACGTATATGGTG
ALT: CCCG------GTG

We have a SNP (A->C) at the 2nd nucleotide, a deletion of TATAT, and a deletion of a G. VCF formatting would have the leftmost G deleted, whereas HGVS standards would have the rightmost G deleted.

Online web VEP using RefSeq transcripts gives the following C dot for transcript NM_000455.5: c.375-2_380delinsCC

but HGVS should be rightmost aligned, giving a C dot of c.375-2_381delinsCCG
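To make the comparison concrete, here is a minimal Python sketch (not VEP code) that applies the VCF record and both C dot descriptions to the 13-base context from the alignment above. The 0-based offsets are an assumption inferred from this thread (offset 1 is taken as c.375-2, offset 3 as c.375), and the helper name `replace_span` is hypothetical.

```python
# Reference context from the issue: the 10-base VCF REF followed by "GTG".
REF_CONTEXT = "CACGTATATGGTG"


def replace_span(seq: str, start: int, end: int, inserted: str) -> str:
    """Replace seq[start:end] (0-based, end-exclusive) with `inserted`."""
    return seq[:start] + inserted + seq[end:]


# VCF record: REF CACGTATATG -> ALT CCCG, anchored at offset 0.
vcf_result = replace_span(REF_CONTEXT, 0, 10, "CCCG")

# c.375-2_380delinsCC (VEP output): assumed to cover offsets 1..8.
vep_result = replace_span(REF_CONTEXT, 1, 9, "CC")

# c.375-2_381delinsCCG (expected): assumed to cover offsets 1..9.
expected_result = replace_span(REF_CONTEXT, 1, 10, "CCG")

print(vcf_result, vep_result, expected_result)
```

Under these assumptions all three produce the same sequence, which suggests the two C dot strings differ only in how the change is normalized, not in the resulting allele.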

I'm having difficulty understanding the correct way to resolve the inherent discrepancy between VCF standards and HGVS standards. Is this an issue with the VCF representation, the way VEP's HGVS annotation is working, or something else?

Thanks!

namiller2015 commented 3 weeks ago

Some additional detail.

Using online web VEP, I entered two HGVS entries for just the deletion: NM_000455.5:c.375_380del and NM_000455.5:c.376_381del

Both had the HGVS C dot output as NM_000455.5:c.375_380del, but according to HGVS standards this should be 3' shifted and result in c.376_381del.

I don't understand why c.376_381del is getting reworked to c.375_380del.
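For reference, 3' shifting of a deletion can be sketched as follows. This is a minimal illustration on the 13-base context from the first comment, not the logic of VEP or Mutalyzer. The offset-to-position mapping in `c_label` is hypothetical and only inferred from the positions quoted in this thread (offset 3 assumed to be c.375).

```python
REF_CONTEXT = "CACGTATATGGTG"  # context from the issue; offset 3 assumed to be c.375


def shift_deletion_3prime(seq: str, start: int, end: int) -> tuple[int, int]:
    """Slide a deletion of seq[start:end] as far right as possible.

    The deletion can move one base to the right whenever the first deleted
    base equals the base just after the deleted span, because both
    placements leave the same sequence behind.
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def c_label(offset: int) -> str:
    """Hypothetical offset -> c. position mapping for this example only."""
    return f"375-{3 - offset}" if offset < 3 else str(375 + offset - 3)


start, end = shift_deletion_3prime(REF_CONTEXT, 3, 9)  # c.375_380del (GTATAT)
print(f"c.{c_label(start)}_{c_label(end - 1)}del")
```

Starting from the left-aligned GTATAT, the sketch moves the deletion one base to the right (TATATG) and then stops, because the next base no longer matches.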

VEP output results https://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=0I7ML6omyVC7jwHN-10322036

namiller2015 commented 3 weeks ago

I put this deletion into LUMC Mutalyzer 3's normalizer and got the expected 3 prime shifted C dot.

input: (NM_000455.5):c.375_380del https://mutalyzer.nl/


dglemos commented 3 weeks ago

Hi @namiller2015, thank you for reporting this issue. We are still investigating the output; I'll let you know when we have any updates.

Best wishes, Diana

namiller2015 commented 3 weeks ago

Thanks!

Just adding some more details. Maybe this is an order-of-operations issue?

If you first assess whether the change is one indel or two different alterations, and only then 3' shift, you get delinsCC, because they are 2 (not less than 2) bp apart (order of operations A).

BUT

If you 3' shift first, and only then assess whether the alterations are three or more nucleotides apart, you get an SNV change at -2 and a 6 bp deletion (376_381) (order of operations B).

This could explain why, when I put just the 6 bp deletion into Mutalyzer, it 3' shifts correctly, but when I put in the sequence it does order of operations A.
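The two orders of operations can be compared with a small Python sketch. It is only an illustration of the hypothesis above, not how VEP or Mutalyzer actually work. The merge rule (`MERGE_IF_GAP_BELOW`, fewer than two bases between the changes) is an assumption that should be checked against the HGVS version in use, and the offsets are inferred from this thread.

```python
REF_CONTEXT = "CACGTATATGGTG"  # offset 1 assumed c.375-2, offset 3 assumed c.375
SNV_OFFSET = 1                  # A>C
LEFT_DEL = (3, 9)               # GTATAT, left-aligned as in the VCF record
MERGE_IF_GAP_BELOW = 2          # assumed merge rule: fewer than 2 bases between


def shift_3prime(seq, start, end):
    """Move a deletion right while the resulting sequence stays the same."""
    while end < len(seq) and seq[start] == seq[end]:
        start, end = start + 1, end + 1
    return start, end


def describe(snv, deletion):
    """Merge into one delins if the gap is small, else report two changes."""
    gap = deletion[0] - snv - 1  # bases strictly between the SNV and deletion
    return "one delins" if gap < MERGE_IF_GAP_BELOW else "SNV + separate deletion"


# Order A: decide on merging using the left-aligned deletion.
order_a = describe(SNV_OFFSET, LEFT_DEL)
# Order B: 3' shift the deletion first, then decide.
order_b = describe(SNV_OFFSET, shift_3prime(REF_CONTEXT, *LEFT_DEL))

print("A:", order_a)
print("B:", order_b)
```

With the left-aligned deletion there is one base between the two changes, so order A merges them; after shifting there are two, so order B keeps them separate under the assumed rule.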

The HGVS standards are also in the process of changing, so it depends on which set of standards is currently being used.

[Attachment: HGVS_standards (1)]

namiller2015 commented 1 week ago

Any updates on this? Thanks