Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

VEP112 predicts "inframe_insertion, stop_retained_variant" in cases where previously was predicted as "frameshift_variant, stop_gained" #1710

Open jperales opened 4 days ago

jperales commented 4 days ago

I noticed that VEP 112 predicts the opposite consequence to previous versions for certain insertions (rare cases): it reports inframe_insertion,stop_retained_variant where frameshift_variant,stop_gained was predicted before. Notably, these insertions fall in exons of protein-coding transcripts and their length is not divisible by 3, so I would expect a frameshift. Moreover, I doubt that stop_retained_variant makes sense here: I have seen this near splice donor sites of the first exons of protein-coding transcripts, where I would not expect a stop codon. See below for one example case. Thank you!

Example case:

Variant: 3:56591278-56591278 T>TGGGGTAAGCA, a 10-bp insertion in the CCDC66 gene. Let's focus on the canonical transcript ENST00000394672: the variant affects the end of the first exon, almost at the splice site.

VEP command line for VEP 111 & its output

$vep --no_stats -id "3 56591278 . T TGGGGTAAGCA . . ." -o "STDOUT" --tab --assembly GRCh37 --symbol --numbers --cache --offline | grep 'ENST00000394672'
3_56591279_-/GGGGTAAGCA 3:56591278-56591279     GGGGTAAGCA      ENSG00000180376 ENST00000394672 Transcript      stop_gained,frameshift_variant  78-79   8-9     3       L/LG*AX ttg/ttGGGGTAAGCAg       -       HIGH    -       1       -       CCDC66  HGNC    27709   1/18    -

VEP command line for VEP 112 & its output

$vep --no_stats -id "3 56591278 . T TGGGGTAAGCA . . ." -o "STDOUT" --tab --assembly GRCh37 --symbol --numbers --cache --offline | grep 'ENST00000394672'
3_56591279_-/GGGGTAAGCA 3:56591278-56591279     GGGGTAAGCA      ENSG00000180376 ENST00000394672 Transcript      inframe_insertion,stop_retained_variant 78-79       8-9     3       L/LG*AX ttg/ttGGGGTAAGCAg       -       MODERATE        -       1       -       CCDC66  HGNC    27709   1/18    -
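
To make the difference between the two outputs explicit, the consequence terms can be compared programmatically. The following is a minimal sketch, not part of VEP: it assumes the `--tab` column order seen in the output above (Consequence as the seventh whitespace-separated field) and that no field contains whitespace. The function name `consequences` is hypothetical, and each line is truncated to its first seven columns for brevity.

```python
# Sketch: compare the Consequence column of two VEP --tab output lines.
# Assumption: Consequence is the 7th whitespace-separated field, as in
# the output shown above. Lines are truncated to their first 7 columns.

def consequences(tab_line):
    """Return the set of consequence terms from one VEP --tab line."""
    fields = tab_line.split()
    return set(fields[6].split(","))

vep111 = ("3_56591279_-/GGGGTAAGCA 3:56591278-56591279 GGGGTAAGCA "
          "ENSG00000180376 ENST00000394672 Transcript "
          "stop_gained,frameshift_variant")
vep112 = ("3_56591279_-/GGGGTAAGCA 3:56591278-56591279 GGGGTAAGCA "
          "ENSG00000180376 ENST00000394672 Transcript "
          "inframe_insertion,stop_retained_variant")

only_111 = sorted(consequences(vep111) - consequences(vep112))
only_112 = sorted(consequences(vep112) - consequences(vep111))

print("only in VEP 111:", only_111)
print("only in VEP 112:", only_112)
```

For this variant the two versions share no consequence term at all, which is what makes the change easy to detect when diffing annotated files.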

System

Full error message

None

Data files (if applicable)

They include:

nuno-agostinho commented 4 days ago

Hi @jperales,

Thanks for reporting this case. What do you think the expected consequences would be for this example?

If you have more cases like this, please send them to us so we can take a look and see how they behave.

Cheers, Nuno

jperales commented 4 days ago

Regarding the example above, the expected consequence would indeed be frameshift_variant, stop_gained, as was correctly predicted in previous versions (tested in VEP 104 & 111):

The reference CCDS sequence for that transcript is (source: https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS46852 , 2847 nt, 948 aa):

# Reference: nucleotides on the first line, amino acid translation on the second
atg aac ttg gga gat ggt tta aag ctt [...]
 M   N   L   G   D   G   L   K   L [...]

If we insert the variant into the transcript, the sequence changes as follows. Note that it leads to a frameshift and a gained stop codon, TAA (denoted as '-'):

# Variant 'chr3_56591279_-/GGGGTAAGCA'
atg aac ttg ggg taa gca ggg aga tgg ttt aaa gct t[...]
 M   N   L   G   -   A   G   R   W   F   K   A[...]
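
The translation above can be reproduced with a short script. This is a minimal sketch, not how VEP computes consequences: the helper names `translate` and `insert_after` are hypothetical, the reference is only the first 27 nt quoted above, and the insertion point (after CDS position 8) is taken from the CDS_position value 8-9 in the VEP output. Stop codons are printed as `*` rather than `-`.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate complete codons; a trailing partial codon is ignored."""
    cds = cds.lower()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def insert_after(cds, pos, inserted):
    """Insert bases after the 1-based CDS position `pos`."""
    return cds[:pos] + inserted.lower() + cds[pos:]

# First 27 nt of the reference CDS, as quoted above.
ref = "atgaacttgggagatggtttaaagctt"
inserted = "GGGGTAAGCA"
alt = insert_after(ref, 8, inserted)

print(translate(ref))      # reference peptide
print(translate(alt))      # peptide after the insertion
print(len(inserted) % 3)   # non-zero remainder: reading frame is shifted
```

The stop codon appears at the fifth codon of the altered sequence, and the 10-bp insertion leaves a remainder of 1 when divided by 3.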

Here are more examples following the same pattern:

| Variant (VCF format) | Consequences VEP 112 | Consequences VEP 104 |
| --- | --- | --- |
| 3 56591278 . T TGGGGTAAGCA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_gained |
| 6 30558477 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
| 10 116931101 . C CTT . . . | inframe_insertion,stop_retained_variant | frameshift_variant |
| 10 126673560 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
| 16 31770696 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
| 19 52888074 . G GATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAAGGTGAAACCC . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_gained |
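
As a quick sanity check, the inserted length of each variant listed above can be tested for divisibility by 3. This is only a first-pass sketch based on REF/ALT lengths: it ignores transcript context (exon boundaries, splice sites), and the function name `inserted_length` is hypothetical.

```python
# Sketch: check whether each listed insertion could preserve the reading
# frame based on length alone. Variants are copied from the list above
# as (chrom, pos, ref, alt).
VARIANTS = [
    ("3", 56591278, "T", "TGGGGTAAGCA"),
    ("6", 30558477, "G", "GA"),
    ("10", 116931101, "C", "CTT"),
    ("10", 126673560, "G", "GA"),
    ("16", 31770696, "G", "GA"),
    ("19", 52888074, "G",
     "GATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAAGGTGAAACCC"),
]

def inserted_length(ref, alt):
    """Number of bases added by a simple insertion (ALT longer than REF)."""
    return len(alt) - len(ref)

remainders = []
for chrom, pos, ref, alt in VARIANTS:
    n = inserted_length(ref, alt)
    remainders.append(n % 3)
    status = "in-frame length" if n % 3 == 0 else "frameshifting length"
    print(f"{chrom}:{pos} inserts {n} bp -> {status}")
```

None of the listed insertions has a length divisible by 3, which is consistent with the frameshift_variant calls from the earlier versions.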

Thank you very much for your efforts on this and for your great work! Best, Javier

nuno-agostinho commented 4 days ago

Hi Javier,

Thank you for sending those examples. :)

I will go through them with my team and see how we can improve VEP based on them.

Cheers, Nuno