Open jperales opened 4 days ago
Hi @jperales,
Thanks for reporting this case. What do you think the expected consequences would be for this example?
If you have more cases like this, please send them to us so we can take a look and see how they behave.
Cheers, Nuno
Regarding the example above, the expected consequence does indeed look like frameshift_variant,stop_gained, as was correctly predicted in previous versions (tested with VEP 104 & 111):
The reference CCDS sequence for that transcript is (source: https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS46852, 2847 nt, 948 aa):
# Reference: 1st line nucleotides, 2nd line translation into amino acids
atg aac ttg gga gat ggt tta aag ctt [...]
M N L G D G L K L [...]
If we insert the variant into the transcript, the sequence changes as follows. Note that the 10-nt insertion shifts the reading frame and introduces a premature stop codon, TAA (denoted as '-'):
# Variant 'chr3_56591279_-/GGGGTAAGCA'
atg aac ttg ggg taa gca ggg aga tgg ttt aaa gct t[...]
M N L G - A G R W F K A[...]
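The comparison above can be reproduced with a short script. This is a minimal sketch, not VEP's own logic: it only uses the first 27 nt of the CDS quoted in this thread, and the insertion offset within the CDS (`CDS_OFFSET = 8`) is an assumption inferred by aligning the reference and variant sequences shown above. The names `translate`, `REF_CDS_START` and `CDS_OFFSET` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
# Stop codons are written as '-' to match the notation used in this thread.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# First nucleotides of the reference CDS, as quoted in the thread.
REF_CDS_START = "atgaacttgggagatggtttaaagctt"
INSERTED = "GGGGTAAGCA"
# Assumption: the insertion lands after the 8th CDS nucleotide
# (inferred from the variant sequence shown in the thread).
CDS_OFFSET = 8

variant_cds_start = (
    REF_CDS_START[:CDS_OFFSET] + INSERTED.lower() + REF_CDS_START[CDS_OFFSET:]
)

print(translate(REF_CDS_START))      # reference peptide
print(translate(variant_cds_start))  # variant peptide, '-' marks a stop codon
print(len(INSERTED) % 3)             # non-zero means the reading frame shifts
```

The variant translation contains a '-' at the fifth codon, which is the premature TAA described above.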
Here are more examples following the same pattern:
Variant (VCF format) | Consequences VEP 112 | Consequences VEP 104 |
---|---|---|
3 56591278 . T TGGGGTAAGCA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_gained |
6 30558477 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
10 116931101 . C CTT . . . | inframe_insertion,stop_retained_variant | frameshift_variant |
10 126673560 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
16 31770696 . G GA . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_retained_variant |
19 52888074 . G GATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAAGGTGAAACCC . . . | inframe_insertion,stop_retained_variant | frameshift_variant,stop_gained |
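As a quick sanity check on the table above, the inserted length of each variant can be computed from the REF and ALT columns. This is only a length-based sketch under the assumption that each row is a simple insertion anchored on the REF base. It says nothing about how VEP itself assigns consequences, and it ignores transcript context such as splice sites. The helper name `inserted_length` is illustrative.

```python
# (chrom, pos, ref, alt) copied from the VCF rows in the table above.
VCF_ROWS = [
    ("3", 56591278, "T", "TGGGGTAAGCA"),
    ("6", 30558477, "G", "GA"),
    ("10", 116931101, "C", "CTT"),
    ("10", 126673560, "G", "GA"),
    ("16", 31770696, "G", "GA"),
    ("19", 52888074, "G", "GATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAAGGTGAAACCC"),
]


def inserted_length(ref: str, alt: str) -> int:
    """Number of inserted bases for a simple insertion where ALT starts with REF."""
    if not alt.startswith(ref):
        raise ValueError("not a simple anchored insertion")
    return len(alt) - len(ref)


for chrom, pos, ref, alt in VCF_ROWS:
    n = inserted_length(ref, alt)
    # A length that is not a multiple of 3 shifts the reading frame
    # if the insertion falls entirely within coding sequence.
    label = "not a multiple of 3" if n % 3 else "multiple of 3"
    print(f"{chrom}:{pos} inserted={n} ({label})")
```

None of the six inserted lengths is a multiple of 3, which is why a frameshift is expected when the insertion lies within coding sequence.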
Thank you very much for the efforts on this and your great work! Best, Javier
Hi Javier,
Thank you for sending those examples. :)
I will go through them with my team and see how we can improve VEP based on them.
Cheers, Nuno
I noticed that, in rare cases, VEP 112 predicts a different consequence for certain insertions than previous versions did. It predicts inframe_insertion,stop_retained_variant in cases where frameshift_variant,stop_gained was predicted before. Notably, these insertions fall within exons of protein-coding transcripts, and the insertion length is not divisible by 3, so I would expect a frameshift. Moreover, I doubt that stop_retained_variant makes sense here: I have seen this happen near splice donor sites of first exons of protein-coding transcripts, where I would not expect a stop codon. See below for one example case. Thank you!
Example case:
Variant: 3:56591278-56591278 T>TGGGGTAAGCA. This is a 10-bp insertion in the CCDC66 gene. Let's focus on the canonical transcript, ENST00000394672. The variant affects the last part of the 1st exon, almost at the splice site.
VEP command line for VEP 111 & its output
VEP command line for VEP 112 & its output
System
Full error message
None
Data files (if applicable)
They include: