Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Incorrect Annotation for Frameshift Variant in VEP 113 #1796

Open GSYongWu opened 4 days ago

GSYongWu commented 4 days ago

Hi, I would like to report an issue with the variant annotation in VEP version 113. Specifically, variants that should be annotated as frameshift_variant are being incorrectly annotated as inframe_insertion.

Example Variant:

Variant: 17:58740533_A>AAAGCCCTGACTTTAAGGATACATGATTC
Current VEP 113 Annotation: inframe_insertion & stop_retained_variant
VEP 111 Annotation: stop_gained & frameshift_variant

HGVSp Results:

It appears that the results from VEP 111 are correct.
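
As a rough sanity check, the inserted sequence here is 28 bases long, which is not a multiple of 3, so a frameshift is the expected outcome when the insertion falls entirely within coding sequence. The sketch below only compares allele lengths; `classify_indel` is a hypothetical helper name, it is not how VEP itself assigns consequences, and it ignores transcript structure (splice sites, UTRs, stop codons).

```shell
# classify_indel REF ALT
# Hypothetical helper: prints "frameshift" or "inframe" from the net
# length change alone. Assumes REF and ALT describe an insertion or
# deletion that lies entirely within coding sequence.
classify_indel() {
  ref=$1
  alt=$2
  # Net number of bases gained (positive) or lost (negative).
  delta=$(( ${#alt} - ${#ref} ))
  # Take the absolute value so deletions are handled the same way.
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  # A length change that is a multiple of 3 keeps the reading frame.
  if [ $(( delta % 3 )) -eq 0 ]; then
    echo "inframe"
  else
    echo "frameshift"
  fi
}

# The variant from this report: 17:58740533_A>AAAGCCCTGACTTTAAGGATACATGATTC
classify_indel A AAAGCCCTGACTTTAAGGATACATGATTC
```

For this variant the check prints `frameshift`, which matches the VEP 111 annotation rather than the VEP 113 one.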

Could you please investigate this discrepancy? Correct annotation of such variants is crucial for downstream analyses. Thank you for your attention to this matter.

Best regards,

olaaustine commented 4 days ago

Hi @GSYongWu, I hope you are well. Could you please share more information, such as the assembly and your VEP command, so we can try to reproduce the issue? Thank you, Ola

GSYongWu commented 3 days ago

Hi, I used the genome version GRCh37. My command is: "/usr/bin/perl ensembl-vep-release-113.0/vep --offline --no_stats --buffer_size 10000 --fork 4 --ccds --uniprot --hgvs --symbol --shift_3prime 1 --numbers --canonical --protein --biotype --hgvsg --variant_class --total_length --force_overwrite --allele_number --no_escape --vcf --dir vepdb --fasta genome/hs37d5.fa --format vcf --input_file clincal.merge.vcf --output_file clincal.merge.out.vep.vcf --refseq --use_given_ref --no_check_variants_order"

olaaustine commented 3 days ago

Hi @GSYongWu, we have been able to identify the issue. It is with --shift_3prime, an improvement introduced to the code recently. As a workaround while we get this sorted, could you run the command without --shift_3prime 1 to see if that fixes the problem? Let us know if there are still any issues. Thank you, Ola.
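
For anyone scripting this workaround, here is a minimal sketch that rebuilds the command quoted earlier in the thread without `--shift_3prime 1`. The option list and paths are copied from that command; the variable names (`orig_args`, `new_args`, `skip_next`) are made up for illustration. It only prints the resulting command so it can be reviewed before running, and it assumes none of the arguments contain spaces.

```shell
# Reporter's original options, copied from the command quoted above.
orig_args="--offline --no_stats --buffer_size 10000 --fork 4 --ccds \
--uniprot --hgvs --symbol --shift_3prime 1 --numbers --canonical \
--protein --biotype --hgvsg --variant_class --total_length \
--force_overwrite --allele_number --no_escape --vcf --dir vepdb \
--fasta genome/hs37d5.fa --format vcf --input_file clincal.merge.vcf \
--output_file clincal.merge.out.vep.vcf --refseq --use_given_ref \
--no_check_variants_order"

new_args=""
skip_next=0
# Intentionally unquoted: rely on word splitting to walk the options.
for arg in $orig_args; do
  if [ "$skip_next" -eq 1 ]; then
    # This is the value that followed --shift_3prime; drop it too.
    skip_next=0
    continue
  fi
  if [ "$arg" = "--shift_3prime" ]; then
    skip_next=1
    continue
  fi
  new_args="$new_args $arg"
done

# Print the workaround command instead of executing it.
echo "/usr/bin/perl ensembl-vep-release-113.0/vep$new_args"
```

Every other option is left in its original order, so the two runs differ only in the shifting behaviour.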

GSYongWu commented 2 days ago

Hi Ola, thank you very much for your reply.

I tried removing this parameter, and the issue with the reported mutation sites was indeed resolved. However, other sites were affected. For example, at the site 7:55248981_T>TCCAGGAAGC, the HGVSp should be p.A763_Y764insQEA, but after removing the --shift_3prime parameter, the HGVSp is empty. The consequence also changed from inframe_insertion to splice_region_variant.

Best regards,

olaaustine commented 2 days ago

Hi @GSYongWu, I hope you are well. Thank you for your response and for letting us know that the workaround fixes the problem with the first variant. Regarding the variant mentioned above, its HGVSc on Ensembl transcript ENST00000275493.2 is c.2284-3_2289dup, which affects a splice site. This annotation is consistent across the releases mentioned above when run without the --shift_3prime parameter. Let me know if this helps. Thank you, Ola.

GSYongWu commented 1 day ago

Hi Ola, thank you for your response. This is unrelated to the version; I am asking what the correct result for this mutation should be. I think the current result is incorrect, because this mutation is a duplication (dup), which does not change the splice site but does alter the protein-coding sequence, resulting in a non-frameshift mutation. There should be an HGVSp, and the consequence should be inframe_insertion. There is a similar mutation in COSMIC: 7:55248980_C>CTCCAGGAAGCCT. If removing the --shift_3prime parameter still does not yield the correct result in different versions, then I think this parameter is essential for me.

Best regards,

olaaustine commented 1 day ago

Hi @GSYongWu, I mentioned versions because, although the first variant described is a bug, the annotation for the variant 7:55248981_T>TCCAGGAAGC remains consistent across different versions. To understand how VEP handles shifting, please take a look at this documentation. Let us know if you have any more questions. Thank you, Ola