Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

CDS coordinate misalignment for some genes in VEP version 112 #1704

Open GSYongWu opened 1 week ago

GSYongWu commented 1 week ago

Describe the issue

I have found a new problem: the CDS coordinates for some genes are incorrect. For example, the variant SRGAP2:NM_015326.5, c.85A>T (p.T29S), is annotated as c.994A>T (p.T332S). I suspect it is a database issue. The genomic coordinates are 1,206516279,A,T.
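To make the discrepancy concrete, here is a small sketch that compares the two annotations quoted above. The helper name `parse_c_p` is made up for illustration and is not part of VEP. It only checks whether the nucleotide shift and the codon shift are consistent with each other.

```python
import re

def parse_c_p(hgvs_c, hgvs_p):
    """Extract the numeric CDS position and codon number from simple
    substitution notations such as 'c.85A>T' and 'p.T29S'."""
    c_pos = int(re.match(r"c\.(\d+)", hgvs_c).group(1))
    p_pos = int(re.match(r"p\.[A-Za-z]+(\d+)", hgvs_p).group(1))
    return c_pos, p_pos

# Values taken from the report above.
expected_c, expected_p = parse_c_p("c.85A>T", "p.T29S")
observed_c, observed_p = parse_c_p("c.994A>T", "p.T332S")

nt_shift = observed_c - expected_c      # difference in CDS nucleotides
codon_shift = observed_p - expected_p   # difference in codons

# True when the CDS is offset by a whole number of codons.
in_frame = nt_shift == 3 * codon_shift

print(nt_shift, codon_shift, in_frame)
```

The shift is 909 nucleotides, or exactly 303 codons, so the reading frame is preserved and only the position numbering differs. That pattern would fit a CDS start placed further upstream in one annotation than in the other, but the arithmetic alone does not say which numbering is correct.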

nakib103 commented 1 week ago

Hello @GSYongWu,

Thanks for your query and sorry for the late reply.

From the HGVS notation you provided, I infer that you are using the GRCh37 assembly with the RefSeq cache. RefSeq transcripts do not always match the reference assembly. For that reason, VEP needs alignment information to annotate them, and RefSeq (an external source to Ensembl) provides it. See https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq_bam

In e112 we updated our cache with the new alignment file from NCBI, which is why you are seeing this change.
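For anyone who wants to reproduce the comparison between cache versions, the following sketch turns the reported coordinates into a line in VEP's default input format and assembles a command line for a GRCh37 RefSeq cache run. The function names and file names are hypothetical, and the forward strand is an assumption based on how the coordinates were reported. The flags are standard VEP options, but please check them against the documentation for your installed version.

```python
def to_vep_default_line(chrom, pos, ref, alt, strand="+"):
    """Format a single-nucleotide variant in VEP's default input format:
    chromosome, start, end, alleles as REF/ALT, strand."""
    return f"{chrom} {pos} {pos} {ref}/{alt} {strand}"

def vep_args(input_file, output_file):
    """Build an argument list for a GRCh37 RefSeq cache run with HGVS output.
    The command is only assembled here, not executed."""
    return [
        "vep",
        "--cache",
        "--refseq",
        "--assembly", "GRCh37",
        "--hgvs",
        "--input_file", input_file,
        "--output_file", output_file,
    ]

# Coordinates as given in the report: chromosome, position, ref, alt.
chrom, pos, ref, alt = "1,206516279,A,T".split(",")
line = to_vep_default_line(chrom, int(pos), ref, alt)
print(line)

# File names below are placeholders.
print(" ".join(vep_args("srgap2_variant.txt", "srgap2_variant.vep.txt")))
```

Running the same input against an older cache (for example by adding `--cache_version` with the earlier release number, if that cache is installed) should show whether the HGVS change comes from the updated alignment data.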

Best regards, Nakib

GSYongWu commented 5 days ago

However, the coordinates provided by VEP e112 do not match those in the UCSC Genome Browser, nor can they be aligned with literature and other databases. Is this appropriate?