Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Different cDNA length for several RefSeq transcript #1696

Closed barbarian1803 closed 1 week ago

barbarian1803 commented 3 weeks ago

Describe the issue

The cDNA length reported for some RefSeq transcripts differs between the downloaded merged cache and GenBank. One such transcript is NR_038327.2.

Additional information

Running vep with the --total_length parameter on this variant:

chr21 9068792 . T G . mapping_quality;non_homref_normal;no_reliable_supporting_read;weak_evidence DP=479;MQ=20.88;FractionInformativeReads=1.000;SoftClipRatio=0.00 GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/0:0.00:84,12:0.1250:39,5:45,7:96:.:. 0/1:11.61:222,67:0.2318:124,35:98,32:289:85,137,36,31:105,117,32,35

One of the RefSeq annotations is:

1|non_coding_transcript_exon_variant|MODIFIER|TEKT4P2|100132288|Transcript|NR_038327.2|transcribed_pseudogene|4/4||NR_038327.2:n.1177A>C||1177/1613|||||||-1||EntrezGene|HGNC:40046|||||RefSeq||T|T|OK|||||chr21:g.9068792T>G|||||||||

The cDNA position of the variant is reported as 1177/1613, which means the cDNA length for this transcript is 1613. In the GenBank record, the length is 1617.
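For anyone wanting to check other transcripts the same way, here is a minimal Python sketch that pulls the position/length pair out of an annotation string and compares it with the GenBank length. It assumes the cDNA position is the 13th pipe-separated field, as in the annotation quoted above; the real field order depends on the VEP options used and is listed in the VCF header. The helper name `cdna_position_and_length` is hypothetical, not part of VEP.

```python
import re

# Leading fields of the annotation quoted in this issue (trailing fields omitted).
ANNOTATION = (
    "1|non_coding_transcript_exon_variant|MODIFIER|TEKT4P2|100132288|Transcript|"
    "NR_038327.2|transcribed_pseudogene|4/4||NR_038327.2:n.1177A>C||1177/1613"
)

# Assumption: cDNA position is field index 12 in this particular output.
# Check the CSQ "Format:" description in your own VCF header before reusing this.
CDNA_FIELD_INDEX = 12


def cdna_position_and_length(annotation, index=CDNA_FIELD_INDEX):
    """Return (start position, total cDNA length) from a 'pos/total' field."""
    field = annotation.split("|")[index]
    # Accept "1177/1613" as well as ranges such as "1177-1180/1613".
    match = re.fullmatch(r"(\d+)(?:-(\d+))?/(\d+)", field)
    if match is None:
        raise ValueError(f"field {index} is not a position/length pair: {field!r}")
    return int(match.group(1)), int(match.group(3))


position, vep_length = cdna_position_and_length(ANNOTATION)
genbank_length = 1617  # length stated in this issue for NR_038327.2

print(position, vep_length, genbank_length - vep_length)  # prints: 1177 1613 4
```

The 4 bp gap is the difference described above between the VEP-reported length and the GenBank length.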


jamie-m-a commented 2 weeks ago

Hi @barbarian1803

Would you mind sharing the full vep command you ran to get these results? Thanks!

barbarian1803 commented 1 week ago

I just realized that VEP accommodates transcript corrections for RefSeq, and the difference comes from that correction.