Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Missing genes on chrMT when using --refseq #1695

Closed by ju-mu 2 weeks ago

ju-mu commented 3 weeks ago

As already mentioned in #1659, seven genes (ND1-6 and ND4L) are missing whenever the RefSeq annotation is used.

This affects the web interface and the CLI, on both hg19 and hg38, using either the database or the offline cache. It occurs in VEP 112 and at least the last few versions before it (tested back to ~109).

It can be verified with any known variant within these genes, such as rs193302971.

The genes are found when using the Ensembl or merged annotations.

Looking at the cache directory, in homo_sapiens_refseq/112_GRCh37/MT/1-1000000.gz, the genes seem to be present.

But I really wonder why they are not reported?

Thank you!

nuno-agostinho commented 2 weeks ago

Hi @ju-mu,

Sorry for the delay in replying and thank you for reporting this issue.

You are right that VEP does not currently report transcripts for genes ND1-6 and ND4L with RefSeq data, even though they are available in the cache. I have already opened PR https://github.com/Ensembl/ensembl-vep/pull/1701 to fix this bug in the next version of VEP.

In the meantime, please use the hidden flag --all_refseq in VEP to return all RefSeq transcripts in our cache, including tissue-specific transcripts starting with compmerge. If you are not interested in those compmerge transcripts, please run filter_vep on the VEP results using a command similar to:

filter_vep -i vep_output.txt --filter "Feature not matches compmerge" -o vep_filtered.txt
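If filter_vep is not at hand, the same filtering can be approximated in a few lines of Python. This is only a rough sketch, not how filter_vep itself works: it assumes VEP's default tab-delimited output, where a single header line starting with `#` names the columns and one of them is `Feature`. The function name `drop_compmerge` is made up for illustration.

```python
def drop_compmerge(lines):
    """Yield VEP tab-delimited output lines, skipping compmerge transcripts.

    Assumption: default VEP text output, i.e. '##' metadata lines, then one
    '#'-prefixed column header line containing a 'Feature' column.
    """
    feature_idx = None
    for line in lines:
        if line.startswith("##"):
            # Metadata lines are passed through untouched.
            yield line
            continue
        if line.startswith("#"):
            # Column header: remember where the Feature column sits.
            columns = line.lstrip("#").rstrip("\n").split("\t")
            feature_idx = columns.index("Feature")
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        is_compmerge = (
            feature_idx is not None
            and feature_idx < len(fields)
            and "compmerge" in fields[feature_idx]
        )
        if not is_compmerge:
            yield line


def filter_file(in_path, out_path):
    """Write a copy of in_path to out_path without compmerge rows."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_compmerge(src))
```

For example, `filter_file("vep_output.txt", "vep_filtered.txt")` would read and write the same two files as the command above. The substring test is meant to mirror the `matches` filter; use `startswith("compmerge")` instead if only IDs that begin with that prefix should be dropped.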

Hope this helps and sorry for the inconvenience.

Kind regards, Nuno

ju-mu commented 2 weeks ago

Switching --all_refseq on for chrMT fixed it for us. I haven't seen any compmerge transcripts in the output so far.
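For anyone who wants to run the same check on their own results, here is a minimal sketch of one way to do it. It assumes VEP's default tab-delimited output produced with `--symbol`, so that the last (Extra) column carries a `SYMBOL=...` entry. The helper names are hypothetical, and stripping an optional `MT-` prefix is my own assumption in case the symbols are labelled that way.

```python
import re

# The seven genes named in this issue.
EXPECTED_MT_GENES = {"ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6"}

# Assumption: gene symbols appear as SYMBOL=<name> in the Extra column.
SYMBOL_RE = re.compile(r"(?:^|;)SYMBOL=([^;]+)")


def reported_symbols(lines):
    """Collect every gene symbol found in the Extra column of VEP output."""
    symbols = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip metadata and the column header
        extra = line.rstrip("\n").split("\t")[-1]
        match = SYMBOL_RE.search(extra)
        if match:
            symbols.add(match.group(1))
    return symbols


def missing_mt_genes(lines, expected=EXPECTED_MT_GENES):
    """Return the expected genes that never show up, sorted by name."""
    found = {
        s[3:] if s.startswith("MT-") else s  # tolerate an 'MT-' prefix
        for s in reported_symbols(lines)
    }
    return sorted(expected - found)


def count_compmerge(lines):
    """Count data rows that mention a compmerge transcript anywhere."""
    return sum(
        1 for line in lines
        if not line.startswith("#") and "compmerge" in line
    )
```

A gene is only listed as missing relative to the input, so the check is meaningful only if the input contains at least one variant in each of the seven genes.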

Thanks a lot!

nuno-agostinho commented 6 days ago

Hi @ju-mu,

Hope you are having a great day!

Just to let you know that this bug has been fixed for the next version of VEP: all the expected mitochondrial RefSeq transcripts will be returned in VEP 113 without the need to use --all_refseq.

Thanks for reporting this issue.

Cheers, Nuno