Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

`substr outside of string` in VariationEffect.pm line 1329 #1764

Open bartgrantham opened 2 weeks ago

bartgrantham commented 2 weeks ago

Describe the issue

I am getting the following error and I've narrowed it down to a single line VCF:

substr outside of string at /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 1329, <$fh> line 20970425.

System

I'm using the official VEP docker image id 607ee83f9536 (Ubuntu 22.04.4), containing the following versions:

  ensembl              : 112.7104005
  ensembl-funcgen      : 112.be19ffa
  ensembl-io           : 112.2851b6f
  ensembl-variation    : 112.4113356
  ensembl-vep          : 112.0

Full VEP command line

I was able to recreate from a completely clean install with the following on Debian 12:

docker pull ensemblorg/ensembl-vep:latest
docker run --rm -it ensemblorg/ensembl-vep bash

## then inside the container, with the tmp.vcf attached below
perl /opt/vep/src/ensembl-vep/INSTALL.pl -a cf -s gallus_gallus_merged

vep -i tmp.vcf -o tmp.vep.vcf --offline --species gallus_gallus_merged --everything --vcf --distance 0 --pick

Full error message

substr outside of string at /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 1329, <$fh> line 20970425.
Died in forked process 70938

Data files (if applicable)

This single-line VCF triggers the bug. It was narrowed down from a much (much) larger VCF. The original had the usual headers one might expect, but they are not needed to trigger the error.
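The issue does not say how the narrowing was done. One common approach is bisection over the data records, sketched below. `find_failing_record` is a hypothetical helper, not part of VEP, and it assumes that exactly one record triggers the failure and that the failure does not depend on neighbouring records.

```shell
# find_failing_record VCF CMD...
# Hypothetical helper: bisect the non-header records of VCF to find the one
# record for which CMD fails. CMD is run with a candidate VCF path appended
# as its last argument and must exit non-zero on failure.
find_failing_record() {
    vcf=$1; shift
    tmp=$(mktemp -d) || return 1
    grep '^#' "$vcf" > "$tmp/header"      # header lines, if any
    grep -v '^#' "$vcf" > "$tmp/body"     # data records only
    lo=1
    hi=$(grep -c '' "$tmp/body")          # number of data records
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        # Candidate file: header plus records lo..mid
        { cat "$tmp/header"; sed -n "${lo},${mid}p" "$tmp/body"; } > "$tmp/try.vcf"
        if "$@" "$tmp/try.vcf" >/dev/null 2>&1; then
            lo=$((mid + 1))   # this half passes, so the culprit is in the other half
        else
            hi=$mid           # this half fails, so keep narrowing inside it
        fi
    done
    sed -n "${lo}p" "$tmp/body"           # the single remaining record
    rm -rf "$tmp"
}
```

Here CMD would be a small wrapper that runs VEP on the file it is given and returns a non-zero status when the run dies. Each round rereads the body with `sed`, which is slow on tens of millions of records but keeps the sketch short.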

tmp.vcf.gz

dglemos commented 2 weeks ago

Hi @bartgrantham, thanks for explaining the issue so clearly; it really helps in understanding the problem. I've been able to reproduce the issue, and we're working on a fix. I'll let you know when we have updates.

dglemos commented 1 day ago

I just wanted to let you know that this issue is specific to one of the RefSeq transcripts overlapping your variant. For now, a workaround is to run VEP with only Ensembl transcripts.
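As a rough sketch of that workaround, the reported command could be rerun against an Ensembl-only cache instead of the merged one. This assumes the Ensembl-only chicken cache is available under the species name `gallus_gallus`; the wrapper function `vep_ensembl_only` is hypothetical and only keeps the other options from the original command unchanged.

```shell
# Assumption: the Ensembl-only cache is installed first, e.g. with
#   perl /opt/vep/src/ensembl-vep/INSTALL.pl -a cf -s gallus_gallus
# Hypothetical wrapper: same options as the reported command, but pointing
# at the Ensembl-only cache rather than gallus_gallus_merged.
vep_ensembl_only() {
    # $1 = input VCF, $2 = output VCF
    vep -i "$1" -o "$2" --offline --species gallus_gallus \
        --everything --vcf --distance 0 --pick
}
```

For the reproduction above this would be called as `vep_ensembl_only tmp.vcf tmp.vep.vcf`. RefSeq annotations are lost with this approach, so it only fits analyses that can do without them until a fix is released.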

bartgrantham commented 1 day ago

Very interesting. FWIW, once I excised that single position from our data I was able to annotate the remaining 50M+ positions.
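For anyone hitting the same error, excising one position can be sketched roughly as follows. `drop_position` is a hypothetical helper, and the chromosome and position in the usage line are placeholders, since the actual coordinates are only in the attached file.

```shell
# Hypothetical helper: copy a VCF from stdin to stdout, keeping header lines
# and dropping every record whose CHROM and POS match the given values.
# Usage: drop_position CHROM POS < in.vcf > out.vcf
drop_position() {
    awk -F'\t' -v chrom="$1" -v pos="$2" \
        '/^#/ || !($1 == chrom && $2 == pos)'
}
```

With a compressed input this could be run as `zcat input.vcf.gz | drop_position CHROM POS > filtered.vcf`, substituting the real coordinates. Note that it removes all records at that position, including any multi-line representations of the same site.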

Out of curiosity, is it known what exactly it is about the RefSeq transcript that triggers this bug for this one position? It's surprising that it was a single position out of tens of millions.

dglemos commented 17 hours ago

For the transcript XM_040697338, the peptide sequence calculated here is incomplete. This causes a problem for this variant located at the end of the translation sequence.