Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep

MSG: The coordinate interval for display is of different length than for the reference allele #1775

Open suzyhh opened 3 hours ago

suzyhh commented 3 hours ago

Describe the issue

Hello, I'm testing v113 using the v113 Docker image from Docker Hub and the v113 cache, which I downloaded with the following command:

docker run --network host -t -i -v /mnt/data1/software/vep/:/opt/vep/.vep ensemblorg/ensembl-vep:release_113.0 INSTALL.pl -a c -s homo_sapiens_merged -y GRCh38

I have successfully tested v113 with a VCF containing structural variants, but I encounter an error when running it on a VCF containing SNVs/indels. After the error, the process hangs until I kill it. The output file contains the VCF header but no variants.

System

Full VEP command line

vep --cache --offline \
    -i ~{vcf} \
    --dir_cache /opt/vep/.vep \
    --dir_plugins /opt/vep/.vep/Plugins/ \
    --vcf --compress_output bgzip \
    --merged \
    --fasta $refFasta \
    --assembly GRCh38 \
    --no_stats \
    --fork ~{fork} \
    --buffer_size ~{buffer} \
    --no_escape \
    --check_existing \
    --hgvs --hgvsg \
    --af --af_gnomadg \
    --protein --uniprot \
    --symbol \
    --numbers \
    --allele_number \
    --sift b --polyphen b \
    --pubmed \
    --show_ref_allele \
    --variant_class \
    --mane_select \
    --transcript_version \
    --flag_pick_allele_gene \
    --plugin MaxEntScan,/opt/vep/.vep/fordownload,NCSS,SWA \
    --plugin SpliceAI,snv=~{spliceaiSnv},indel=~{spliceaiIndel} \
    --plugin SpliceRegion \
    --plugin NearestExonJB,max_range=100 \
    --plugin REVEL,/opt/vep/.vep/new_tabbed_revel_grch38.tsv.gz \
    --plugin SpliceDistance \
    --plugin UTRAnnotator,file=/opt/vep/.vep/fordownload/uORF_5UTR_GRCh38_PUBLIC.txt \
    --plugin GnomadPli,file=~{gnomadv4Pli} \
    --plugin CADD,snv=~{caddSnv},indels=~{caddIndel} \
    --plugin AlphaMissense,file=~{alphaMissense} \
    --custom file=${hgmdZip},short_name=HGMD,format=vcf,type=exact,coords=0,fields=CLASS%PHEN \
    --custom file=~{clinVarVcf},short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN \
    --custom file=~{gnomadv4Vcf},short_name=gnomad4,format=vcf,type=exact,coords=0,fields=homCount%hetCount%hemiCount%gnomadFilter \
    --force_overwrite \
    -o ~{outName}_snvs.vep.vcf.gz

Full error message

STDERR 
STDERR -------------------- EXCEPTION --------------------
STDERR MSG: The coordinate interval for display is of different length than for the reference allele
STDERR STACK Bio::EnsEMBL::Variation::Utils::Sequence::hgvs_variant_notation /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/Sequence.pm:511
STDERR STACK Bio::EnsEMBL::Variation::VariationFeature::hgvs_genomic /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/VariationFeature.pm:2013
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VariationFeatureOverlapAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1314
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::TranscriptVariationAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1606
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_VariationFeatureOverlapAllele_output_hashes /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:393
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_output_hashes_by_VariationFeature /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:347
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VCF::get_all_lines_by_InputBuffer /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory/VCF.pm:316
STDERR STACK Bio::EnsEMBL::VEP::Runner::_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:422
STDERR STACK (eval) /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:671
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_process /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:665
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:496
STDERR STACK Bio::EnsEMBL::VEP::Runner::next_output_line /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:367
STDERR STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:208
STDERR STACK toplevel /opt/vep/src/ensembl-vep/vep:46
STDERR Date (localtime)    = Mon Oct 21 13:54:46 2024
STDERR Ensembl API version = 113
STDERR ---------------------------------------------------
STDERR Died in forked process 53
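
For readers unfamiliar with the message, the condition it names can be sketched as a simple length comparison. This is a reading of the message wording only, not the code in Sequence.pm; the function name and the use of "-" for an empty reference allele are assumptions made for illustration.

```python
def interval_matches_ref(start: int, end: int, ref_allele: str) -> bool:
    """Return True when the 1-based inclusive interval [start, end] spans
    exactly as many bases as the reference allele string."""
    # Assumption: "-" (or an empty string) stands for an empty reference
    # allele, as in an insertion, so its length counts as zero.
    ref_len = 0 if ref_allele in ("", "-") else len(ref_allele)
    return end - start + 1 == ref_len


if __name__ == "__main__":
    # Two-base reference allele over a two-base interval: consistent.
    print(interval_matches_ref(100, 101, "AT"))
    # Two-base reference allele over a one-base interval: inconsistent,
    # which is the kind of mismatch the message describes.
    print(interval_matches_ref(100, 100, "AT"))
```

A record for which this comparison fails would be a plausible candidate for the exception, though which coordinates v113 actually compares is not established in this thread.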

I have tested my command using v112 and it completes successfully with no errors or warnings, so it does seem to be a v113 problem. Running locally, I have found that the problem occurs when using the --hgvsg flag. If I run using only --hgvs, I do not get the above error, but I do get several warnings. Here is a small snippet:

WARNING: 12 : WARNING: Transcript-assembly mismatch in rs6650119
WARNING: Transcript-assembly mismatch in rs6650119
WARNING: 13 : WARNING: Transcript-assembly mismatch in rs2275166
WARNING: Transcript-assembly mismatch in rs2275166
WARNING: 14 : WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs586178
WARNING: Transcript-assembly mismatch in rs586178
WARNING: 39 : WARNING: Transcript-assembly mismatch in rs193922695
WARNING: Transcript-assembly mismatch in rs193922695
WARNING: 52 : WARNING: Transcript-assembly mismatch in rs7601549
WARNING: Transcript-assembly mismatch in rs7601549
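
To get the list of affected variants out of a longer warnings file, a short script along these lines may help. It is a minimal sketch that assumes every relevant line contains the text "Transcript-assembly mismatch in" followed by an identifier, as in the snippet above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the identifier that follows the warning text, e.g. "rs6650119".
MISMATCH_RE = re.compile(r"Transcript-assembly mismatch in (\S+)")


def mismatch_ids(log_text: str) -> Counter:
    """Count how often each variant identifier appears in mismatch warnings."""
    return Counter(MISMATCH_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "WARNING: 12 : WARNING: Transcript-assembly mismatch in rs6650119\n"
        "WARNING: Transcript-assembly mismatch in rs6650119\n"
        "WARNING: 13 : WARNING: Transcript-assembly mismatch in rs2275166\n"
        "WARNING: Transcript-assembly mismatch in rs2275166\n"
    )
    for variant_id, count in sorted(mismatch_ids(sample).items()):
        print(variant_id, count)
```

Each identifier is reported twice in the snippet, so the counts make it easy to see whether that doubling holds across the whole file.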

I do not receive any errors or warnings when I run v113 with neither --hgvs nor --hgvsg. Unfortunately, this renders v113 unusable for us, as we rely on the HGVS annotations.
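
Because the exception appears only with --hgvsg, one way to isolate the record that triggers it is to split the input into small VCFs and run each one separately. The sketch below covers only the splitting step; the function name and chunk size are hypothetical, and it assumes an uncompressed VCF whose header lines all start with "#" and come before the records.

```python
from typing import Iterable, Iterator, List


def chunk_vcf(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of VCF lines, each holding the full header followed by
    at most chunk_size variant records."""
    header: List[str] = []
    body: List[str] = []
    for line in lines:
        if line.startswith("#"):
            # Header lines are repeated at the top of every chunk.
            header.append(line)
            continue
        body.append(line)
        if len(body) == chunk_size:
            yield header + body
            body = []
    if body:
        # Emit the remaining records as a final, shorter chunk.
        yield header + body


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t.\t.\t.",
        "1\t200\t.\tAT\tA\t.\t.\t.",
        "1\t300\t.\tC\tCT\t.\t.\t.",
    ]
    for i, chunk in enumerate(chunk_vcf(example, 2)):
        print(f"chunk {i}: {len(chunk) - 2} record(s)")
```

Each chunk can be written to its own file and annotated with the same command. Rerunning the failing chunk with a smaller chunk size narrows it down to a single record.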

Many thanks!

likhitha-surapaneni commented 3 hours ago

Hi @suzyhh, thank you for reporting the issue. Could you kindly provide us with the input file or test input you used?

suzyhh commented 2 hours ago

@likhitha-surapaneni

I have attached the input file that generates the error (bgzipped for compatibility with GitHub). Ta!

TwEx_ProbandF-TwEx_MotherF-TwEx_FatherF_gnomad_filtered_mt.vcf.gz