Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

False warning messages with vep 111 when using the range input format #1617

Closed ju-mu closed 4 months ago

ju-mu commented 4 months ago

Describe the issue

When using the region input format (--format region), VEP 111 prints a false warning for each input variant, claiming that the variant was skipped.

Additional information

I have a pipeline that runs VEP in an Apptainer/Singularity container. I use the REST-style region input format because it is the only format that works offline without requiring reference alleles. Up to v110.1, the following worked flawlessly for many millions of variants:

echo 'chrY:59030922-59030922:G' | /usr/bin/apptainer exec -B /x:/x /y/tools/ensembl-vep_110.1/vep.sif vep -a GRCh37 --refseq --format region --offline  --fa /x/data/tools/vep/111/cachedir/homo_sapiens_refseq/111_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --no_stats --cache --dir_cache /x/data/tools/vep/110/cachedir

The only minor inconvenience is the three unnecessary log messages printed on execution:

INFO:    underlay of /etc/localtime required more than 50 (97) bind mounts
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm line 472.
2024-02-21 00:36:07 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this

Thankfully, the Smartmatch notification is gone in v111.

System

Full VEP command line

With v111 however:

echo 'chrY:59030922-59030922:G' | /usr/bin/apptainer exec -B /x:/x /y/tools/ensembl-vep_111/vep.sif vep -a GRCh37 --refseq --format region --offline  --fa /x/data/tools/vep/111/cachedir/homo_sapiens_refseq/111_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --no_stats --cache --dir_cache /x/data/tools/vep/111/cachedir/ 

Full error message

I am now getting a warning for every variant:

WARNING: line 1 skipped (chrY:59030922-59030922:G): G type is not supported

The warning message is incorrect: the output is the same as with v110.1, i.e. no variant has been skipped:

> cat variant_effect_output.txt
{"allele_string":"A/G","input":"chrY:59030922-59030922:G","strand":1,"assembly_name":"GRCh37","intergenic_consequences":[{"variant_allele":"G","consequence_terms":["intergenic_variant"],"impact":"MODIFIER"}],"start":59030922,"end":59030922,"seq_region_name":"chrY","most_severe_consequence":"intergenic_variant","id":"chrY:59030922-59030922:G"}
> cat variant_effect_output.txt_warnings.txt 
WARNING: line 1 skipped (chrY:59030922-59030922:G): G type is not supported
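
The check above can be automated for larger inputs. The following is only a sketch, assuming the warning text and the JSON "input" field look exactly as in the listing above; the function name falsely_skipped is hypothetical and not part of VEP. It prints every variant that the warnings file reports as skipped but that is nevertheless present in the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): list inputs that the warnings file
# reports as skipped although they appear in the VEP JSON output.
# Assumes the warning and JSON shapes shown in this report.
falsely_skipped() {
    warnings_file=$1
    output_file=$2
    # Extract the text between "skipped (" and "):" from each warning line.
    sed -n 's/^WARNING: line [0-9][0-9]* skipped (\(.*\)): .*$/\1/p' "$warnings_file" |
    while IFS= read -r variant; do
        # A fixed-string match on the "input" field means the variant was annotated.
        if grep -qF "\"input\":\"$variant\"" "$output_file"; then
            printf '%s\n' "$variant"
        fi
    done
}
```

With the file names from this report, it could be called as: falsely_skipped variant_effect_output.txt_warnings.txt variant_effect_output.txt. Any variant it prints was annotated despite the warning.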

The warning is not shown when I use the Ensembl default format or a VCF as input:

echo 'chrY       59030922        59030922        A/G     +' | /usr/bin/apptainer exec -B /x:/x /y/tools/ensembl-vep_111/vep.sif vep -a GRCh37 --refseq  --offline  --fa /x/data/tools/vep/111/cachedir/homo_sapiens_refseq/111_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --no_stats --cache --dir_cache /x/data/tools/vep/111/cachedir/ --format ensembl

nuno-agostinho commented 4 months ago

Hey @ju-mu,

Thanks for reporting this issue. I can indeed reproduce the problem you mentioned and I will try to check on how to fix it.

Kind regards, Nuno

nuno-agostinho commented 4 months ago

Hi @ju-mu,

I just opened PR https://github.com/Ensembl/ensembl-vep/pull/1618 to fix this bug. Although the warnings are inaccurate (and annoying), the output is still correct, so you can simply ignore the warnings for now.

I will keep you updated on when we merge the bug fix to our code.

Best regards, Nuno
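
Until a release containing the fix is available, one way to keep genuine warnings visible is to filter the spurious lines out of the warnings file. This is only a sketch: the function name real_warnings is hypothetical, and the pattern is inferred from the single message quoted in this thread, so it may also hide a legitimate warning that has the same form.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): print a warnings file without the
# spurious "type is not supported" lines that v111 emits for region input.
# The pattern is an assumption based on the one message quoted in this thread.
real_warnings() {
    pattern='^WARNING: line [0-9]+ skipped \([^()]+:[0-9]+-[0-9]+:[^()]+\): .* type is not supported$'
    # grep exits with 1 when every line was filtered out; treat that as success.
    grep -Ev "$pattern" "$1" || [ "$?" -eq 1 ]
}
```

For example: real_warnings variant_effect_output.txt_warnings.txt prints only the lines that do not match the known false warning.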

ju-mu commented 4 months ago

Great thanks!!

nuno-agostinho commented 4 months ago

Hey @ju-mu,

The bug fix has now been merged into our code, and the warnings will no longer appear in the next release of VEP (version 112).

I will close this issue now. Feel free to open a new issue in case you have other problems or feature requests. Thanks!

Cheers, Nuno