Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
Apache License 2.0

Synonyms file does not work in the offline mode #1705

Open XinmengLiao opened 1 week ago

XinmengLiao commented 1 week ago

Describe the issue

Hi. I am currently using VEP v111.0 to annotate my VCF files. When I use --offline mode and provide the synonyms file, errors show up and some chromosomes are reported as not overlapping any features.


Full VEP command line

vep --cache --dir_cache $VEP_CACHE \
--offline \
--fork 128 \
--format vcf \
--dir_plugins $VEP_plugins111/ \
-i Sample1_PASS.vcf.gz \
-o Sample1_vep_annotated.vcf.gz \
--force_overwrite \
--compress_output bgzip \
--assembly GRCh38 \
--symbol --vcf --check_existing --variant_class \
--sift b --polyphen b \
--synonyms $VEP_CACHE/homo_sapiens/111_GRCh38/chr_synonyms.txt \
--hgvs \
--fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
--canonical \
--af --af_gnomade --af_gnomadg --max_af

Full error message

WARNING: Line 249593 skipped (chr10_GL383545v1_alt 3485 . C CT 9.03 PASS AC=...): Chromosome chr10_GL383545v1_alt not found in annotation sources or synonyms; chromosome chr10_GL383545v1_alt does not overlap any features
WARNING: Line 249594 skipped (chr10_GL383545v1_alt 5295 . G GA 11.56 PASS AC...): Chromosome chr10_GL383545v1_alt not found in annotation sources or synonyms; chromosome chr10_GL383545v1_alt does not overlap any features
WARNING: Line 249710 skipped (chr10_KI270824v1_alt 73879 . T C 47.19 PASS AC...): Chromosome chr10_KI270824v1_alt not found in annotation sources or synonyms; chromosome chr10_KI270824v1_alt does not overlap any features
WARNING: Line 249711 skipped (chr10_KI270824v1_alt 74581 . CGCGGCTTTTTGCACCC...): Chromosome chr10_KI270824v1_alt not found in annotation sources or synonyms; chromosome chr10_KI270824v1_alt does not overlap any features
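
To see at a glance which contigs are affected, the warnings can be tallied per chromosome. This is a minimal sketch that assumes the warning text follows the format shown above; the function name `skipped_chromosomes` is hypothetical and not part of VEP.

```python
import re
from collections import Counter

# Matches the chromosome name in warnings of the form
# "Chromosome <name> not found in annotation sources or synonyms".
_NOT_FOUND = re.compile(r"Chromosome (\S+) not found in annotation sources or synonyms")


def skipped_chromosomes(warning_text):
    """Count skipped input lines per chromosome name (hypothetical helper)."""
    return Counter(_NOT_FOUND.findall(warning_text))
```

Passing the contents of the warnings output to `skipped_chromosomes` returns a `Counter` keyed by contig name, which makes it easy to confirm whether only alternate contigs are being skipped.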

Additional description

None of the chromosomes given in their synonym form are annotated correctly, even though all of these synonyms can be found in 'chr_synonyms.txt', as shown in the following screenshots:

[image] [image]

XinmengLiao commented 1 week ago

Additionally, when I turn off --offline mode, the connection to the database fails as follows:

MSG: Could not connect to database homo_sapiens_core_110_38 as user anonymous using [DBI:mysql:database=homo_sapiens_core_110_38;host=ensembldb.ensembl.org;port=3306] as a locator: DBI connect('database=homo_sapiens_core_110_38;host=ensembldb.ensembl.org;port=3306','anonymous',...) failed: Can't connect to MySQL server on 'ensembldb.ensembl.org' (110 "Connection timed out") at /sw/bioinfo/vep/110.1/rackham/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260.

nakib103 commented 1 week ago

Hello @XinmengLiao,

Thanks for your reply!

I can reproduce the issue in --offline mode. The variants are not being annotated because the cache is missing some folders required for those regions. This has been happening since e110; I will investigate the cause further.
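
One way to check locally which contigs are affected is to compare the contig names in the VCF against the folders in the cache. The sketch below is only an illustration under two assumptions: that the synonyms file is tab-delimited with one pair of names per line, and that the cache holds one folder per chromosome or contig under the species/assembly directory. The function names `load_synonyms` and `contigs_missing_from_cache` are hypothetical and are not part of VEP.

```python
from pathlib import Path


def load_synonyms(path):
    """Map each name to the names paired with it, in both directions.

    Assumes a tab-delimited file with two names per line.
    """
    pairs = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[0] or not fields[1]:
                continue  # skip blank or malformed lines
            first, second = fields[0], fields[1]
            pairs.setdefault(first, set()).add(second)
            pairs.setdefault(second, set()).add(first)
    return pairs


def contigs_missing_from_cache(contigs, synonyms, cache_dir):
    """Return contigs with no folder in cache_dir under their own name
    or under any name paired with them in the synonyms mapping."""
    cache = Path(cache_dir)
    missing = []
    for contig in contigs:
        names = {contig} | synonyms.get(contig, set())
        if not any((cache / name).is_dir() for name in names):
            missing.append(contig)
    return missing
```

Here `contigs` would be the unique values of the first column of the VCF, and `cache_dir` would be something like `$VEP_CACHE/homo_sapiens/111_GRCh38`. A contig that is listed in the synonyms file but still reported as missing would be consistent with a missing cache folder rather than a missing synonym.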

For the database issue, can you make sure that the required port 3306 is open and that it is not a firewall issue?
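
A quick way to test this from the machine running VEP is to attempt a plain TCP connection to the host and port named in the error message. This is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical, and a successful connection only shows that the port is reachable, not that the database login works.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

Calling `can_connect("ensembldb.ensembl.org", 3306)` on the cluster node should return `True` if outbound traffic on that port is allowed; a `False` result would point towards a firewall or network restriction.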

Best regards, Nakib