Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Synonyms file does not work in the offline mode #1705

Open XinmengLiao opened 1 week ago

XinmengLiao commented 1 week ago

Describe the issue

Hi. I am currently using VEP v111.0 to annotate my VCF files. When I use --offline mode and provide the synonyms file, errors show up and some of the chromosomes are reported as not overlapping any features.

System

Full VEP command line

vep --cache --dir_cache $VEP_CACHE \
--offline \
--fork 128 \
--format vcf \
--dir_plugins $VEP_plugins111/ \
-i Sample1_PASS.vcf.gz \
-o Sample1_vep_annotated.vcf.gz \
--force_overwrite \
--compress_output bgzip \
--assembly GRCh38 \
--symbol --vcf --check_existing --variant_class \
--sift b --polyphen b \
--synonyms $VEP_CACHE/homo_sapiens/111_GRCh38/chr_synonyms.txt \
--hgvs \
--fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
--canonical \
--af --af_gnomade --af_gnomadg --max_af \
--custom clinvar_20240611_PLPC.vcf.gz,ClinVar,vcf,exact,0,ID,CLNSIG,CLNDN,CLNHGVS,CLNSIGINCL,CLNVC,GENEINFO,CLNDISDB,CLNSIGCONF,CLNREVSTAT,CLNDNINCL,CLNREVSTAT 

Full error message

WARNING: Line 249593 skipped (chr10_GL383545v1_alt 3485 . C CT 9.03 PASS AC=...): Chromosome chr10_GL383545v1_alt not found in annotation sources or synonyms; chromosome chr10_GL383545v1_alt does not overlap any features
WARNING: Line 249594 skipped (chr10_GL383545v1_alt 5295 . G GA 11.56 PASS AC...): Chromosome chr10_GL383545v1_alt not found in annotation sources or synonyms; chromosome chr10_GL383545v1_alt does not overlap any features
WARNING: Line 249710 skipped (chr10_KI270824v1_alt 73879 . T C 47.19 PASS AC...): Chromosome chr10_KI270824v1_alt not found in annotation sources or synonyms; chromosome chr10_KI270824v1_alt does not overlap any features
WARNING: Line 249711 skipped (chr10_KI270824v1_alt 74581 . CGCGGCTTTTTGCACCC...): Chromosome chr10_KI270824v1_alt not found in annotation sources or synonyms; chromosome chr10_KI270824v1_alt does not overlap any features
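To see how many variants are affected and on which contigs, the "skipped" warnings can be tallied per chromosome. This is a minimal Python sketch; the regular expression is built from the warning text quoted above and is an assumption, so it may not match other VEP versions or other warning types. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "skipped" warnings quoted in this issue. The wording is copied
# from that log and may differ between VEP versions (assumption).
SKIPPED_RE = re.compile(
    r"WARNING: Line \d+ skipped \(.*?\): "
    r"Chromosome (\S+) not found in annotation sources or synonyms"
)


def skipped_chromosomes(log_text: str) -> Counter:
    """Count skipped input lines per chromosome name in VEP warning text."""
    return Counter(m.group(1) for m in SKIPPED_RE.finditer(log_text))
```

For example, `skipped_chromosomes(Path("warnings.txt").read_text())` (the file name is hypothetical, and `Path` comes from `pathlib`) returns a `Counter` keyed by contig name, such as `chr10_KI270824v1_alt`, which makes it easy to see whether only the `_alt` contigs are affected.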

Additional description

None of the chromosomes given in synonym form can be annotated correctly, yet all of these synonyms can be found in 'chr_synonyms.txt', as shown below:

[Screenshots: the matching entries in chr_synonyms.txt]
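One way to double-check this is to load the synonyms file and list the input contigs that have no entry in it at all. This sketch assumes a two-column, tab-delimited layout (primary name in column 1, synonym in column 2) and treats each pair as usable in both directions; the function names are hypothetical, and the list of contig names (for example, taken from the VCF header) is supplied by the caller.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Set


def parse_synonyms(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Link each name in a two-column, tab-delimited synonyms file to the
    other names it is paired with (pairs are treated as bidirectional)."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, single-column, or comment lines
        primary, synonym = row[0].strip(), row[1].strip()
        links[primary].add(synonym)
        links[synonym].add(primary)
    return dict(links)


def load_synonyms(path: str) -> Dict[str, Set[str]]:
    """Read a synonyms file such as chr_synonyms.txt from disk."""
    with Path(path).open(newline="") as handle:
        return parse_synonyms(handle)


def contigs_missing_from_synonyms(contigs: Iterable[str],
                                  synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the input contig names that appear nowhere in the synonyms file."""
    return sorted(c for c in set(contigs) if c not in synonyms)
```

If this returns an empty list for the skipped contigs, the names really are in the file, which points at the lookup or the cache rather than at the synonyms file itself.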

XinmengLiao commented 1 week ago

Additionally, when I run without --offline, the connection to the database fails as follows:

MSG: Could not connect to database homo_sapiens_core_110_38 as user anonymous using [DBI:mysql:database=homo_sapiens_core_110_38;host=ensembldb.ensembl.org;port=3306] as a locator: DBI connect('database=homo_sapiens_core_110_38;host=ensembldb.ensembl.org;port=3306','anonymous',...) failed: Can't connect to MySQL server on 'ensembldb.ensembl.org' (110 "Connection timed out") at /sw/bioinfo/vep/110.1/rackham/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260.

nakib103 commented 1 week ago

Hello @XinmengLiao,

Thanks for your reply!

I can reproduce the issue with --offline mode. The variants are not annotated because the cache is missing some of the folders required for those regions. This has been happening since e110; I will investigate further to find out why.
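A user can check this explanation locally by listing the subdirectories in the cache and seeing which skipped contigs, under their own name or any synonym, have none. This sketch assumes the cache keeps one subdirectory per sequence region under a directory such as `$VEP_CACHE/homo_sapiens/111_GRCh38`; that layout and the helper names are assumptions for illustration. `synonyms` is a plain dict from each name to the set of names it is linked with (for example, built from chr_synonyms.txt).

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def cached_region_names(cache_dir: str) -> Set[str]:
    """Names of the subdirectories directly under a species/version cache dir."""
    return {p.name for p in Path(cache_dir).iterdir() if p.is_dir()}


def contigs_without_cache(contigs: Iterable[str],
                          synonyms: Dict[str, Set[str]],
                          present: Set[str]) -> List[str]:
    """Return contigs for which neither the name itself nor any synonym
    matches a cache subdirectory name."""
    missing = []
    for contig in sorted(set(contigs)):
        candidates = {contig} | synonyms.get(contig, set())
        if not candidates & present:  # no overlap with cached names
            missing.append(contig)
    return missing
```

Passing `cached_region_names("/path/to/vep_cache/homo_sapiens/111_GRCh38")` (path hypothetical) as `present` lists the contigs that have no cache folder under any name, which is the situation described above.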

For the database issue, can you make sure that port 3306 is open on your side and that the connection is not being blocked by a firewall?
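To check whether the MySQL port is reachable from the machine running VEP, independently of VEP itself, a plain TCP connection attempt is enough. This is a minimal sketch using Python's standard `socket` module; the host and port come from the connection error earlier in this thread, and the function name is hypothetical.

```python
import socket


def port_is_reachable(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address it returns.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

If `port_is_reachable("ensembldb.ensembl.org")` returns False, the problem is in the network path (for example a firewall or blocked outbound traffic) rather than in VEP. A True result only shows that the port is open, not that the anonymous MySQL login will succeed.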

Best regards, Nakib