Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Docker: Failed to instantiate plugin Phenotypes: ERROR: tabix failed #579

Closed thedam closed 5 years ago

thedam commented 5 years ago

The first run of new docker image (VEP97):

Species 'homo_sapiens' loaded from database 'homo_sapiens_core_97_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_97_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_97_37'
Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_97_37'
homo_sapiens_variation_97_37 loaded
homo_sapiens_funcgen_97_37 loaded
No ancestral database found
No ontology database found
No taxonomy database found
No ensembl_metadata database found
No production database or adaptor found
### Phenotypes plugin: Generating GFF file /opt/vep/.vep/Plugins/Phenotypes.pm_homo_sapiens_97_GRCh37.gvf.gz from database
### Phenotypes plugin: This will take some time but it will only run once per species, assembly and release

-------------------- WARNING ----------------------
MSG: You are using the API without caching most recent features. Performance might be affected.
FILE: EnsEMBL/DBSQL/BaseFeatureAdaptor.pm LINE: 86
CALLED BY: Bio/EnsEMBL/Registry.pm  LINE: 1189
Date (localtime)    = Mon Aug 26 08:57:17 2019
Ensembl API version = 97
---------------------------------------------------
### Phenotypes plugin: Querying database
### Phenotypes plugin: Writing to file
### Phenotypes plugin: Sorting file with sort
### Phenotypes plugin: Indexing file with tabix
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "Binary file (standard input) matches"
[E::hts_idx_push] unsorted positions
tbx_index_build failed: /opt/vep/.vep/Plugins/Phenotypes.pm_homo_sapiens_97_GRCh37.gvf.gz
WARNING: Failed to instantiate plugin Phenotypes: ERROR: tabix failed

2019-08-26 09:10:25 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this

thedam commented 5 years ago

Same with VEP 96:

Ensembl API version = 96
Phenotypes plugin: Querying database
Phenotypes plugin: Writing to file
Phenotypes plugin: Sorting file with sort
Phenotypes plugin: Indexing file with tabix
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "Binary file (standard input) matches"
[E::hts_idx_push] unsorted positions
tbx_index_build failed: /opt/vep/.vep/Plugins/Phenotypes.pm_homo_sapiens_96_GRCh37.gvf.gz
WARNING: Failed to instantiate plugin Phenotypes: ERROR: tabix failed

2019-08-26 10:04:11 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this
WARNING: No input file format specified - detected vcf format

ens-lgil commented 5 years ago

Dear @thedam,

I am afraid that it's not linked to Docker.

About the first part, I suspect that there were some connection issues with the Ensembl database server when you ran it.

About the indexing itself, it seems that the zgrep command used when sorting the Phenotypes GVF file found some binary characters (?), so its "Binary file (standard input) matches" message ended up in the output and tabix can't index the plugin's data file. I am investigating what those characters are and why they are in the GVF file.
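To narrow down which records contain the suspicious bytes, something along these lines could be used. It is only a rough sketch and not part of the plugin: the function name `find_non_utf8_lines` is made up for illustration, and it assumes the file is an ordinary gzip/bgzip-compressed text file such as the `.gvf.gz` named in the log.

```python
import gzip


def find_non_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8.

    Hypothetical helper for illustration; not part of VEP or Phenotypes.pm.
    """
    bad = []
    # Binary mode: Python does not try to decode anything while reading.
    with gzip.open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((number, raw.rstrip(b"\n")))
    return bad
```

Calling it on the generated GVF file would list the line numbers and raw bytes of any record whose phenotype name is not valid UTF-8.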

Best regards, Laurent

ens-lgil commented 5 years ago

Dear @thedam,

Well, actually this was linked to Docker, or more specifically it was due to the Docker version of grep.

We usually run our scripts on a server with grep v2.20, whereas the VEP Docker image uses grep v3.1. Apparently these two versions handle non-UTF-8 characters (which are present in some phenotype names) differently, and this affects the sorting and the indexing of the file.

I submitted a pull request for the Phenotypes.pm plugin with a fix.
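For illustration, here is one way to split header lines from records and sort the records without depending on how text tools treat the encoding: work on raw bytes throughout. This is a sketch under assumptions, not the fix in the pull request. The function name `sort_gvf_lines` is hypothetical, and the sort key (sequence name in column 1, numeric start in column 4) is assumed from the GFF/GVF column layout rather than taken from the plugin's actual sort command.

```python
def sort_gvf_lines(lines):
    """Return header lines first, then records sorted by (seqid, start).

    `lines` is an iterable of bytes objects without trailing newlines.
    Hypothetical helper; column positions are assumed from the GFF/GVF layout.
    """
    lines = list(lines)
    # Header and pragma lines start with '#'; keep them in their original order.
    headers = [line for line in lines if line.startswith(b"#")]
    records = [line for line in lines if line and not line.startswith(b"#")]

    def key(line):
        fields = line.split(b"\t")
        # Column 1 is compared as raw bytes, column 4 as an integer,
        # so invalid UTF-8 elsewhere in the line never matters.
        return (fields[0], int(fields[3]))

    return headers + sorted(records, key=key)
```

On the command line, GNU grep has an `-a` (`--text`) option that forces input to be treated as text. Whether the merged fix uses it is not stated in this thread.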

Best regards, Laurent

at7 commented 5 years ago

Dear @thedam,

The PR has been merged into the release/97 branch. Could you please try running VEP with the Phenotypes plugin again?

Thank you, Anja

thedam commented 5 years ago

seems like it's ok now. I approve :)