same with VEP 96
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "Binary file (standard input) matches"
[E::hts_idx_push] unsorted positions
tbx_index_build failed: /opt/vep/.vep/Plugins/Phenotypes.pm_homo_sapiens_96_GRCh37.gvf.gz
WARNING: Failed to instantiate plugin Phenotypes: ERROR: tabix failed
2019-08-26 10:04:11 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this
WARNING: No input file format specified - detected vcf format
Dear @thedam,
I am afraid that it's not linked to Docker.
About the first part, I suspect that there were some connection issues with the Ensembl database server when you ran it.
About the indexing itself, it seems that the zgrep command used to sort the Phenotype GVF file found some binary characters (?), so tabix can't index the plugin's data file. I am investigating what those characters are and why they are in the GVF file.
Best regards, Laurent
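For anyone who wants to look for such characters in their own copy of the data file, here is a minimal sketch in Python. It is not part of VEP or of the Phenotypes plugin; the function name `find_non_utf8_lines` and the `limit` parameter are made up for illustration. It simply reads the gzipped file as raw bytes and reports the lines that do not decode as UTF-8.

```python
import gzip


def find_non_utf8_lines(path, limit=10):
    """Return (line_number, raw_bytes) for lines that are not valid UTF-8.

    Hypothetical helper: `path` is a gzip-compressed text file such as
    the plugin's .gvf.gz, `limit` caps how many offending lines are kept.
    """
    hits = []
    with gzip.open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                # Keep the raw bytes so the offending characters can be inspected.
                hits.append((number, raw.rstrip(b"\n")))
                if len(hits) >= limit:
                    break
    return hits
```

Calling `find_non_utf8_lines` on the path of the `.gvf.gz` file should list the first few lines containing bytes that a UTF-8 locale would reject, which are the likely candidates for what grep reported as binary content.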
Dear @thedam,
Well, actually this was linked to Docker, or more specifically to the version of grep in the Docker image.
We usually run our scripts on a server that has grep v2.20, whereas the VEP Docker image uses grep v3.1.
Apparently these two versions handle non-UTF-8 characters (which are present in some phenotype names) differently, which affects the sorting of the file and therefore the indexing.
I submitted a pull request on the Phenotypes.pm plugin with a fix.
Best regards, Laurent
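The two tabix messages in the log ("failed to parse" and "unsorted positions") can be reproduced with a quick check before indexing. The sketch below is an assumption-laden illustration, not the fix from the pull request: it assumes the GVF follows the GFF3 column layout (sequence ID in column 1, start in column 4), and `check_gvf_order` is a hypothetical name. It flags lines that are not tab-separated records, start positions that go backwards within a sequence, and sequences that reappear after another one.

```python
def check_gvf_order(lines):
    """Return (line_number, problem) pairs for an iterable of GVF text lines.

    Assumes GFF3-style columns: seqid in column 1, start in column 4.
    """
    problems = []
    seen = set()        # sequence IDs already encountered
    current = None      # sequence ID of the block being read
    last_start = 0      # last accepted start position in that block
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/pragma lines
        fields = line.split("\t")
        if len(fields) < 5 or not fields[3].isdigit():
            # e.g. a stray 'Binary file (standard input) matches' line
            problems.append((number, "malformed"))
            continue
        seqid, start = fields[0], int(fields[3])
        if seqid != current:
            if seqid in seen:
                problems.append((number, "non-contiguous sequence"))
            seen.add(seqid)
            current, last_start = seqid, start
        elif start < last_start:
            problems.append((number, "unsorted position"))
        else:
            last_start = start
    return problems
```

With a decompressed file this could be used as `check_gvf_order(open(path, encoding="utf-8", errors="replace"))`; an empty list means none of these three problems was found, not that tabix is guaranteed to succeed.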
Dear @thedam,
The PR has been merged into the release/97 branch. Could you please try running VEP with the Phenotypes plugin again?
Thank you, Anja
seems like it's ok now. I approve :)
The first run of the new Docker image (VEP 97):