Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

VEP Installation Failed: Failed 2/42 test programs. 0/1752 subtests failed. #1671

Open Lukecassar21 opened 1 month ago

Lukecassar21 commented 1 month ago

I installed VEP from GitHub. Before running perl INSTALL.pl, I installed the required packages Archive::Zip, DBD::mysql and DBI using cpanm. Unfortunately, the installation ended with 2 test programs failing, and vep did not run properly afterwards.

Installing VEP (initial error)

When running perl INSTALL.pl I get the following output:

Command

perl INSTALL.pl

Terminal Output

Hello! This installer will help you set up VEP v111, including:

  • Install v111 of the Ensembl API for use by the VEP. It will not affect any existing installations of the Ensembl API that you may have.
  • Download and install cache files from Ensembl's FTP server.
  • Download FASTA files from Ensembl's FTP server.
  • Download VEP plugins.

Checking for installed versions of the Ensembl API...done

Setting up directories
Destination directory ./Bio already exists.
Do you want to overwrite it (if updating VEP this is probably OK) (y/n)? y

  • fetching BioPerl
  • unpacking ./Bio/tmp/release-1-6-924.zip
  • moving files

Attempting to install Bio::DB::HTS and htslib.

If this fails, try re-running with --NO_HTSLIB

  • checking out HTSLib
fatal: destination path 'htslib' already exists and is not an empty directory.
  • building HTSLIB in ./htslib
In /home/user/ensembl-vep/htslib
make: Nothing to be done for 'all'.
  • unpacking ./Bio/tmp/biodbhts.zip to ./Bio/tmp/
./Bio/tmp/Bio-DB-HTS-2.11 - moving files to ./biodbhts
  • making Bio::DB:HTS
Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'Bio-DB-HTS' version '2.11'
Building Bio-DB-HTS
x86_64-linux-gnu-gcc -I/home/user/ensembl-vep/htslib -I/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -DVERSION="2.11" -DXS_VERSION="2.11" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o lib/Bio/DB/HTS.o lib/Bio/DB/HTS.c
lib/Bio/DB/HTS.xs: In function ‘XS_BioDBHTSVCFHeader_fmt_text’:
lib/Bio/DB/HTS.xs:1770:5: warning: ‘bcf_hdr_fmt_text’ is deprecated: use bcf_hdr_format() instead [-Wdeprecated-declarations]
 1770 | RETVAL = newSVpv(bcf_hdr_fmt_text(header, is_bcf, &len), 0);
      | ^~
In file included from lib/Bio/DB/HTS.xs:59:
/home/user/ensembl-vep/htslib/htslib/vcf.h:439:11: note: declared here
  439 | char bcf_hdr_fmt_text(const bcf_hdr_t hdr, int is_bcf, int *len)
      | ^~~~
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/HTS.bs')
x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong -o blib/arch/auto/Bio/DB/HTS/HTS.so lib/Bio/DB/HTS.o -L/home/user/ensembl-vep/htslib -Wl,-rpath,/home/user/ensembl-vep/htslib -lhts -lpthread -lz
x86_64-linux-gnu-gcc -I/home/user/ensembl-vep/htslib -I/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -DVERSION="2.11" -DXS_VERSION="2.11" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o lib/Bio/DB/HTS/Faidx.o lib/Bio/DB/HTS/Faidx.c
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.bs')
x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong -o blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.so lib/Bio/DB/HTS/Faidx.o -L/home/user/ensembl-vep/htslib -Wl,-rpath,/home/user/ensembl-vep/htslib -lhts -lpthread -lz

Downloading required Ensembl API files

  • fetching ensembl
  • unpacking ./Bio/tmp/ensembl.zip
  • moving files
  • getting version information
  • fetching ensembl-variation
  • unpacking ./Bio/tmp/ensembl-variation.zip
  • moving files
  • getting version information
  • fetching ensembl-funcgen
  • unpacking ./Bio/tmp/ensembl-funcgen.zip
  • moving files
  • getting version information
  • fetching ensembl-io
  • unpacking ./Bio/tmp/ensembl-io.zip
  • moving files
  • getting version information

Testing VEP installation

./t/AnnotationSource_Database_RegFeat.t .............. ok
./t/FilterSet.t ...................................... ok
./t/Parser_HGVS.t .................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_HGVS.t .................................... ok
./t/AnnotationSource_File_GTF.t ...................... Can't locate Test/Warnings.pm in @INC (you may need to install the Test::Warnings module) (@INC contains: /home/user/ensembl-vep/modules ./Bio /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.34.0 /usr/local/share/perl/5.34.0 /usr/lib/x86_64-linux-gnu/perl5/5.34 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.34 /usr/share/perl/5.34 /usr/local/lib/site_perl .) at ./t/AnnotationSource_File_GTF.t line 20.
BEGIN failed--compilation aborted at ./t/AnnotationSource_File_GTF.t line 20.
./t/AnnotationSource_File_GTF.t ...................... Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
./t/AnnotationSource_Cache_RegFeat.t ................. 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_Cache_RegFeat.t ................. ok
./t/AnnotationSource_File_VCF.t ...................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_File_VCF.t ...................... ok
./t/AnnotationSource_Cache_Variation.t ............... 26/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_Cache_Variation.t ............... ok
./t/Parser_SPDI.t .................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_SPDI.t .................................... ok
./t/Config.t ......................................... ok
./t/Parser_VCF.t ..................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_VCF.t ..................................... ok
./t/AnnotationSource_Database_Transcript.t ........... ok
./t/AnnotationSource_File.t .......................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_File.t .......................... ok
./t/OutputFactory_Tab.t .............................. 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/OutputFactory_Tab.t .............................. ok
./t/AnnotationSource_Cache.t ......................... ok
./t/Parser_Region.t .................................. "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_Region.t .................................. 1/? WARNING: line 1 skipped (21:25587759-25587759:1/A): A type is not supported WARNING: line 1 skipped (21:25587759-25587759/A): A type is not supported WARNING: line 1 skipped (21:25587759-25587759:-1/G): G type is not supported WARNING: line 1 skipped (21:25587759-25587758:1/A): A type is not supported WARNING: line 1 skipped (21:25587759-25587759:1/-): - type is not supported ./t/Parser_Region.t .................................. ok
./t/TranscriptTree.t ................................. ok
./t/version.t ........................................ ok
./t/bam_edit.t ....................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/bam_edit.t ....................................... 1/? WARNING: Ignoring non-supported 'cDNA_match' feature_type from /home/user/ensembl-vep/t/testdata/custom/bam_edit.gff.gz ./t/bam_edit.t ....................................... ok
./t/AnnotationSource_File_BigWig.t ................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_File_BigWig.t ................... ok
./t/Stats.t .......................................... 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Stats.t .......................................... ok
./t/AnnotationSource.t ............................... 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource.t ............................... ok
./t/AnnotationSource_Cache_Transcript.t .............. 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_Cache_Transcript.t .............. ok
./t/OutputFactory_VCF.t .............................. 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/OutputFactory_VCF.t .............................. ok
./t/AnnotationSourceAdaptor.t ........................ "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSourceAdaptor.t ........................ ok
./t/OutputFactory_VEP_output.t ....................... 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/OutputFactory_VEP_output.t ....................... ok
./t/AnnotationSource_Cache_VariationTabix.t .......... 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_Cache_VariationTabix.t .......... ok
./t/AnnotationSource_File_GFF.t ...................... Can't locate Test/Warnings.pm in @INC (you may need to install the Test::Warnings module) (@INC contains: /home/user/ensembl-vep/modules ./Bio /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.34.0 /usr/local/share/perl/5.34.0 /usr/lib/x86_64-linux-gnu/perl5/5.34 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.34 /usr/share/perl/5.34 /usr/local/lib/site_perl .) at ./t/AnnotationSource_File_GFF.t line 20.
BEGIN failed--compilation aborted at ./t/AnnotationSource_File_GFF.t line 20.
./t/AnnotationSource_File_GFF.t ...................... Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
./t/OutputFactory.t .................................. "my" variable $result masks earlier declaration in same scope at ./t/OutputFactory.t line 2038. "my" variable $genotype masks earlier declaration in same scope at ./t/OutputFactory.t line 2039. ./t/OutputFactory.t .................................. 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/OutputFactory.t .................................. ok
./t/Parser_VEP_input.t ............................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_VEP_input.t ............................... ok
./t/CacheDir.t ....................................... ok
./t/Parser.t ......................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser.t ......................................... ok
./t/AnnotationSource_File_BED.t ...................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/AnnotationSource_File_BED.t ...................... ok
./t/Utils.t .......................................... ok
./t/BaseVEP.t ........................................ 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/BaseVEP.t ........................................ ok
./t/AnnotationSource_Database_Variation.t ............ ok
./t/VariantRecoder.t ................................. "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/VariantRecoder.t ................................. ok
./t/Parser_ID.t ...................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_ID.t ...................................... ok
./t/InputBuffer.t .................................... 1/? "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/InputBuffer.t .................................... ok
./t/Parser_CAID.t .................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Parser_CAID.t .................................... ok
./t/OutputFactory_JSON.t ............................. ok
./t/AnnotationSource_Database_StructuralVariation.t .. ok
./t/Runner.t ......................................... "my" variable $sth masks earlier declaration in same scope at Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607. ./t/Runner.t ......................................... ok

Test Summary Report

./t/AnnotationSource_File_GTF.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/AnnotationSource_File_GFF.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=42, Tests=1752, 48 wallclock secs ( 0.13 usr 0.06 sys + 44.63 cusr 2.87 csys = 47.69 CPU)
Result: FAIL
Failed 2/42 test programs. 0/1752 subtests failed.

Test Summary Report.txt
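Both failing test programs stop at the same "Can't locate Test/Warnings.pm in @INC" message, so the failure points to a missing Perl module rather than to failing VEP subtests. As a minimal sketch, assuming the installer output has been saved to a file, the missing modules could be listed like this. The function name missing_perl_modules is hypothetical and is not part of VEP.

```shell
# Hypothetical helper (not part of VEP): read a saved test log on stdin and
# print each Perl module that the log reports as missing, one per line.
missing_perl_modules() {
    # Pull out every "Can't locate Some/Module.pm" fragment, even when
    # several were flattened onto one line of the log.
    grep -o "Can't locate [A-Za-z0-9_/]*\.pm" |
        sed -e "s/^Can't locate //" -e 's/\.pm$//' -e 's|/|::|g' |
        sort -u
}
```

For example, after saving the output with something like `perl INSTALL.pl 2>&1 | tee install.log` (the file name is a placeholder), running `missing_perl_modules < install.log` on the log above should print only Test::Warnings.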

Running VEP after Failed 2/42 Test Programs.

Following this, running the vep binary on a small 2 MB SV VCF file (fewer than 7000 structural variants) produces the following output:

Command

./vep -i /media/user/Maxtor/SV_Analysis/outputs/manta_output/NA24385_manta_did.vcf -o /media/user/Maxtor/SV_Analysis/outputs/manta_output/NA24385_manta_did_vep.vcf --database

Terminal Output

WARNING: line 1 skipped (chr1 724844 MantaDUP:TANDEM:1:24558:24560:4:0:...): variant size (223477389) is bigger than --max_sv_size (10000000)

WARNING: line 42 skipped (chr1 17007336 MantaDUP:TANDEM:1:2171:2507:0:1:...): variant size (128375323) is bigger than --max_sv_size (10000000)

WARNING: line 77 skipped (chr1 33476476 MantaDEL:20511:0:2:3:0:0 A (DEL)...): variant size (25619971) is bigger than --max_sv_size (10000000)

WARNING: line 208 skipped (chr1 106505910 MantaDUP:TANDEM:1:27147:27150:0...): variant size (76184438) is bigger than --max_sv_size (10000000)

WARNING: line 284 skipped (chr1 159738186 MantaDUP:TANDEM:81323:1:3:0:0:0...): variant size (64790026) is bigger than --max_sv_size (10000000)

Result

The output is a 0-byte VCF file and the process hangs indefinitely. Any help on this issue would be greatly appreciated.
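The warnings above mean that VEP skipped those records because they exceed --max_sv_size (10000000 in this run). To see in advance which records are likely to be skipped, a rough sketch is shown below. It assumes each SV record carries an END= tag in the INFO column and approximates the size as END - POS + 1; VEP's own size calculation may differ. The function name oversized_svs is hypothetical.

```shell
# Hypothetical helper: print CHROM, POS, ID and approximate size for VCF
# records whose END - POS + 1 exceeds a threshold.
# Usage: oversized_svs MAX_SIZE < input.vcf
oversized_svs() {
    awk -F '\t' -v max="$1" '
        /^#/ { next }                          # skip header lines
        {
            sv_end = ""
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^END=/) sv_end = substr(info[i], 5)
            if (sv_end == "") next             # no END tag: size unknown
            size = sv_end - $2 + 1             # approximation, see note above
            if (size > max) print $1 "\t" $2 "\t" $3 "\t" size
        }'
}
```

With an uncompressed copy of the input, `oversized_svs 10000000 < NA24385_manta_did.vcf` would list the candidate records. Whether to raise --max_sv_size or filter such records out beforehand is a separate decision.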

jamie-m-a commented 1 month ago

Hi @Lukecassar21

Sorry for the delay in responding. I suspect that the problems you are having are due to trying to run from the database with a fairly large number of SVs. Have you tried running locally (i.e. using --offline and --cache)?
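As a sketch of what such a local run might look like: the input and output names and the cache directory below are placeholders, and the run wrapper with its DRY_RUN variable is a hypothetical convenience that prints the command instead of executing it.

```shell
# Print the command by default; set DRY_RUN=0 to actually execute it.
# 'run' and DRY_RUN are illustrative names, not VEP features.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# --cache reads annotation from a local cache; --offline disables database
# connections; --dir_cache points at the cache location (placeholder path).
run ./vep -i input.vcf -o output_vep.vcf --cache --offline --dir_cache "$HOME/.vep"
```

This assumes a cache matching the input assembly has already been downloaded, for example through the cache step that INSTALL.pl offers.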

If you are able to share some of your input VCF we can test this out further here to see if there are any issues related to your install, but the warnings you got shouldn't be showstoppers for running VEP.

Also I'd point out that if you are running into issues with installing, we do provide a containerised version of Ensembl VEP, that comes with all dependencies and plugins. Instructions for setting it up are here: Ensembl VEP docker install

Lukecassar21 commented 1 month ago

No worries @jamie-m-a, thank you for the response. I have yet to test it using a GRCh37 cache; I'll get back to you as soon as I get that done.

This is the VCF file I used to try a trial run on vep:

NA24385_manta_did.vcf.gz

As for the installation, I unfortunately cannot use the VEP docker image because of difficulties on our server that prevent docker from being used. For the time being I have resorted to the conda installation of VEP. With conda I didn't run into any of the previously mentioned issues: using a similar VCF file made from the GRCh38 genome, an output was created when using the phenotype and CADD plugins. The only problem is that when I use the StructuralVariantOverlap plugin with the gnomAD 4.1 VCF file it requires, the process also hangs indefinitely and produces a 0-byte VCF file.

All in all I have 3 questions:

  1. Aside from switching to docker or conda, could there be anything to do regarding the standard installation?
  2. Could anything be done for the StructuralVariantOverlap plugin to make it run properly?
  3. Aside from the phenotype, CADD and StructuralVariantOverlap plugins, are there any other plugins that work specifically with structural variants?

jamie-m-a commented 1 month ago

Hi @Lukecassar21

Great - let us know how it goes when you try with the cache.

In response to your questions:

  1. I think your install is probably fine. To get around the failure you see, you could try to locally install Test::Warnings with cpanm and add it to your Perl library path (PERL5LIB). It's worth saying that the VEP in docker hub actually uses singularity, which doesn't require root access (in case that was your concern about using docker on your cluster).
  2. I believe much of the functionality of the structural variant overlap plugin is now incorporated into the core VEP code. I'm going to tag @nuno-agostinho for comment on this, as he worked on it and will have more insight.
  3. Off the top of my head, GO might be useful (depending on the size of the SV), probably DosageSensitivity, possibly G2P also.
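
For the first suggestion, a minimal sketch of a local install might look like the following. The install prefix $HOME/perl5 is an assumption, and the function names are hypothetical; cpanm's --local-lib option places modules under the given directory, in its lib/perl5 subdirectory.

```shell
# Assumed install prefix for locally installed modules (adjust as needed).
LOCAL_PERL="$HOME/perl5"

# Install Test::Warnings under LOCAL_PERL instead of the system directories.
install_test_warnings() {
    cpanm --local-lib "$LOCAL_PERL" Test::Warnings
}

# Prepend a directory to PERL5LIB (once) so that perl searches it for modules.
add_to_perl5lib() {
    dir=$1
    case ":${PERL5LIB:-}:" in
        *":$dir:"*) ;;                                   # already listed
        *) PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}" ;;
    esac
    export PERL5LIB
}

add_to_perl5lib "$LOCAL_PERL/lib/perl5"
# install_test_warnings    # uncomment to perform the actual install
```

The export only affects the current shell, so it would need to go in a startup file such as .bashrc to persist. It also only helps if the perl that runs the tests is the same perl that cpanm installed for.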

Lukecassar21 commented 2 weeks ago

Hi, sorry for the late response. I managed to figure out that the main reason my vep installation was not working was that my local perl installation and my gcc and g++ libraries were conflicting with my conda installations. Even when I installed Test::Warnings or any other package, the installer would report that the package was installed, but vep would not be able to detect it.

To fix this, I edited the environment variables in .bashrc so that, when I use my conda versions of perl, gcc and g++, they reference the flags in the conda directories instead of mistakenly referencing the directories of my system perl, gcc and g++.
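A mismatch like this can be spotted by asking the perl that is first on PATH whether it can load a given module. The sketch below is a generic diagnostic, not the exact fix described above, and the function name check_perl_module is hypothetical.

```shell
# Hypothetical diagnostic: report which perl is first on PATH and whether
# that perl can load the named module.
# Returns 0 if found, 1 if not found, 2 if no perl is on PATH.
check_perl_module() {
    module=$1
    perl_bin=$(command -v perl) || { echo "perl not found on PATH"; return 2; }
    if "$perl_bin" -M"$module" -e 1 2>/dev/null; then
        echo "$perl_bin: $module found"
    else
        echo "$perl_bin: $module NOT found"
        return 1
    fi
}
```

Running `check_perl_module Test::Warnings` in the same shell that launches vep shows which perl is being used. If the module was installed for a different perl (system versus conda), the path printed here will not be the one the module was installed for.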

Thank you for the tip about using singularity; we'll see if it's a possible option for us.

As for vep itself: when running with the GRCh37 merged cache as well as --port 3337, I managed to get a working result from vep after the fix described above. I noticed on the annotation sources page that structural variant overlap can only be checked if you are comparing against a database, and not if you are only using a cache. Is this the case when you use both the cache and the online database, or only when you use the cache alone? If so, how do you use the online database for GRCh38 while also referencing a specific cache version for GRCh38? The cache version I am currently using is 105 (not for the example with GRCh37).
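For reference, the two setups mentioned above might be written as follows. This is only a sketch: the file names are placeholders, the helper vep_cache_cmd is hypothetical, and the commands are printed rather than run. It does not settle whether structural variant overlap uses the database in this mode, which remains the open question here.

```shell
# Hypothetical helper: print a cache-based VEP command for a given assembly,
# followed by any extra options passed in.
vep_cache_cmd() {
    assembly=$1
    shift
    echo ./vep --cache --assembly "$assembly" "$@"
}

# GRCh37 with the merged cache and the GRCh37 database port, as described above.
vep_cache_cmd GRCh37 --merged --port 3337 -i input.vcf -o output_vep.vcf

# GRCh38 pinned to cache version 105 (behaviour with VEP v111 not verified here).
vep_cache_cmd GRCh38 --cache_version 105 -i input.vcf -o output_vep.vcf
```

Neither command passes --offline, so database connections are not disabled.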

And for the plugins, would you recommend using all of the plugins I listed in addition to the ones you recommended in one command? I am referring to the plugins: Phenotype, CADD-SV, GO, G2P and DosageSensitivity.

I am leaving StructuralVariantOverlap out for the time being, since it still hangs the process with no output whenever I enable the plugin and supply it with the gnomAD 4.1 SV sites file.