Open Lukecassar21 opened 1 month ago
Hi @Lukecassar21
Sorry for the delay in responding. I suspect the problems you are having are due to trying to run from the database with a fairly large number of SVs. Have you tried running locally (i.e. using --offline and --cache)?
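For reference, a local run along those lines might look like the sketch below. The file names and cache directory are hypothetical placeholders, and the assembly is assumed to be GRCh37; the script prints the command and only runs it if vep is on the PATH and the input file exists.

```shell
#!/usr/bin/env bash
# Sketch: build an offline, cache-only VEP command line.
# All paths below are hypothetical placeholders; substitute your own.
input_vcf="input_sv.vcf"
output_vcf="input_sv_vep.vcf"
cache_dir="$HOME/.vep"   # assumed cache location

vep_cmd=(
  vep
  -i "$input_vcf"
  -o "$output_vcf"
  --vcf                      # write the output as VCF
  --cache                    # read annotation from the local cache
  --offline                  # do not open any database connection
  --dir_cache "$cache_dir"
  --assembly GRCh37          # assumed assembly; must match the installed cache
  --force_overwrite
)

# Print the command for review.
printf '%s ' "${vep_cmd[@]}"
printf '\n'

# Run it only when vep is available and the input file exists.
if command -v vep >/dev/null 2>&1 && [ -f "$input_vcf" ]; then
  "${vep_cmd[@]}"
fi
```

If the installed cache is the merged one, --merged would need to be added to the list as well.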
If you are able to share some of your input VCF we can test this out further here to see if there are any issues related to your install, but the warnings you got shouldn't be showstoppers for running VEP.
I'd also point out that if you are running into issues with installing, we provide a containerised version of Ensembl VEP that comes with all dependencies and plugins. Instructions for setting it up are here: Ensembl VEP docker install
No worries @jamie-m-a , thank you for the response. I still have yet to test it using a GRCh37 cache, I'll get back to you as soon as I get that done.
This is the VCF file I used to try a trial run on vep:
As for the installation, I unfortunately cannot use the VEP Docker image because of difficulties on our server that prevent Docker from being used. For the time being I have resorted to the conda installation of VEP. With conda I didn't run into any of the previously mentioned issues: using a similar VCF file made from the GRCh38 genome, an output was created when using the Phenotype and CADD plugins. The only problem is that when I use the StructuralVariantOverlap plugin with the gnomAD 4.1 VCF file it requires, the process also hangs indefinitely and produces a 0-byte VCF file.
All in all I have 3 questions:
Hi @Lukecassar21
Great - let us know how it goes when you try with the cache.
In response to your questions:
Hi, sorry for the late response. I managed to figure out that the main reason my vep installation was not working was that my local perl installation and my gcc and g++ libraries were conflicting with my conda installations. Even if I installed Test::Warnings or any other package, it would report that the package was installed, but vep would then be unable to detect it.
I made it so that when I use my conda versions of perl, gcc and g++, the environment variables in .bashrc specifically reference the flags present in their directories instead of mistakenly referencing the directories of my system perl, gcc and g++.
Thank you for the tip about using Singularity; we'll see if it's a possible implementation for us.
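In case it helps anyone hitting the same conflict, this is a rough sketch of how to check which perl, gcc and g++ the shell actually resolves. It assumes the conda environment is activated (so CONDA_PREFIX is set) and uses Test::Warnings only as an example module; the helper name is_under is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check whether perl, gcc and g++ resolve inside the active conda env.
# CONDA_PREFIX is set by `conda activate`; is_under is a hypothetical helper.

# Succeeds if path $1 lies under directory $2.
is_under() {
  case "$1" in
    "$2"/*) return 0 ;;
    *)      return 1 ;;
  esac
}

prefix="${CONDA_PREFIX:-}"
for tool in perl gcc g++; do
  path="$(command -v "$tool" || true)"
  if [ -z "$path" ]; then
    echo "$tool: not found on PATH"
  elif [ -n "$prefix" ] && is_under "$path" "$prefix"; then
    echo "$tool: $path (inside conda env)"
  else
    echo "$tool: $path (outside conda env)"
  fi
done

# Show the module search override and whether the example module loads.
echo "PERL5LIB=${PERL5LIB:-<unset>}"
if command -v perl >/dev/null 2>&1; then
  perl -MTest::Warnings -e 'print "Test::Warnings: ", $INC{"Test/Warnings.pm"}, "\n"' 2>/dev/null \
    || echo "Test::Warnings: not visible to this perl"
fi
```

A module that was installed for the system perl but is reported as not visible here would be consistent with the behaviour described above.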
As for vep itself: When running with the GRCh37 merged cache as well as --port 3337, I managed to get a working result from vep after the fix I stated above. I noticed on the annotation sources page that structural variant overlap can only be checked if you are comparing to a database, and not if you are only using a cache. Is this the case if you use both the cache and online or just if you use cache? If so, how do you use the online database for GRCh38 while referencing a specific cache version for GRCh38 as well? The current cache version I am using is 105 (not for the example with GRCh37).
And for the plugins, would you recommend using all of the plugins I listed in addition to the ones you recommended in one command? I am referring to the plugins: Phenotype, CADD-SV, GO, G2P and DosageSensitivity.
I am leaving StructuralVariantOverlap out for the time being, since it still hangs the process with no output whenever I enable the plugin and supply it with the gnomAD 4.1 SV sites file.
I installed VEP using the GitHub installation. Prior to running perl INSTALL.pl, I installed the necessary packages Archive::Zip, DBD::mysql and DBI using cpanm. Unfortunately, the installation ended with 2 tests failing, and vep was not able to run properly afterwards.
Installing VEP (initial error)
When running perl INSTALL.pl I get the following output:
Command
perl INSTALL.pl
Terminal Output
Test Summary Report
./t/AnnotationSource_File_GTF.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/AnnotationSource_File_GFF.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=42, Tests=1752, 48 wallclock secs ( 0.13 usr 0.06 sys + 44.63 cusr 2.87 csys = 47.69 CPU)
Result: FAIL
Failed 2/42 test programs. 0/1752 subtests failed.
Test Summary Report.txt
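"Tests: 0" together with "No plan found in TAP output" usually means the test script exited before running any test, so the summary alone does not show the underlying error. A minimal sketch for re-running just the two failing files verbosely is below; it assumes the current directory is the ensembl-vep checkout containing INSTALL.pl, and run_test_verbose is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: re-run the two failing test files on their own to surface the real error.
# Assumes the current directory is the ensembl-vep checkout (where INSTALL.pl lives).
run_test_verbose() {
  local test_file="$1"
  if [ ! -f "$test_file" ]; then
    echo "skip: $test_file not found"
    return 0
  fi
  # prove ships with Perl's Test::Harness; -v prints the raw test output.
  prove -v "$test_file" || echo "failed: $test_file"
}

run_test_verbose t/AnnotationSource_File_GTF.t
run_test_verbose t/AnnotationSource_File_GFF.t
```

If the output only complains about modules that live inside the checkout, the relevant directories may need to be added with prove's -I option.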
Running VEP after Failed 2/42 Test Programs.
Following this, trying to run the vep binary on a small 2 MB SV VCF file (fewer than 7000 structural variants) results in the following output:
Command
./vep -i /media/user/Maxtor/SV_Analysis/outputs/manta_output/NA24385_manta_did.vcf -o /media/user/Maxtor/SV_Analysis/outputs/manta_output/NA24385_manta_did_vep.vcf --database
Terminal Output
Result
The output is a 0-byte VCF file and the process hangs indefinitely. Any help on this issue would be greatly appreciated.