Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Problem when installing VEP 109 via EasyBuild #1364

Closed verdurin closed 1 year ago

verdurin commented 1 year ago

Describe the issue

I'm working on updating the definition for VEP in EasyBuild, which is a tool for building applications on HPC clusters (https://easybuild.io). The previous definition (107) works fine. I've updated it to 109, changing nothing other than the version of the source file and the associated dependencies, and there is an error that appears to relate to API and/or cache files.

Additional information

Please fill in the following sections to help us find the source of your issue as quickly as possible.

The dependencies are supplied as modules.

System

Full VEP command line

perl INSTALL.pl --NO_BIOPERL --NO_HTSLIB --AUTO af --SPECIES all --NO_UPDATE --DESTDIR /eb/maint/software/VEP/109-GCC-11.3.0/modules/api

Full error message

Using non-default API installation directory /eb/maint/software/VEP/109-GCC-11.3.0/modules/api.
Please note this just specifies the location for downloaded API files. The vep script will remain in its current location where ensembl-vep was unzipped.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

PLEASE REMEMBER TO
1. add /eb/maint/software/VEP/109-GCC-11.3.0/modules/api to your PERL5LIB environment variable
2. add /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/htslib to your PATH environment variable

Setting up directories

Downloading required Ensembl API files
 - fetching ensembl
 - unpacking /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/Bio/tmp/ensembl.zip
 - moving files
 - getting version information
 - fetching ensembl-variation
 - unpacking /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/Bio/tmp/ensembl-variation.zip
 - moving files
 - getting version information
 - fetching ensembl-funcgen
 - unpacking /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/Bio/tmp/ensembl-funcgen.zip
 - moving files
 - getting version information
 - fetching ensembl-io
 - unpacking /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/Bio/tmp/ensembl-io.zip
 - moving files
 - getting version information

Testing VEP installation
./t/AnnotationSource_Database_StructuralVariation.t .. ok
./t/OutputFactory_Tab.t .............................. ok
WARNING: The feature_type cDNA_match is being skipped
./t/bam_edit.t ....................................... ok
./t/Haplo_Runner.t ................................... ok
./t/version.t ........................................ ok
./t/Haplo_AnnotationSource_Database_Transcript.t ..... ok
./t/CacheDir.t ....................................... ok
./t/Parser_ID.t ...................................... ok
WARNING: The feature_type three_prime_utr is being skipped
WARNING: The feature_type start_codon is being skipped
WARNING: The feature_type five_prime_utr is being skipped
WARNING: The feature_type start_codon is being skipped
./t/AnnotationSource_File_GTF.t ...................... ok
./t/Parser_HGVS.t .................................... ok
"my" variable $sth masks earlier declaration in same scope at /eb/maint/software/VEP/109-GCC-11.3.0/modules/api/Bio/EnsEMBL/DBSQL/TranslationAdaptor.pm line 607.
./t/VariantRecoder.t ................................. ok
./t/InputBuffer.t .................................... ok
./t/AnnotationSource_File_BED.t ...................... ok
./t/Haplo_AnnotationSource_Cache_Transcript.t ........ ok
./t/FilterSet.t ...................................... ok
./t/BaseVEP.t ........................................ ok
./t/Parser_VEP_input.t ............................... ok
./t/Stats.t .......................................... ok
WARNING: The feature_type three_prime_utr is being skipped
WARNING: The feature_type start_codon is being skipped
WARNING: The feature_type five_prime_utr is being skipped
./t/Haplo_AnnotationSource_File_GTF.t ................ ok
./t/OutputFactory_JSON.t ............................. ok
./t/Parser.t ......................................... ok
./t/AnnotationSource_File_VCF.t ...................... ok
Use of uninitialized value within %pos in division (/) at /eb/maint/software/VEP/109-GCC-11.3.0/modules/Bio/EnsEMBL/VEP/OutputFactory.pm line 2298, <__ANONIO__> line 1.
./t/OutputFactory.t .................................. ok
./t/AnnotationSource_Cache_RegFeat.t ................. ok
./t/AnnotationSource_Database_Variation.t ............ ok
./t/AnnotationSource_File.t .......................... ok
./t/Haplo_Parser_VCF.t ............................... ok
./t/Haplo_InputBuffer.t .............................. ok
./t/AnnotationSource_Cache_Transcript.t .............. ok
./t/AnnotationSource_Database_Transcript.t ........... ok
./t/Parser_VCF.t ..................................... ok
./t/TranscriptTree.t ................................. ok
./t/AnnotationSource_Cache_Variation.t ............... ok
./t/AnnotationSourceAdaptor.t ........................ ok
./t/AnnotationSource_Cache.t ......................... ok
./t/Config.t ......................................... ok
./t/Parser_CAID.t .................................... ok
./t/AnnotationSource_Cache_VariationTabix.t .......... ok
./t/AnnotationSource.t ............................... ok
./t/Utils.t .......................................... ok
./t/Parser_SPDI.t .................................... ok
./t/AnnotationSource_File_BigWig.t ................... ok
./t/Parser_Region.t .................................. ok
WARNING: The feature_type chromosome is being skipped
WARNING: The feature_type biological_region is being skipped
WARNING: The feature_type three_prime_UTR is being skipped
WARNING: The feature_type five_prime_UTR is being skipped
./t/Haplo_AnnotationSource_File_GFF.t ................ ok
Possible attempt to separate words with commas at ./t/OutputFactory_VCF.t line 582.
./t/OutputFactory_VCF.t .............................. ok
./t/OutputFactory_VEP_output.t ....................... ok
./t/AnnotationSource_Database_RegFeat.t .............. ok
./t/Runner.t ......................................... ok
WARNING: The feature_type chromosome is being skipped
WARNING: The feature_type three_prime_UTR is being skipped
WARNING: The feature_type biological_region is being skipped
WARNING: The feature_type five_prime_UTR is being skipped
WARNING: The feature_type chromosome is being skipped
WARNING: The feature_type biological_region is being skipped
WARNING: The feature_type three_prime_UTR is being skipped
WARNING: The feature_type five_prime_UTR is being skipped
WARNING: The feature_type region is being skipped
WARNING: The feature_type match is being skipped
./t/AnnotationSource_File_GFF.t ...................... ok
All tests successful.
Files=49, Tests=1888, 105 wallclock secs ( 0.22 usr  0.12 sys + 96.69 cusr  9.24 csys = 106.27 CPU)
Result: PASS
 - OK!
 - downloading Acanthochromis_polyacanthus.ASM210954v1.dna.toplevel.fa.gz
 - downloading Acanthochromis_polyacanthus.ASM210954v1.dna.toplevel.fa.gz.fai
 - downloading Acanthochromis_polyacanthus.ASM210954v1.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/acanthochromis_polyacanthus/109_ASM210954v1/Acanthochromis_polyacanthus.ASM210954v1.dna.toplevel.fa.gz"

 - downloading Accipiter_nisus.Accipiter_nisus_ver1.0.dna.toplevel.fa.gz
 - downloading Accipiter_nisus.Accipiter_nisus_ver1.0.dna.toplevel.fa.gz.fai
 - downloading Accipiter_nisus.Accipiter_nisus_ver1.0.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/accipiter_nisus/109_Accipiter_nisus_ver1.0/Accipiter_nisus.Accipiter_nisus_ver1.0.dna.toplevel.fa.gz"

 - downloading Ailuropoda_melanoleuca.ASM200744v2.dna.toplevel.fa.gz
 - downloading Ailuropoda_melanoleuca.ASM200744v2.dna.toplevel.fa.gz.fai
 - downloading Ailuropoda_melanoleuca.ASM200744v2.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/ailuropoda_melanoleuca/109_ASM200744v2/Ailuropoda_melanoleuca.ASM200744v2.dna.toplevel.fa.gz"

 - downloading Amazona_collaria.ASM394721v1.dna.toplevel.fa.gz
 - downloading Amazona_collaria.ASM394721v1.dna.toplevel.fa.gz.fai
 - downloading Amazona_collaria.ASM394721v1.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/amazona_collaria/109_ASM394721v1/Amazona_collaria.ASM394721v1.dna.toplevel.fa.gz"

 - downloading Amphilophus_citrinellus.Midas_v5.dna.toplevel.fa.gz
 - downloading Amphilophus_citrinellus.Midas_v5.dna.toplevel.fa.gz.fai
 - downloading Amphilophus_citrinellus.Midas_v5.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/amphilophus_citrinellus/109_Midas_v5/Amphilophus_citrinellus.Midas_v5.dna.toplevel.fa.gz"

 - downloading Amphiprion_ocellaris.AmpOce1.0.dna.toplevel.fa.gz
 - downloading Amphiprion_ocellaris.AmpOce1.0.dna.toplevel.fa.gz.fai
 - downloading Amphiprion_ocellaris.AmpOce1.0.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/amphiprion_ocellaris/109_AmpOce1.0/Amphiprion_ocellaris.AmpOce1.0.dna.toplevel.fa.gz"

 - downloading Amphiprion_percula.Nemo_v1.dna.toplevel.fa.gz
 - downloading Amphiprion_percula.Nemo_v1.dna.toplevel.fa.gz.fai
 - downloading Amphiprion_percula.Nemo_v1.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/amphiprion_percula/109_Nemo_v1/Amphiprion_percula.Nemo_v1.dna.toplevel.fa.gz"

 - downloading Anabas_testudineus.fAnaTes1.2.dna.toplevel.fa.gz
 - downloading Anabas_testudineus.fAnaTes1.2.dna.toplevel.fa.gz.fai
 - downloading Anabas_testudineus.fAnaTes1.2.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/anabas_testudineus/109_fAnaTes1.2/Anabas_testudineus.fAnaTes1.2.dna.toplevel.fa.gz"

 - downloading Anas_platyrhynchos.ASM874695v1.dna.toplevel.fa.gz
 - downloading Anas_platyrhynchos.ASM874695v1.dna.toplevel.fa.gz.fai
 - downloading Anas_platyrhynchos.ASM874695v1.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/anas_platyrhynchos_platyrhynchos/109_CAU_duck1.0/Anas_platyrhynchos_platyrhynchos.CAU_duck1.0.dna.toplevel.fa.gz"

 - downloading Anas_zonorhyncha.ASM222487v1.dna.toplevel.fa.gz
 - downloading Anas_zonorhyncha.ASM222487v1.dna.toplevel.fa.gz.fai
 - downloading Anas_zonorhyncha.ASM222487v1.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /home/software/.vep/anas_zonorhyncha/109_ASM222487v1/Anas_zonorhyncha.ASM222487v1.dna.toplevel.fa.gz"

ERROR: Could not change directory to dna

It's a bit surprising that it's trying to create cache files, given that the invocation of the installer does not request them. Has something changed here?

I have tried specifying the cachedir and the same error appears.

nuno-agostinho commented 1 year ago

Hi @verdurin,

Hope you are having a nice day.

It's a bit surprising that it's trying to create cache files, given that the invocation of the installer does not request them.

Your command is not installing VEP cache data, but rather FASTA files: --AUTO af tells the installer to install updates to the API (a) and to download FASTA files (f), and --SPECIES all applies this to all species.
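As a rough illustration of how the letters passed to --AUTO combine, the sketch below decodes a flag string one character at a time. Only a and f are described in this thread; the function name describe_auto is hypothetical, and any other letter is simply reported as not covered here.

```shell
# Hypothetical helper: explain each letter of an --AUTO argument.
# Only the two letters discussed in this thread are decoded.
describe_auto() {
  flags=$1
  while [ -n "$flags" ]; do
    rest=${flags#?}            # everything after the first character
    letter=${flags%"$rest"}    # the first character itself
    case $letter in
      a) echo "a: install or update the API" ;;
      f) echo "f: download FASTA files" ;;
      *) echo "$letter: not covered in this thread" ;;
    esac
    flags=$rest
  done
}

describe_auto af
```

Read this way, --AUTO af combined with --SPECIES all asks for FASTA downloads for every species, which is why FASTA files rather than cache files are being fetched.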

There seems to be a bug when downloading FASTA files for all species: the directory on our FTP server that holds the FASTA files also contains a folder named ancestral_alleles, which is not named after any species and so triggers the error shown. For now, if you want to download FASTA files for all species, you will need to list the species individually, like so:

perl INSTALL.pl --NO_BIOPERL --NO_HTSLIB --AUTO af --SPECIES acanthochromis_polyacanthus,accipiter_nisus,ailuropoda_melanoleuca,amazona_collaria,amphilophus_citrinellus,amphiprion_ocellaris,amphiprion_percula,anabas_testudineus,anas_platyrhynchos,anas_platyrhynchos_platyrhynchos,anas_zonorhyncha,anolis_carolinensis,anser_brachyrhynchus,anser_cygnoides,aotus_nancymaae,apteryx_haastii,apteryx_owenii,apteryx_rowi,aquila_chrysaetos_chrysaetos,astatotilapia_calliptera,astyanax_mexicanus,astyanax_mexicanus_pachon,athene_cunicularia,balaenoptera_musculus,betta_splendens,bison_bison_bison,bos_grunniens,bos_indicus_hybrid,bos_mutus,bos_taurus,bos_taurus_hybrid,bubo_bubo,buteo_japonicus,caenorhabditis_elegans,cairina_moschata_domestica,calidris_pugnax,calidris_pygmaea,callithrix_jacchus,callorhinchus_milii,camarhynchus_parvulus,camelus_dromedarius,canis_lupus_dingo,canis_lupus_familiaris,canis_lupus_familiarisbasenji,canis_lupus_familiarisboxer,canis_lupus_familiarisgreatdane,canis_lupus_familiarisgsd,capra_hircus,capra_hircus_blackbengal,carassius_auratus,carlito_syrichta,castor_canadensis,catagonus_wagneri,catharus_ustulatus,cavia_aperea,cavia_porcellus,cebus_imitator,cercocebus_atys,cervus_hanglu_yarkandensis,chelonoidis_abingdonii,chelydra_serpentina,chinchilla_lanigera,chlorocebus_sabaeus,choloepus_hoffmanni,chrysemys_picta_bellii,chrysolophus_pictus,ciona_intestinalis,ciona_savignyi,clupea_harengus,colobus_angolensis_palliatus,corvus_moneduloides,cottoperca_gobio,coturnix_japonica,cricetulus_griseus_chok1gshd,cricetulus_griseus_crigri,cricetulus_griseus_picr,crocodylus_porosus,cyanistes_caeruleus,cyclopterus_lumpus,cynoglossus_semilaevis,cyprinodon_variegatus,cyprinus_carpio,cyprinus_carpio_carpio,cyprinus_carpio_germanmirror,cyprinus_carpio_hebaored,cyprinus_carpio_huanghe,danio_rerio,dasypus_novemcinctus,delphinapterus_leucas,denticeps_clupeoides,dicentrarchus_labrax,dipodomys_ordii,dromaius_novaehollandiae,drosophila_melanogaster,echeneis_naucrates,echinops_telfairi,electrophorus_electricus,eptatretus_burgeri,equus_asinus,equus_asinus_asinus,equus_caballus,erinaceus_europaeus,erpetoichthys_calabaricus,erythrura_gouldiae,esox_lucius,falco_tinnunculus,felis_catus,ficedula_albicollis,fukomys_damarensis,fundulus_heteroclitus,gadus_morhua,gallus_gallus,gallus_gallus_gca000002315v5,gallus_gallus_gca016700215v2,gambusia_affinis,gasterosteus_aculeatus,geospiza_fortis,gopherus_agassizii,gopherus_evgoodei,gorilla_gorilla,gouania_willdenowi,haplochromis_burtoni,heterocephalus_glaber_female,heterocephalus_glaber_male,hippocampus_comes,homo_sapiens,hucho_hucho,ictalurus_punctatus,ictidomys_tridecemlineatus,jaculus_jaculus,junco_hyemalis,kryptolebias_marmoratus,labrus_bergylta,larimichthys_crocea,lates_calcarifer,laticauda_laticaudata,latimeria_chalumnae,lepidothrix_coronata,lepisosteus_oculatus,leptobrachium_leishanense,lonchura_striata_domestica,loxodonta_africana,lynx_canadensis,macaca_fascicularis,macaca_mulatta,macaca_nemestrina,malurus_cyaneus_samueli,manacus_vitellinus,mandrillus_leucophaeus,marmota_marmota_marmota,mastacembelus_armatus,maylandia_zebra,meleagris_gallopavo,melopsittacus_undulatus,meriones_unguiculatus,mesocricetus_auratus,microcebus_murinus,microtus_ochrogaster,mola_mola,monodelphis_domestica,monodon_monoceros,monopterus_albus,moschus_moschiferus,mus_caroli,mus_musculus,mus_musculus_129s1svimj,mus_musculus_aj,mus_musculus_akrj,mus_musculus_balbcj,mus_musculus_c3hhej,mus_musculus_c57bl6nj,mus_musculus_casteij,mus_musculus_cbaj,mus_musculus_dba2j,mus_musculus_fvbnj,mus_musculus_lpj,mus_musculus_nodshiltj,mus_musculus_nzohlltj,mus_musculus_pwkphj,mus_musculus_wsbeij,mus_pahari,mus_spicilegus,mus_spretus,mustela_putorius_furo,myotis_lucifugus,myripristis_murdjan,naja_naja,nannospalax_galili,neogobius_melanostomus,neolamprologus_brichardi,neovison_vison,nomascus_leucogenys,notamacropus_eugenii,notechis_scutatus,nothobranchius_furzeri,nothoprocta_perdicaria,numida_meleagris,ochotona_princeps,octodon_degus,oncorhynchus_kisutch,oncorhynchus_mykiss,oncorhynchus_tshawytscha,oreochromis_aureus,oreochromis_niloticus,ornithorhynchus_anatinus,oryctolagus_cuniculus,oryzias_javanicus,oryzias_latipes,oryzias_latipes_hni,oryzias_latipes_hsok,oryzias_melastigma,oryzias_sinensis,otolemur_garnettii,otus_sunia,ovis_aries,ovis_aries_rambouillet,pan_paniscus,pan_troglodytes,panthera_leo,panthera_pardus,panthera_tigris_altaica,papio_anubis,parambassis_ranga,paramormyrops_kingsleyae,parus_major,pavo_cristatus,pelodiscus_sinensis,pelusios_castaneus,periophthalmus_magnuspinnatus,peromyscus_maniculatus_bairdii,petromyzon_marinus,phascolarctos_cinereus,phasianus_colchicus,phocoena_sinus,physeter_catodon,piliocolobus_tephrosceles,podarcis_muralis,poecilia_formosa,poecilia_latipinna,poecilia_mexicana,poecilia_reticulata,pogona_vitticeps,pongo_abelii,procavia_capensis,prolemur_simus,propithecus_coquereli,pseudonaja_textilis,pteropus_vampyrus,pundamilia_nyererei,pygocentrus_nattereri,rattus_norvegicus,rhinolophus_ferrumequinum,rhinopithecus_bieti,rhinopithecus_roxellana,saccharomyces_cerevisiae,saimiri_boliviensis_boliviensis,salarias_fasciatus,salmo_salar,salmo_trutta,salvator_merianae,sander_lucioperca,sarcophilus_harrisii,sciurus_vulgaris,scleropages_formosus,scophthalmus_maximus,serinus_canaria,seriola_dumerili,seriola_lalandi_dorsalis,sinocyclocheilus_anshuiensis,sinocyclocheilus_grahami,sinocyclocheilus_rhinocerous,sorex_araneus,sparus_aurata,spermophilus_dauricus,sphaeramia_orbicularis,sphenodon_punctatus,stachyris_ruficeps,stegastes_partitus,strigops_habroptila,strix_occidentalis_caurina,struthio_camelus_australis,suricata_suricatta,sus_scrofa,sus_scrofa_bamei,sus_scrofa_berkshire,sus_scrofa_hampshire,sus_scrofa_jinhua,sus_scrofa_landrace,sus_scrofa_largewhite,sus_scrofa_meishan,sus_scrofa_pietrain,sus_scrofa_rongchang,sus_scrofa_tibetan,sus_scrofa_usmarc,sus_scrofa_wuzhishan,taeniopygia_guttata,takifugu_rubripes,terrapene_carolina_triunguis,tetraodon_nigroviridis,theropithecus_gelada,tupaia_belangeri,tursiops_truncatus,urocitellus_parryii,ursus_americanus,ursus_maritimus,ursus_thibetanus_thibetanus,varanus_komodoensis,vicugna_pacos,vombatus_ursinus,vulpes_vulpes,xenopus_tropicalis,xiphophorus_couchianus,xiphophorus_maculatus,zalophus_californianus,zonotrichia_albicollis,zosterops_lateralis_melanops --NO_UPDATE --DESTDIR /eb/maint/software/VEP/109-GCC-11.3.0/modules/api
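A list this long is easier to generate than to type. The sketch below assumes you already have the species directory names, one per line (for example from a listing of the FASTA directory on the FTP server), and turns them into a comma-separated value for --SPECIES while dropping ancestral_alleles. The function name species_csv and the input file species_dirs.txt are hypothetical.

```shell
# Hypothetical helper: read directory names on stdin (one per line),
# drop the non-species ancestral_alleles entry and any blank lines,
# and join what is left with commas.
species_csv() {
  grep -v -x -e 'ancestral_alleles' -e '' | paste -s -d, -
}

# Small demonstration with three names taken from this thread.
printf '%s\n' accipiter_nisus ancestral_alleles anas_zonorhyncha | species_csv
```

With the names saved in a file, the installer call could then be written as perl INSTALL.pl --NO_BIOPERL --NO_HTSLIB --AUTO af --SPECIES "$(species_csv < species_dirs.txt)" --NO_UPDATE --DESTDIR /eb/maint/software/VEP/109-GCC-11.3.0/modules/api.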

I am sorry for this workaround, but we will try to fix this soon.

Let me know if you are still facing issues with this workaround.

Kind regards, Nuno

verdurin commented 1 year ago

Hi @nuno-agostinho

Thanks for the quick reply.

Will give this a try.

Best Wishes, Adam

nuno-agostinho commented 1 year ago

Hi @verdurin,

Just to mention that a fix for this issue will be available for VEP 110. Thanks for reporting this bug!

I'm now going to close this issue, but feel free to create a new one if you find any problems.

Have a great week!

Cheers, Nuno

garadar commented 11 months ago

Hi,

I'm trying to install VEP 110 through EasyBuild. Here is the build command:

== 2023-10-09 12:14:17,953 easyblock.py:4277 WARNING build failed (first 300 chars): cmd " perl INSTALL.pl --NO_BIOPERL --NO_HTSLIB --AUTO af --SPECIES all --NO_UPDATE --DESTDIR /opt/ebsofts/VEP/110-GCC-11.3.0/modules/api " exited with exit code 255 and output:

And the error output: ERROR: Unable to parse assembly name from Cyprinus_carpio_german_mirror.German_Mirror_carp_1.0.dna.toplevel.fa.gz

So it is installing all the species, but Cyprinus_carpio_german_mirror.German_Mirror_carp_1.0.dna.toplevel.fa.gz does not seem to be available.

Looking a little further:

https://www.ensembl.org/Cyprinus_carpio_german_mirror/Info/Index

https://ftp.ensembl.org/pub/release-110/fasta/cyprinus_carpio_germanmirror/dna/

It seems there is a typo in the file name: https://ftp.ensembl.org/pub/release-110/fasta/cyprinus_carpio_germanmirror/dna/Cyprinus_carpio_germanmirror.German_Mirror_carp_1.0.dna.toplevel.fa.gz

It is missing the '_' between german and mirror.
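The mismatch can be made explicit with a small check that compares the species name the installer asked for against the prefix of the file actually present on the FTP server. This is only a sketch of the comparison, not how INSTALL.pl itself parses names; fasta_prefix_matches is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the part of FILE before the first dot
# equals SPECIES, ignoring case.
fasta_prefix_matches() {
  species=$1
  file=$2
  prefix=${file%%.*}   # strip everything from the first dot onwards
  [ "$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')" = \
    "$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')" ]
}

# File name as found on the FTP server (from the URL above).
file=Cyprinus_carpio_germanmirror.German_Mirror_carp_1.0.dna.toplevel.fa.gz

# First name is the one in the error message, second is the FTP spelling.
for name in Cyprinus_carpio_german_mirror Cyprinus_carpio_germanmirror; do
  if fasta_prefix_matches "$name" "$file"; then
    echo "$name: matches"
  else
    echo "$name: does not match"
  fi
done
```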

nuno-agostinho commented 11 months ago

Hey @garadar,

Thanks for reporting this issue, but this seems unrelated to the topic mentioned herein and it is much harder for us to track a discussion on closed issues. Do you mind opening a new issue?

Thank you and I'm sorry for the inconvenience.

Best regards, Nuno