bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Version 90_GRCh37 not found #2062

Closed: phu5ion closed this issue 7 years ago

phu5ion commented 7 years ago

Hi Brad,

Could you take a look at this issue, please? I was trying to download the mm10 genome. The download contains a file called homo_sapiens_merged_vep_90_GRCh37.tar.gz, so I don't understand why the error occurred.

--2017-09-06 12:13:03-- ftp://ftp.ensembl.org/pub/release-90/variation/VEP/homo_sapiens_merged_vep_90_GRCh37.tar.gz
=> “homo_sapiens_merged_vep_90_GRCh37.tar.gz”
Resolving ftp.ensembl.org... 193.62.193.8
Connecting to ftp.ensembl.org|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-90/variation/VEP ... done.
==> SIZE homo_sapiens_merged_vep_90_GRCh37.tar.gz ... 8595689246
==> PASV ... done. ==> REST 8595689246 ... done.
==> RETR homo_sapiens_merged_vep_90_GRCh37.tar.gz ... done.
Length: 8595689246 (8.0G), 0 remaining (unauthoritative)

100%[+++++++++++++++++++++++++++++++++++++] 8,595,689,246 --.-K/s in 0s

2017-09-06 12:13:06 (0.00 B/s) - “homo_sapiens_merged_vep_90_GRCh37.tar.gz” saved [8595689246]

Traceback (most recent call last):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/bin/bcbio_nextgen.py", line 215, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 96, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 298, in upgrade_bcbio_data
    _upgrade_vep_data(s["fabricrc_overrides"]["galaxy_home"], tooldir)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 335, in _upgrade_vep_data
    effects.prep_vep_cache(dbkey, ref_file, tooldir)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/variation/effects.py", line 108, in prep_vep_cache
    do.run("%s && %s" % (perl_exports, " ".join(cmd)), "Convert VEP cache to tabix %s" % ensembl_name)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset PERL5LIB && export PATH=/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/bin:$PATH && /mnt/projects/dlho/tancrc/bcbio_pipeline/bin/vep_convert_cache --species homo_sapiens_merged --version 90_GRCh37 --dir /mnt/projects/dlho/tancrc/bcbio_pipeline/genomes/Hsapiens/GRCh37/vep --force_overwrite --remove
ERROR: Version 90_GRCh37 not found
' returned non-zero exit status 255
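
For readers following along: the value passed to --version in the failing command (90_GRCh37) has the same release-plus-assembly shape as the tail of the downloaded archive name. A minimal sketch of splitting such an archive name into its parts is shown here. The helper name and the regular expression are hypothetical illustrations based only on the single filename in this thread; they are not part of bcbio or VEP.

```python
import re

# Hypothetical pattern inferred from the one archive name in this thread:
# <species>_vep_<release>_<assembly>.tar.gz
CACHE_NAME = re.compile(
    r"^(?P<species>.+)_vep_(?P<release>\d+)_(?P<assembly>[^.]+)\.tar\.gz$"
)


def parse_vep_cache_name(filename):
    """Split a VEP cache archive name into (species, release, assembly)."""
    match = CACHE_NAME.match(filename)
    if match is None:
        raise ValueError("Unrecognized cache archive name: %s" % filename)
    return (
        match.group("species"),
        int(match.group("release")),
        match.group("assembly"),
    )


species, release, assembly = parse_vep_cache_name(
    "homo_sapiens_merged_vep_90_GRCh37.tar.gz"
)
# Rebuild the string used for --version in the failing command.
print(species, "%d_%s" % (release, assembly))
```

Comparing the parsed release with the installed ensembl-vep version is one way to spot a tool and cache mismatch before running the conversion.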

chapmanb commented 7 years ago

Sorry about the issue. Did you also update your tools (with --tools) prior to running the data upgrade? The error suggests you don't have the latest ensembl-vep; you should have 90.1:

$ bcbio_conda list | grep vep
ensembl-vep               90.1                htslib1.5_1    bioconda

Hopefully that fixes the problem for you.
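
The check above can also be scripted. This is a rough sketch that reads the text printed by bcbio_conda list and reports the installed version of a package. The function names are hypothetical, and the sample line is the one quoted in this comment; it assumes the whitespace-separated column layout shown there.

```python
def installed_version(conda_list_output, package):
    """Return the version column for `package`, or None if it is not listed."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Expected columns: name, version, build, channel
        if len(fields) >= 2 and fields[0] == package:
            return fields[1]
    return None


def major_version(version):
    """Return the integer before the first dot, e.g. 90 for '90.1'."""
    return int(version.split(".")[0])


sample = "ensembl-vep               90.1                htslib1.5_1    bioconda"
version = installed_version(sample, "ensembl-vep")
print(version, major_version(version))
```

In practice the text would come from running bcbio_conda list and capturing its output; that step is left out here to keep the sketch self-contained.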

phu5ion commented 7 years ago

Hi Brad,

Thanks for your reply. From where do I update the corresponding data files (ExAC etc.) then? I got the following error:

-------------------- EXCEPTION --------------------
MSG: ERROR: ExAC data is not available in this cache; gnomAD exome data is available with --af_gnomad
STACK Bio::EnsEMBL::VEP::CacheDir::get_all_AnnotationSources /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/CacheDir.pm:181
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:121
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/ensembl-vep-90.3-0/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/bin/vep:220
Date (localtime) = Mon Sep 11 08:59:38 2017
Ensembl API version = 90
returned non-zero exit status 255

Traceback (most recent call last):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/bin/bcbio_nextgen.py", line 234, in <module>
    main(kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/bin/bcbio_nextgen.py", line 43, in main
    run_main(kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 42, in run_main
    fc_dir, run_info_yaml)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 86, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 154, in variant2pipeline
    samples = run_parallel("postprocess_variants", samples)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 180, in __init__
    self.results = batch()
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 50, in wrapper
    return apply(f, *args, **kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 178, in postprocess_variants
    return variation.postprocess_variants(*args)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/variation.py", line 55, in postprocess_variants
    ann_vrn_file, vrn_stats = effects.add_to_vcf(data["vrn_file"], data)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/variation/effects.py", line 38, in add_to_vcf
    ann_vrn_file = run_vep(in_file, data)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/variation/effects.py", line 173, in run_vep
    do.run(cmd, "Ensembl variant effect predictor", data)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset PERL5LIB && export PATH=/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/bin:$PATH && /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/bin/vep --vcf -o stdout -i /mnt/projects/ngbhs/cirqseq/2017_bcbio/20170727_set2PDX_newell_steph/run3_bcbio_disambiguate/set2_disambiguatePDX/work/varscan/PDX948_2519_P4.vcf.gz --species homo_sapiens --no_stats --cache --offline --dir /mnt/projects/dlho/tancrc/bcbio_pipeline/genomes/Hsapiens/hg38/vep --symbol --numbers --biotype --total_length --canonical --gene_phenotype --ccds --uniprot --domains --regulatory --protein --tsl --appris --af --max_af --af_1kg --af_esp --af_exac --pubmed --variant_class --fasta /mnt/projects/dlho/tancrc/bcbio_pipeline/genomes/Hsapiens/hg38/seq/hg38.fa.gz --plugin LoF,human_ancestor_fa:false --sift b --polyphen b --hgvs --shift_hgvs 1 --merged | sed '/^#/! s/;;/;/g' | bgzip -c > /mnt/projects/ngbhs/cirqseq/2017_bcbio/20170727_set2PDX_newell_steph/run3_bcbio_disambiguate/set2_disambiguatePDX/work/bcbiotx/tmpuC0q9f/PDX948_2519_P4-vepeffects.vcf.gz

chapmanb commented 7 years ago

Thanks for testing this out and sorry about the problem. VEP 90.* changed the command line argument here and we fixed this in a very recent development commit (https://github.com/chapmanb/bcbio-nextgen/commit/b5b45f2e5ee38c74e57198dbd7353ecdaae3a27d). If you update with bcbio_nextgen.py upgrade -u development, it should use the right command line and work cleanly. We're planning a new release tomorrow, so hopefully this will be smoother for you going forward. Thanks again for working through all the issues.
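
To illustrate the kind of change involved, here is a minimal sketch of choosing the allele frequency flag from the installed VEP version. It is not the code from the linked commit. The flag names come from this thread (--af_exac in the failing command, --af_gnomad in the VEP error message), while the function name and the cutoff at release 90 are assumptions inferred from the discussion.

```python
def allele_frequency_flag(vep_version):
    """Pick the exome allele frequency flag for a VEP version string.

    Assumption: releases from 90 onward expect --af_gnomad, as suggested
    by the error message in this thread; earlier ones use --af_exac.
    """
    major = int(str(vep_version).split(".")[0])
    return "--af_gnomad" if major >= 90 else "--af_exac"


# Build the frequency-related part of a command line for two versions.
for version in ("89", "90.3"):
    flags = ["--af", "--max_af", "--af_1kg", "--af_esp", allele_frequency_flag(version)]
    print(version, " ".join(flags))
```

Keying the flag on the installed tool version keeps one code path working across a tools upgrade, provided the cache on disk matches that version.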

phu5ion commented 7 years ago

Thanks Brad, looking good now!