Closed amizeranschi closed 4 years ago
Hi @amizeranschi !
Sorry about the issue. It seems we have solved it recently in the development: https://github.com/bcbio/bcbio-nextgen/issues/3180
Could you please try with
python3 bcbio_nextgen_install.py \
/bcbio -u development \
--tooldir=/bcbio/tools \
--nodata \
--isolate
This also works to install 1.2.3 stable:
python bcbio_nextgen_install.py /bcbio --tooldir=/bcbio/tools --nodata --isolate
Sergey
Yes, the development version made it through. Thanks for mentioning it. Would be useful to get a new stable release out, though, if the current one has that issue.
However, I ran into another problem now. After installing bcbio with --nodata, I added a custom genome via bcbio_setup_genome.py. This seemed to work fine, but the bcbio_nextgen/galaxy/tool-data directory did not get created. To be more precise, the bcbio_nextgen/galaxy directory only contains the file bcbio_system.yaml.
This renders the custom genome unusable, as bcbio complains about missing loc files:
[2020-05-17T16:21Z] Using input YAML configuration: /export/home/ncit/external/a.mizeranschi/automated-VC-test-BRF/testingVC/config/testingVC.yaml
[2020-05-17T16:21Z] Checking sample YAML configuration: /export/home/ncit/external/a.mizeranschi/automated-VC-test-BRF/testingVC/config/testingVC.yaml
Running bcbio version: 1.2.3
global config: /export/home/ncit/external/a.mizeranschi/automated-VC-test-BRF/testingVC/work/bcbio_system.yaml
run info config: /export/home/ncit/external/a.mizeranschi/automated-VC-test-BRF/testingVC/config/testingVC.yaml
Traceback (most recent call last):
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tools/bin/bcbio_nextgen.py", line 245, in <module>
main(**kwargs)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tools/bin/bcbio_nextgen.py", line 46, in main
run_main(**kwargs)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 50, in run_main
fc_dir, run_info_yaml)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
[x[0]["description"] for x in samples]]])
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 1029, in __call__
if self.dispatch_one_batch(iterator):
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 847, in dispatch_one_batch
self._dispatch(tasks)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 765, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 206, in apply_async
result = ImmediateResult(func)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 570, in __init__
self.results = batch()
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 253, in __call__
for func, args, kwargs in self.items]
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 253, in <listcomp>
for func, args, kwargs in self.items]
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
return f(*args, **kwargs)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 459, in organize_samples
return run_info.organize(*args)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 81, in organize
item = add_reference_resources(item, remote_retriever)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 177, in add_reference_resources
data["dirs"]["galaxy"], data)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/genome.py", line 233, in get_refs
galaxy_config, data)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/genome.py", line 180, in _get_ref_from_galaxy_loc
(genome_build, os.path.normpath(loc_file)))
ValueError: Did not find genome build sacCer3_BRF in bcbio installation: /export/home/ncit/external/a.mizeranschi/bcbio_nextgen/galaxy/tool-data/sam_fa_indices.loc
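For anyone hitting the same ValueError: bcbio looks up the genome build name in sam_fa_indices.loc under galaxy/tool-data, and here that file does not exist at all. A minimal sketch for checking this by hand, assuming .loc files are tab-separated text with `#` comment lines; `build_in_loc` is a hypothetical helper, not part of bcbio:

```python
import os


def build_in_loc(loc_file, genome_build):
    """Return True if genome_build appears as a tab-separated field in loc_file.

    Assumption: .loc files are tab-delimited text with '#' comment lines.
    A missing file is reported as False rather than raising.
    """
    if not os.path.exists(loc_file):
        return False
    with open(loc_file) as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            if genome_build in line.split("\t"):
                return True
    return False


if __name__ == "__main__":
    # Path and build name taken from the traceback above.
    loc = ("/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/"
           "galaxy/tool-data/sam_fa_indices.loc")
    print(build_in_loc(loc, "sacCer3_BRF"))
```

The check matches whole fields only, so a build named sacCer3 does not count as a match for sacCer3_BRF.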
Glad it worked!
I think tool-data was not created because you have not installed any data.
Can you try installing with --genomes hg38 --aligners bwa and then install the custom genome?
Or just create the loc files manually:
https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#reference-genome-files
OK, I tried installing the hg38 genome. This crashed due to a failed download (http://www.cs.jhu.edu/~genomics/GeneSplicer/GeneSplicer.tar.gz). It looks like that URL isn't valid anymore and it caused an error with a GGD recipe (hg38 genesplicer 2004.04.03).
--2020-05-18 08:18:21-- http://www.cs.jhu.edu/~genomics/GeneSplicer/GeneSplicer.tar.gz
Resolving www.cs.jhu.edu (www.cs.jhu.edu)... 128.220.13.76
Connecting to www.cs.jhu.edu (www.cs.jhu.edu)|128.220.13.76|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-05-18 08:18:21 ERROR 404: Not Found.
Upgrading bcbio
Detected 1.2.3 as latest version of bcbio-nextgen on bioconda.
bcbio version 1.2.3 is newer than the conda version 1.2.3, skipping upgrade from conda
Upgrading bcbio-nextgen to latest development version
Upgrade of bcbio-nextgen development code complete.
Upgrading third party tools to latest versions
Reading packages from /export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml
Creating conda environment: python3
Creating conda environment: samtools0
Creating conda environment: dv
Creating conda environment: python2
Creating conda environment: r36
Creating conda environment: htslib1.10
Checking for problematic or migrated packages in default environment
Initalling initial set of packages for default environment with mamba
# Installing into conda environment default: age-metasv, arriba, ataqv, bamtools=2.4.0, bamutil, bbmap, bcbio-prioritize, bcbio-variation, bcbio-variation-recall, bcftools, bedops, bedtools=2.27.1, bio-vcf, biobambam, bowtie, bowtie2, break-point-inspector, bwa, bwakit, cage, cancerit-allelecount, chipseq-greylist, cnvkit, coincbc, cramtools, cufflinks, cyvcf2, deeptools, delly, duphold, ensembl-vep=99.*, express, extract-sv-reads, fastp, fastqc>=0.11.8=1, fgbio, freebayes=1.1.0.46, gatk, gatk4, geneimpacts, genesplicer, gffcompare, goleft, grabix, gridss, gsort, gvcfgenotyper, h5py, hmftools-amber, hmftools-cobalt, hmftools-purple, hmmlearn, hts-nim-tools, htslib, impute2, kallisto>=0.43.1, kraken, ldc>=1.13.0, lofreq, macs2, maxentscan, mbuffer, minimap2, mintmap, mirdeep2=2.0.0.7, mirtop, moreutils, multiqc, multiqc-bcbio, ngs-disambiguate, novoalign, octopus>=0.5.1b, oncofuse, optitype>=1.3.4, parallel, pbgzip, peddy, perl-sanger-cgp-battenberg, picard, pindel, pizzly, pyloh, pysam>=0.14.0, pythonpy, qsignature, qualimap, rapmap, razers3=3.5.0, rtg-tools, sailfish, salmon, sambamba, samblaster, samtools, scalpel, seq2c<2016, seqbuster, seqcluster, seqtk, sickle-trim, simple_sv_annotation, singlecell-barcodes, snap-aligner=1.0dev.97, snpeff=4.3.1t, solvebio, spades, staden_io_lib, star=2.6.1d, stringtie, subread, survivor, tdrmapper, tophat-recondition, trim-galore=0.6.2, ucsc-bedgraphtobigwig, ucsc-bedtobigbed, ucsc-bigbedinfo, ucsc-bigbedsummary, ucsc-bigbedtobed, ucsc-bigwiginfo, ucsc-bigwigsummary, ucsc-bigwigtobedgraph, ucsc-bigwigtowig, ucsc-fatotwobit, ucsc-gtftogenepred, ucsc-liftover, ucsc-wigtobigwig, umis, vardict, vardict-java, variantbam, varscan, vcfanno, vcflib, verifybamid2, viennarna, vqsr_cnn, vt, wham, anaconda-client, awscli, bzip2, ncurses, nodejs, p7zip, readline, s3gof3r, xz, perl-app-cpanminus, perl-archive-extract, perl-archive-zip, perl-bio-db-sam, perl-cgi, perl-dbi, perl-encode-locale, perl-file-fetch, perl-file-sharedir, 
perl-file-sharedir-install, perl-ipc-system-simple, perl-lwp-protocol-https, perl-lwp-simple, perl-statistics-descriptive, perl-time-hires, perl-vcftools-vcf, bioconductor-annotate, bioconductor-apeglm, bioconductor-biocgenerics, bioconductor-biocinstaller, bioconductor-biocstyle, bioconductor-biostrings, bioconductor-biovizbase, bioconductor-bsgenome.hsapiens.ucsc.hg19, bioconductor-bsgenome.hsapiens.ucsc.hg38, bioconductor-bubbletree, bioconductor-cn.mops, bioconductor-copynumber, bioconductor-degreport, bioconductor-deseq2, bioconductor-dexseq, bioconductor-dnacopy, bioconductor-genomeinfodbdata, bioconductor-genomicranges, bioconductor-iranges, bioconductor-limma, bioconductor-rtracklayer, bioconductor-snpchip, bioconductor-titancna, bioconductor-vsn>=3.50.0, r-base, r-basejump=0.7.2, r-bcbiornaseq>=0.2.7, r-cghflasso, r-chbutils, r-devtools, r-dplyr, r-dt, r-ggdendro, r-ggplot2, r-ggrepel>=0.7, r-gplots, r-gsalib, r-knitr, r-pheatmap, r-plyr, r-pscbs, r-reshape, r-rmarkdown, r-rsqlite, r-sleuth, r-snow, r-stringi, r-viridis>=0.5, r-wasabi, r=3.5.1, xorg-libxt
# Installing into conda environment dv: deepvariant
# Installing into conda environment htslib1.10: mosdepth
# Installing into conda environment python2: bismark=0.22.1, cpat, cutadapt=1.16, dkfz-bias-filter, gemini, gvcf-regions, hap.py, hisat2, htseq=0.9.1, lumpy-sv, manta, metasv, mirge, phylowgs, platypus-variant, sentieon, smcounter2, smoove, strelka, svtools, svtyper, theta2, tophat, vawk, vcf2db
# Installing into conda environment python3: atropos, crossmap
# Installing into conda environment r36: bioconductor-purecn>=1.16.0
# Installing into conda environment samtools0: ericscript
Creating manifest of installed packages in /export/home/ncit/external/a.mizeranschi/bcbio_nextgen/manifest
Third party tools upgrade complete.
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'bowtie2', 'hisat2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 seq 1000g-20150219_1
Running GGD recipe: hg38 bwa 1000g-20150219
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 bowtie2
Downloading genome from s3: hg38 bowtie2
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 bowtie2
Preparing genome hg38 with index bowtie2
Running GGD recipe: hg38 hisat2 12-07-2015
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 rtg
Downloading genome from s3: hg38 rtg
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 rtg
Preparing genome hg38 with index rtg
Running GGD recipe: hg38 ccds r20
Running GGD recipe: hg38 capture_regions 20161202
Running GGD recipe: hg38 coverage 2018-10-16
Running GGD recipe: hg38 prioritize 20181227
Running GGD recipe: hg38 dbsnp 153-20180725
Running GGD recipe: hg38 hapmap_snps 20160105
Running GGD recipe: hg38 1000g_omni_snps 20160105
Running GGD recipe: hg38 ACMG56_genes 20160726
Running GGD recipe: hg38 1000g_snps 20160105
Running GGD recipe: hg38 mills_indels 20160105
Running GGD recipe: hg38 1000g_indels 2.8_hg38_20150522
Running GGD recipe: hg38 clinvar 20190513
Running GGD recipe: hg38 qsignature 20160526
Running GGD recipe: hg38 genesplicer 2004.04.03
Traceback (most recent call last):
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 359, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 8.
Checking required dependencies
Installing isolated base python installation
Installing mamba
Installing conda-build
Installing bcbio-nextgen
Installing data and third party dependencies
Traceback (most recent call last):
File "bcbio_nextgen_install.py", line 290, in <module>
main(parser.parse_args(), sys.argv[1:])
File "bcbio_nextgen_install.py", line 52, in main
subprocess.check_call([bcbio, "upgrade"] + _clean_args(sys_argv, args))
File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/anaconda/bin/bcbio_nextgen.py', 'upgrade', '-u', 'development', '--tooldir=/export/home/ncit/external/a.mizeranschi/bcbio_nextgen/tools', '--genomes', 'hg38', '--datatarget', 'variation', '--datatarget', 'rnaseq', '--datatarget', 'smallrna', '--aligners', 'bwa', '--aligners', 'bowtie2', '--aligners', 'hisat2', '--isolate', '--cores', '2', '--data']' returned non-zero exit status 1.
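A note on the exit status 8 reported for ggd-run.sh in the inner traceback: if that status was passed through from wget (an assumption, since the recipe script itself is not shown here), it matches wget's documented code for a server error response, which is consistent with the 404 in the log. A small lookup sketch; the table follows the wget manual, and `explain_wget_exit` is a hypothetical helper:

```python
# Exit statuses as listed in the GNU Wget manual.
WGET_EXIT_CODES = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error (command-line options or .wgetrc/.netrc)",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def explain_wget_exit(status):
    """Map a wget exit status to its documented meaning."""
    return WGET_EXIT_CODES.get(status, "Unknown exit status")


if __name__ == "__main__":
    print(explain_wget_exit(8))
```

A status of 4 (network failure) would point to a transient problem worth retrying, whereas 8 with a 404 suggests the file is really gone.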
I also found an earlier thread (https://github.com/bcbio/bcbio-nextgen/issues/3165) where @chapmanb suggested that these errors can sometimes happen intermittently and retrying the install/upgrade can get things running.
I retried this a couple of times and each time it crashed at that particular step. I'm guessing the URL http://www.cs.jhu.edu/~genomics/GeneSplicer/GeneSplicer.tar.gz isn't working anymore and hg38 is uninstallable as a result.
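To tell a permanently dead URL from an intermittent failure before retrying a long install, one option is to ask the server for the status directly. A minimal sketch using only the Python standard library; `http_status` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without network access:

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    # HEAD avoids downloading the body; some servers may not support it.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses; the code is the HTTP status.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None


if __name__ == "__main__":
    url = "http://www.cs.jhu.edu/~genomics/GeneSplicer/GeneSplicer.tar.gz"
    print(http_status(url))
```

A repeated 404 means retrying will not help and the recipe needs a new URL, while None or a 5xx code is more likely to be temporary.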
Thanks, genesplicer is an easy fix. They moved to an FTP server: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/genesplicer.yaml Please try again. S.
Thanks for the fix. That got things moving forward, but the snpEff database for hg38 was taking forever to download (estimated 15 hours for 1.5 GB), so I canceled it. I don't actually need the hg38 data, anyway.
Instead, I installed the sacCer3 yeast genome, which I remembered was on the order of a couple hundred MB for all the data. Much more manageable, when its only purpose is to get the tool-data directory created.
Things worked fine and I could then install and use the custom genome. Thanks a lot for your help!
I'm trying to install the latest stable version and it's failing, something to do with a badly formatted YAML file. Here's what I'm running:
and this is the result: