bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Installation error: Stalled and incomplete install. Inability to upgrade. #3462

Closed AlexGreiner closed 3 years ago

AlexGreiner commented 3 years ago

Version info

Version: 1.2.7 (or latest version per bcbio_nextgen_install.py)
OS name/version: CentOS Linux release 7.8.2003 (Core)
OS environment: HPC cluster

To Reproduce

wget command run on Apr 12 2021:
wget https://raw.githubusercontent.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py

Install command run on Apr 12 2021:
python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio_tools --minimize-disk --nodata --distribution centos --upgrade stable --cores 2

Observed behavior

Similar issue as in a comment by @yavit1 in #3456.

The install script initializes successfully and runs until the lines below. Installation then stalls at this point for >24 hours, whether the install command is run as a submitted job or as a terminal command on the HPC.

Checking for problematic or migrated packages in default environment

Initalling [sic] initial set of packages for default environment with mamba

Installing into conda environment default: age-metasv, arriba, bamtools=2.4.0, bamutil, bbmap, bcbio-prioritize, bcbio-variation, bcbio-variation-recall, bcftools, bedops, bedtools, bio-vcf, biobambam, bowtie, bowtie2, break-point-inspector, bwa, cage, cancerit-allelecount, chipseq-greylist, cnvkit, coincbc, cramtools, cufflinks, cyvcf2, deeptools, delly, duphold, ensembl-vep=100.*, express, extract-sv-reads, fastp, fastqc>=0.11.8=1, fgbio, freebayes, gatk, gatk4, geneimpacts, genesplicer, gffcompare, goleft, grabix, gridss, gsort, gvcfgenotyper, h5py, hisat2, hmftools-amber, hmftools-cobalt, hmftools-purple, hmmlearn, hts-nim-tools, htslib, impute2, kallisto>=0.43.1, kraken, ldc>=1.13.0, lofreq, macs2, maxentscan, mbuffer, minimap2, mintmap, mirdeep2=2.0.0.7, mirtop, moreutils, multiqc, multiqc-bcbio, ngs-disambiguate, novoalign, octopus>=0.5.1b, oncofuse, optitype>=1.3.4, pandoc=2.9.2, parallel, pbgzip, peddy, perl-sanger-cgp-battenberg, picard, pindel, pizzly, pyloh, pysam>=0.14.0, pythonpy, qsignature, qualimap, rapmap, razers3=3.5.0, rtg-tools, sailfish, salmon, sambamba, samblaster, samtools=1.9, scalpel, seq2c<2016, seqbuster, seqcluster, seqtk, sickle-trim, simple_sv_annotation, singlecell-barcodes, snap-aligner=1.0dev.97, snpeff=4.3.1t, solvebio, spades, staden_io_lib, star=2.6.1d, stringtie, subread, survivor, tdrmapper, tophat-recondition, trim-galore, ucsc-bedgraphtobigwig, ucsc-bedtobigbed, ucsc-bigbedinfo, ucsc-bigbedsummary, ucsc-bigbedtobed, ucsc-bigwiginfo, ucsc-bigwigsummary, ucsc-bigwigtobedgraph, ucsc-bigwigtowig, ucsc-fatotwobit, ucsc-gtftogenepred, ucsc-liftover, ucsc-wigtobigwig, umis, vardict-java, vardict<=2015, variantbam, varscan, vcfanno, vcflib, verifybamid2, viennarna, vqsr_cnn, vt, wham, anaconda-client, awscli, bzip2, ncurses, nodejs, p7zip, readline, s3gof3r, xz, perl-app-cpanminus, perl-archive-extract, perl-archive-zip, perl-bio-db-sam, perl-cgi, perl-dbi, perl-encode-locale, perl-file-fetch, perl-file-sharedir, 
perl-file-sharedir-install, perl-ipc-system-simple, perl-lwp-protocol-https, perl-lwp-simple, perl-statistics-descriptive, perl-time-hires, perl-vcftools-vcf, bioconductor-annotate, bioconductor-apeglm, bioconductor-biocgenerics, bioconductor-biocinstaller, bioconductor-biocstyle, bioconductor-biostrings, bioconductor-biovizbase, bioconductor-bsgenome.hsapiens.ucsc.hg19, bioconductor-bsgenome.hsapiens.ucsc.hg38, bioconductor-bubbletree, bioconductor-cn.mops, bioconductor-copynumber, bioconductor-degreport, bioconductor-deseq2, bioconductor-dexseq, bioconductor-dnacopy, bioconductor-genomeinfodbdata, bioconductor-genomicranges, bioconductor-iranges, bioconductor-limma, bioconductor-rtracklayer, bioconductor-snpchip, bioconductor-titancna, bioconductor-vsn>=3.50.0, r-base, r-basejump=0.7.2, r-bcbiornaseq>=0.2.7, r-cghflasso, r-chbutils, r-devtools, r-dplyr, r-dt, r-ggdendro, r-ggplot2, r-ggrepel>=0.7, r-gplots, r-gsalib, r-janitor, r-knitr, r-pheatmap, r-plyr, r-pscbs, r-reshape, r-rmarkdown, r-rsqlite, r-sleuth, r-snow, r-stringi, r-viridis>=0.5, r-wasabi, r=3.5.1, xorg-libxt

Confirmation of installation fails (--isolate was not used):

$ which bcbio_nextgen.py
/usr/bin/which: no bcbio_nextgen.py in (...long path)

A direct call to bcbio_nextgen.py in ./bcbio/anaconda/bin works:

./bcbio/anaconda/bin/bcbio_nextgen.py --version
1.2.7
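The two checks above (a PATH lookup that fails, then a direct call that works) can be sketched in Python. This is only an illustration: find_executable is a hypothetical helper, not part of bcbio, and the fallback directory is the one used in this report.

```python
import os
import shutil


def find_executable(name, extra_dirs=()):
    """Return the path to `name`, looking on PATH first and then in extra_dirs.

    Hypothetical helper for illustration only; it is not part of bcbio.
    """
    found = shutil.which(name)  # same question `which` answers in the shell
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # ./bcbio/anaconda/bin is the location reported in this issue.
    location = find_executable("bcbio_nextgen.py", ["./bcbio/anaconda/bin"])
    print(location or "bcbio_nextgen.py not found on PATH or in the fallback directory")
```

If the fallback directory is the only place the script is found, the install most likely never reached the step that links it into the tool directory, although that reading is an assumption.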

However, attempted upgrade/installation of genomes and tools using this path fails:

./bcbio/anaconda/bin/bcbio_nextgen.py upgrade -u stable --genomes hg38 --aligners bwa
Traceback (most recent call last):
  File "./bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "./bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "./bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 359, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "./bcbio/anaconda/bin/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
    os.environ["PATH"] = "%s/bin:%s" % (os.path.join(system_installdir), os.environ["PATH"])
  File "./bcbio/anaconda/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
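For readers hitting the same traceback: the final frames show os.path.join() receiving None, which suggests system_installdir was never set when the upgrade ran. A minimal sketch of that failure mode follows, with a guard that turns it into a clearer message. The helper prepend_bin_to_path is hypothetical and is not how bcbio or cloudbiolinux handle this. That the value is missing because the stalled install never recorded a tool directory is an assumption.

```python
import os


def prepend_bin_to_path(system_installdir, current_path):
    """Build a PATH string with <system_installdir>/bin in front.

    Hypothetical helper that mirrors the string formatting in the
    traceback, plus an explicit check for a missing directory.
    """
    if system_installdir is None:
        raise ValueError(
            "tool directory is not set; pass --tooldir or complete the install first"
        )
    return "%s/bin:%s" % (os.path.join(system_installdir), current_path)


# Reproduce the error from the traceback in isolation.
try:
    os.path.join(None)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# With a real directory the same formatting works as intended.
print(prepend_bin_to_path("./bcbio_tools", "/usr/bin"))
```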

Expected behavior

A clean install of bcbio_nextgen capable of downloading genomes and tools.

Log files

Output is included directly in the issue. Can provide .txt files if necessary.

Additional context

naumenko-sa commented 3 years ago

Hi @AlexGreiner !

Thanks for reporting and sorry about the issue! How much RAM are you allocating to the installer job? Can you try to increase it to 20G? I had a successful conda solve yesterday with 20G.

Sergey

AlexGreiner commented 3 years ago

Hi @naumenko-sa,

Thanks for the quick response! I had a successful install overnight and did not run into the issue you did in #3459.

Code:

python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio_tools --nodata --distribution centos --cores 12

Job-specific info:

Cores: 14 (SGE -pe smp 14)
Memory: 40G (SGE -l mf=40G)

Version info:

Version: 1.2.7 (or latest version per bcbio_nextgen_install.py)
OS name/version: CentOS Linux release 7.8.2003 (Core)
Computer environment: HPC cluster with SGE scheduler

Run info:

Wallclock time: 15h52m
CPU time: 15h7m
Max vmem: 10.58G
Mem used: 305T
IO: 206G
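For anyone wanting to submit the installer with the same resources, the request above can be sketched as a qsub argument list built in Python. This is a sketch under assumptions: build_qsub_command is a hypothetical helper, the parallel environment name (smp) and the memory resource (mf) are the ones from this cluster and may be named differently elsewhere, and -cwd and -b y are added here to run the command from the current directory as a binary job.

```python
import shlex


def build_qsub_command(install_args, slots=14, mem="40G", pe_name="smp"):
    """Assemble an SGE qsub command line for the bcbio installer.

    Hypothetical helper; the resource names are site-specific assumptions.
    """
    return [
        "qsub",
        "-cwd",                  # run from the current working directory
        "-b", "y",               # treat the command as a binary, not a job script
        "-pe", pe_name, str(slots),
        "-l", "mf=%s" % mem,
        "python3", "bcbio_nextgen_install.py",
    ] + list(install_args)


# Installer arguments as used in the successful run reported above.
cmd = build_qsub_command(
    ["./bcbio", "--tooldir=./bcbio_tools", "--nodata",
     "--distribution", "centos", "--cores", "12"]
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a submit host. Printing it first makes it easy to confirm the resource flags before submitting.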

naumenko-sa commented 3 years ago

Thanks for confirming!