bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

installation issues #3318

Closed DiyaVaka closed 4 years ago

DiyaVaka commented 4 years ago

Hi,

I have a clean install of bcbio; the installation went fine with no errors. When I tried to run a sample, it gave the following errors:

[ec2-user@isamples]$ /newvolume/bcbio/anaconda/bin/bcbio_nextgen.py ./sample.yaml -n 10
Running bcbio version: 1.2.3
global config: /newvolume/samples/bcbio_system.yaml
run info config: /newvolume/samples/sample.yaml
[2020-07-28T19:38Z] System YAML configuration: /newvolume/bcbio/galaxy/bcbio_system.yaml.
[2020-07-28T19:38Z] Locale set to C.utf8.
[2020-07-28T19:38Z] Resource requests: bwa, gatk, sambamba, samtools; memory: 3.00, 3.00, 3.00, 3.00; cores: 16, 16, 16, 16
[2020-07-28T19:38Z] Configuring 1 jobs to run, using 10 cores each with 30.1g of memory reserved for each job
[2020-07-28T19:38Z] Timing: organize samples
[2020-07-28T19:38Z] multiprocessing: organize_samples
[2020-07-28T19:38Z] Using input YAML configuration: /newvolume/samples/sample.yaml
[2020-07-28T19:38Z] Checking sample YAML configuration: /newvolume/samples/sample.yaml
[2020-07-28T19:38Z] Retreiving program versions from /newvolume/bcbio/manifest/python-packages.yaml.
[2020-07-28T19:38Z] Retreiving program versions from /newvolume/bcbio/manifest/r-packages.yaml.
[2020-07-28T19:38Z] Testing minimum versions of installed programs
Traceback (most recent call last):
  File "/newvolume/bcbio/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/newvolume/bcbio/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
    [x[0]["description"] for x in samples]]])
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 1029, in __call__
    if self.dispatch_one_batch(iterator):
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 847, in dispatch_one_batch
    self._dispatch(tasks)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 765, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 253, in __call__
    for func, args, kwargs in self.items]
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 253, in <listcomp>
    for func, args, kwargs in self.items]
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 455, in organize_samples
    return run_info.organize(*args)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 94, in organize
    out = _add_provenance(out, dirs, config, not is_cwl)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 107, in _add_provenance
    versioncheck.testall(items)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/versioncheck.py", line 97, in testall
    "\n".join(msgs))
OSError: Program problems found. You can upgrade dependencies with:
bcbio_nextgen.py upgrade -u skip --tooldir=/usr/local

Installed version of samtools sort does not have support for multithreading (-@ option) required to support bwa piped alignment and BAM merging. Please upgrade to the latest version from http://samtools.sourceforge.net/

Then I tried:

/newvolume/bcbio/anaconda/bin/bcbio_nextgen.py upgrade -u skip --tooldir=/newvolume/bcbio/tools/
Upgrading bcbio
Upgrading third party tools to latest versions
--2020-07-28 19:26:56--  https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master [following]
--2020-07-28 19:26:56--  https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master
Resolving codeload.github.com (codeload.github.com)... 140.82.112.10
Connecting to codeload.github.com (codeload.github.com)|140.82.112.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘STDOUT’

     0K ........ ........ ........ ........ ........ ........ 16.8M
  3072K ........ ........ ........ .......                    15.2M=0.3s

2020-07-28 19:26:57 (16.1 MB/s) - written to stdout [5189923]

Reading packages from /newvolume/tmpbcbio-install/cloudbiolinux/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml
Checking for problematic or migrated packages in default environment
# Installing into conda environment default: age-metasv, arriba, bamtools=2.4.0, bamutil, bbmap, bcbio-prioritize, bcbio-variation, bcbio-variation-recall, bcftools, bedops, bedtools=2.27.1, bio-vcf, biobambam, bowtie, bowtie2, break-point-inspector, bwa, bwakit, cage, cancerit-allelecount, chipseq-greylist, cnvkit, coincbc, cramtools, cufflinks, cyvcf2, deeptools, delly, duphold, ensembl-vep=100.*, express, extract-sv-reads, fastp, fastqc>=0.11.8=1, fgbio, freebayes=1.1.0.46, gatk, gatk4, geneimpacts, genesplicer, gffcompare, goleft, grabix, gridss, gsort, gvcfgenotyper, h5py, hmftools-amber, hmftools-cobalt, hmftools-purple, hmmlearn, hts-nim-tools, htslib, impute2, kallisto>=0.43.1, kraken, ldc>=1.13.0, lofreq, macs2, maxentscan, mbuffer, minimap2, mintmap, mirdeep2=2.0.0.7, mirtop, moreutils, multiqc, multiqc-bcbio, ngs-disambiguate, novoalign, octopus>=0.5.1b, oncofuse, optitype>=1.3.4, parallel, pbgzip, peddy, perl-sanger-cgp-battenberg, picard, pindel, pizzly, pyloh, pysam>=0.14.0, pythonpy, qsignature, qualimap, rapmap, razers3=3.5.0, rtg-tools, sailfish, salmon, sambamba, samblaster, samtools, scalpel, seq2c<2016, seqbuster, seqcluster, seqtk, sickle-trim, simple_sv_annotation, singlecell-barcodes, snap-aligner=1.0dev.97, snpeff=4.3.1t, solvebio, spades, staden_io_lib, star=2.6.1d, stringtie, subread, survivor, tdrmapper, tophat-recondition, trim-galore, ucsc-bedgraphtobigwig, ucsc-bedtobigbed, ucsc-bigbedinfo, ucsc-bigbedsummary, ucsc-bigbedtobed, ucsc-bigwiginfo, ucsc-bigwigsummary, ucsc-bigwigtobedgraph, ucsc-bigwigtowig, ucsc-fatotwobit, ucsc-gtftogenepred, ucsc-liftover, ucsc-wigtobigwig, umis, vardict, vardict-java, variantbam, varscan, vcfanno, vcflib, verifybamid2, viennarna, vqsr_cnn, vt, wham, anaconda-client, awscli, bzip2, ncurses, nodejs, p7zip, readline, s3gof3r, xz, perl-app-cpanminus, perl-archive-extract, perl-archive-zip, perl-bio-db-sam, perl-cgi, perl-dbi, perl-encode-locale, perl-file-fetch, perl-file-sharedir, 
perl-file-sharedir-install, perl-ipc-system-simple, perl-lwp-protocol-https, perl-lwp-simple, perl-statistics-descriptive, perl-time-hires, perl-vcftools-vcf, bioconductor-annotate, bioconductor-apeglm, bioconductor-biocgenerics, bioconductor-biocinstaller, bioconductor-biocstyle, bioconductor-biostrings, bioconductor-biovizbase, bioconductor-bsgenome.hsapiens.ucsc.hg19, bioconductor-bsgenome.hsapiens.ucsc.hg38, bioconductor-bubbletree, bioconductor-cn.mops, bioconductor-copynumber, bioconductor-degreport, bioconductor-deseq2, bioconductor-dexseq, bioconductor-dnacopy, bioconductor-genomeinfodbdata, bioconductor-genomicranges, bioconductor-iranges, bioconductor-limma, bioconductor-rtracklayer, bioconductor-snpchip, bioconductor-titancna, bioconductor-vsn>=3.50.0, r-base, r-basejump=0.7.2, r-bcbiornaseq>=0.2.7, r-cghflasso, r-chbutils, r-devtools, r-dplyr, r-dt, r-ggdendro, r-ggplot2, r-ggrepel>=0.7, r-gplots, r-gsalib, r-knitr, r-pheatmap, r-plyr, r-pscbs, r-reshape, r-rmarkdown, r-rsqlite, r-sleuth, r-snow, r-stringi, r-viridis>=0.5, r-wasabi, r=3.5.1, xorg-libxt

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment dv: deepvariant

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment htslib1.10: mosdepth

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment python2: bismark, cpat, cutadapt=1.16, dkfz-bias-filter, gemini, gvcf-regions, hap.py, hisat2, htseq=0.9.1, lumpy-sv, manta, metasv, mirge, phylowgs, platypus-variant, sentieon, smcounter2, smoove, strelka, svtools, svtyper, theta2, tophat, vawk, vcf2db

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment python3: atropos, crossmap

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment r36: ataqv, bioconductor-purecn>=1.16.0

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

# Installing into conda environment samtools0: ericscript

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Creating manifest of installed packages in /newvolume/bcbio/manifest
Third party tools upgrade complete.
Upgrade completed successfully.

but I am still getting the same error message.

I tried conda install samtools tools, but it says everything is up to date.

roryk commented 4 years ago

I think you probably have a samtools in your PATH that is superseding the one that bcbio installs. If you do which samtools, what comes up?
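
The PATH check suggested above can be extended to list every samtools on PATH, not only the first one that `which` reports. A minimal sketch in Python, not part of bcbio; the function name `find_all_on_path` is made up for illustration.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order.

    The first entry is the one the shell would run; any later entries
    are shadowed copies.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_all_on_path("samtools"):
        print(hit)
```

If more than one path is printed, the first one wins, so a system samtools listed ahead of the bcbio one would be the copy that gets version-checked.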

DiyaVaka commented 4 years ago

No, it should not: this is a new ECS-optimized instance with only bcbio installed on it, and which samtools does report the bcbio-installed samtools:

[ec2-user@ ~]$ which samtools
/newvolume/bcbio/anaconda/bin/samtools

roryk commented 4 years ago

I'm stumped. What does /newvolume/bcbio/anaconda/bin/samtools --version report?

DiyaVaka commented 4 years ago

Hmm, I did that:

(base) [ec2-user@ ~]$ which samtools
/newvolume/bcbio/anaconda/bin/samtools
(base) [ec2-user@i~]$ /newvolume/bcbio/anaconda/bin/samtools --version
/newvolume/bcbio/anaconda/bin/samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
(base) [ec2-user@ ~]$ export LD_LIBRARY_PATH="/newvolume/bcbio/anaconda/lib"
(base) [ec2-user@i ~]$ /newvolume/bcbio/anaconda/bin/samtools --version
/newvolume/bcbio/anaconda/bin/samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

Even after adding the library path, samtools does not find the library, although I do see the file in the lib folder.
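
The output above shows that samtools never starts: the dynamic loader fails before any samtools code runs, which may be why the "-@ option" message earlier in the thread is misleading. A small Python sketch for telling the two cases apart; `probe` and `is_loader_error` are hypothetical helper names, and the matched string is simply the one printed in the output above.

```python
import subprocess


def probe(cmd):
    """Run `cmd` (a list of arguments) and return (exit_code, stderr_text).

    Returns (None, message) if the binary cannot be launched at all.
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    except OSError as exc:  # missing file, no execute permission, ...
        return None, str(exc)
    return result.returncode, result.stderr


def is_loader_error(stderr_text):
    """True if stderr looks like a shared-library loading failure."""
    return "error while loading shared libraries" in stderr_text


if __name__ == "__main__":
    # Path taken from the thread; adjust it for your own install.
    code, err = probe(["/newvolume/bcbio/anaconda/bin/samtools", "--version"])
    if code is None:
        print("could not launch:", err)
    elif is_loader_error(err):
        print("loader failure:", err.strip())
    else:
        print("samtools started, exit code", code)
```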

DiyaVaka commented 4 years ago

The library I have is:

(base) [ec2-user@ ~]$ ls -al /newvolume/bcbio/anaconda/lib/libcrypto.so.1.1
-rwxrwxr-x 1 ec2-user ec2-user 3278880 Jul 28 00:18 /newvolume/bcbio/anaconda/lib/libcrypto.so.1.1

I guess it's expecting libcrypto.so.1.0.0.
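
The mismatch described here (the binary asks for libcrypto.so.1.0.0, the lib folder ships libcrypto.so.1.1) can be checked with a short Python sketch. The helper names are invented for this example, and the lib directory is the one from this thread.

```python
import glob
import os


def available_sonames(lib_dir, stem):
    """Return the sorted file names in `lib_dir` that start with '<stem>.so'."""
    pattern = os.path.join(lib_dir, stem + ".so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_soname(lib_dir, soname):
    """True if a file with exactly this name exists in `lib_dir`."""
    return os.path.exists(os.path.join(lib_dir, soname))


if __name__ == "__main__":
    lib_dir = "/newvolume/bcbio/anaconda/lib"  # from the thread; adjust as needed
    print("present:", available_sonames(lib_dir, "libcrypto"))
    print("has 1.0.0:", has_soname(lib_dir, "libcrypto.so.1.0.0"))
```

The loader matches the full file name, so having a newer libcrypto.so.1.1 on LD_LIBRARY_PATH does not satisfy a request for libcrypto.so.1.0.0.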

roryk commented 4 years ago

Ok! I think bioconda is having some trouble with these libraries right now. https://github.com/bioconda/bioconda-recipes/issues/12100 has someone else with this issue.

Can you try:

conda install openssl=1.0 to roll back the shared library?

DiyaVaka commented 4 years ago

Hi,

Thank you, the above solution worked. Now I have another roadblock:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/newvolume/bcbio/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/newvolume/bcbio/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 154, in variant2pipeline
    samples = genotype.parallel_variantcall_region(samples, run_parallel)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/genotype.py", line 208, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"]))
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/split.py", line 35, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/newvolume/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/newvolume/bcbio/anaconda/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
NameError: name 'bcbio_bin' is not defined

DiyaVaka commented 4 years ago

I added the line below in utils.py and it got past that point, but I still have an error from pool.py:

bcbio_bin = get_bcbio_bin()

DiyaVaka commented 4 years ago

Hi,

smoove is complaining, and when I try to install it this is what I get:

conda install -c bioconda smoove
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:

Specifications:

Your python: python=3.6

If python is on the left-most side of the chain, that's the version you've asked for. When python appears to the right, that indicates that the thing on the left is somehow not available for the python version you are constrained to. Note that conda will not change your python version to a different minor version unless you explicitly specify that.

naumenko-sa commented 4 years ago

Hi @DiyaVaka! What is the original smoove error? Why do you need to reinstall it?

Can you show what is in your PATH? echo $PATH

Sergey

DiyaVaka commented 4 years ago

/newvolume/bcbio/anaconda/bin:/newvolume/bcbio/anaconda/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin

DiyaVaka commented 4 years ago

The original error was that smoove could not be found, and there is no smoove available on the system.

naumenko-sa commented 4 years ago

Hi!

Your installation seems to be broken. What was your installation command? A successful installation should create two bin directories:

You need to put both dirs in your PATH.

smoove goes into tools:

which smoove
/bcbio/tools/bin/smoove

Sergey
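
The PATH advice above can be checked mechanically. A minimal Python sketch, assuming the tools directory is /newvolume/bcbio/tools/bin (inferred from the --tooldir value used earlier in this thread, not confirmed by the poster); `missing_from_path` is a hypothetical helper name.

```python
import os


def missing_from_path(required_dirs, path):
    """Return the entries of `required_dirs` that do not appear in `path`."""
    entries = {os.path.normpath(p) for p in path.split(os.pathsep) if p}
    return [d for d in required_dirs if os.path.normpath(d) not in entries]


if __name__ == "__main__":
    # PATH exactly as reported earlier in the thread.
    reported_path = (
        "/newvolume/bcbio/anaconda/bin:/newvolume/bcbio/anaconda/condabin:"
        "/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:"
        "/home/ec2-user/.local/bin:/home/ec2-user/bin"
    )
    required = [
        "/newvolume/bcbio/anaconda/bin",
        "/newvolume/bcbio/tools/bin",  # assumed tools location, see lead-in
    ]
    print(missing_from_path(required, reported_path))
```

Under that assumption, the reported PATH contains the anaconda bin directory but not the tools bin directory, which would be consistent with smoove not being found.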

mjsteinbaugh commented 2 years ago

I seem to be hitting a similar issue with 1.2.9 clean installed on a new EC2 instance running Ubuntu 20:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/koopa/opt/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/opt/koopa/opt/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 242, in rnaseqpipeline
    samples = rnaseq_prep_samples(config, run_info_yaml, parallel, dirs, samples)
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 467, in rnaseq_prep_samples
    [x[0]["description"] for x in samples]]])
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1061, in __call__
    self.retrieve()
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 940, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
OSError: Program problems found. You can upgrade dependencies with:
bcbio_nextgen.py upgrade -u skip --tooldir=/usr/local

Installed version of samtools sort does not have support for multithreading (-@ option) required to support bwa piped alignment and BAM merging. Please upgrade to the latest version from http://samtools.sourceforge.net/

mjsteinbaugh commented 2 years ago

Looks like there's a problem with samtools dependency on libcrypto:

/opt/koopa/opt/bcbio-nextgen/install/anaconda/bin/samtools
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
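
One way to list every library the binary fails to resolve, not only the first one the loader reports, is to parse the output of ldd, which prints unresolved dependencies as "name => not found". A Python sketch; the function name and the sample text are illustrative and were not captured from this machine.

```python
def unresolved_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative ldd output; replace with the real output of
    # `ldd /path/to/samtools` on the affected machine.
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc1a5f2000)\n"
        "\tlibz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1c2a000000)\n"
        "\tlibcrypto.so.1.0.0 => not found\n"
    )
    print(unresolved_libraries(sample))
```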

mjsteinbaugh commented 2 years ago

Here's the current list of environments:

/opt/koopa/opt/bcbio-nextgen/install/anaconda/condabin/conda env list
# conda environments:
#
base                  *  /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda
bwakit                   /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/bwakit
dv                       /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/dv
htslib1.10               /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/htslib1.10
htslib1.11               /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/htslib1.11
htslib1.12               /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/htslib1.12
htslib1.12_py3.9         /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/htslib1.12_py3.9
htslib1.9                /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/htslib1.9
java                     /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/java
python2                  /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/python2
python3.6                /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/python3.6
r35                      /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/r35
rbcbiornaseq             /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/rbcbiornaseq
samtools0                /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/samtools0
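
With this many environments, it can help to see which of them carry their own samtools binary. A Python sketch that parses `conda env list` output in the format shown above and checks each environment's bin directory; both helper names are hypothetical.

```python
import os


def parse_env_list(text):
    """Map environment name to prefix path from `conda env list` output."""
    envs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # The active environment has a "*" between name and path, so take
        # the first and last fields. An unnamed environment is listed as a
        # bare path and ends up keyed by that path.
        envs[parts[0]] = parts[-1]
    return envs


def envs_with_tool(envs, tool):
    """Return the sorted names of environments that contain bin/<tool>."""
    return sorted(name for name, prefix in envs.items()
                  if os.path.isfile(os.path.join(prefix, "bin", tool)))


if __name__ == "__main__":
    listing = """\
# conda environments:
#
base                  *  /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda
samtools0                /opt/koopa/app/bcbio-nextgen/1.2.9/install/anaconda/envs/samtools0
"""
    envs = parse_env_list(listing)
    print(envs_with_tool(envs, "samtools"))
```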

@naumenko-sa Does CloudBioLinux need an update? I can work on debugging this.

These seem related:

amizeranschi commented 2 years ago

@mjsteinbaugh

Which samtools version did you end up having in your main bcbio env? I've been seeing this problem recently as well: https://github.com/bcbio/bcbio-nextgen/issues/3557

mjsteinbaugh commented 2 years ago

@amizeranschi I'm working on debugging this! Will let you know

Posting in your other thread. Let's keep this closed.

naumenko-sa commented 2 years ago

#3557