bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License
Unable to install genomes #3326

Closed gis-nlsim closed 3 years ago

gis-nlsim commented 4 years ago

Greetings,

I've attempted to install bcbio quite a few times, but it always ends up failing when it tries to install the genome files. Below is the error I keep getting.

I'm wondering if bcbio will work if I take an existing genomes directory containing the genomes, and modify the .loc files in the galaxy sub-directory? Thank you.

The basic installation works: python ./bcbio_nextgen_install.py /mnt/projects/XXX/wgs/tools/bcbio/1.2.3 --tooldir=/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/tools --aligners bwa

But the error appears when I try to install the genomes: bcbio_nextgen.py upgrade -u skip --genomes GRCh37 --genomes hg38 --genomes mm10

2020-08-18 08:25:19 (246 KB/s) - ‘variation/dbsnp-153-orig.vcf.gz.tbi’ saved [2998587/2998587]

Writing to .
Could not read: https://gist.githubusercontent.com/matthdsm/f833aedd2d67e28013ff1d171c70f4ee/raw/442a45ed3ddc6e85c66c5e58e0fa78e16a0821c8/refseq2ucsc.tsv
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning
tbx_index_build failed: variation/dbsnp-153.vcf.gz
Traceback (most recent call last):
  File "/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 377, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/home/XXX/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/home/XXX/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/XXX/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/home/XXX/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/home/XXX/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/anaconda/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/anaconda/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 1.
[XXX@n111 ~]$ bcbio_nextgen.py upgrade -u skip --genomes GRCh37 --genomes hg38 --genomes mm10

naumenko-sa commented 4 years ago

Hi @gis-nlsim !

Yes, instead of reinstalling the data you can just symlink the old directories into the new installation: /oldbcbio/genomes -> /newbcbio/genomes and /oldbcbio/galaxy/tool-data -> /newbcbio/galaxy/tool-data.
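A minimal sketch of that symlink approach, in case it helps. The function name link_bcbio_data and its two arguments (the old and new installation roots) are hypothetical, and the guard against overwriting real directories is an added precaution, not part of bcbio:

```shell
#!/bin/bash
# Sketch only: link_bcbio_data is a hypothetical helper, not a bcbio command.
# Usage: link_bcbio_data OLD_ROOT NEW_ROOT
# Makes NEW_ROOT/genomes and NEW_ROOT/galaxy/tool-data point at the copies in OLD_ROOT.
link_bcbio_data() {
    local old="$1" new="$2" p
    # The old installation must actually contain the data we want to share.
    [ -d "$old/genomes" ] || { echo "missing $old/genomes" >&2; return 1; }
    [ -d "$old/galaxy/tool-data" ] || { echo "missing $old/galaxy/tool-data" >&2; return 1; }
    # Never replace a real directory in the new installation, only absent paths or old links.
    for p in "$new/genomes" "$new/galaxy/tool-data"; do
        if [ -e "$p" ] && [ ! -L "$p" ]; then
            echo "refusing to replace existing directory $p" >&2
            return 1
        fi
    done
    mkdir -p "$new/galaxy"
    # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
    ln -sfn "$old/genomes" "$new/genomes"
    ln -sfn "$old/galaxy/tool-data" "$new/galaxy/tool-data"
}
```

It would be called as link_bcbio_data /oldbcbio /newbcbio, with both paths replaced by your own installation roots.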

Is it the https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/dbsnp.yaml recipe that fails? If so, we need to fix it. Can you show the script in /mnt/projects/XXX/wgs/tools/bcbio/1.2.3/genomes/Hsapiens/hg38/txtmp/ggd-run.sh?

Sergey

gis-nlsim commented 4 years ago

Thank you for your quick reply. As requested, the ggd-run.sh script is as follows:

[XXX@n111 txtmp]$ cat ggd-run.sh
#!/bin/bash
set -eu -o pipefail
export PATH=/mnt/projects/XXX/wgs/tools/bcbio/1.2.3/tools/bin:$PATH
build=153
version=GCF_000001405.38
url=http://ftp.ncbi.nih.gov/snp/archive/b$build/VCF/$version.gz
remap_url=https://gist.githubusercontent.com/matthdsm/f833aedd2d67e28013ff1d171c70f4ee/raw/442a45ed3ddc6e85c66c5e58e0fa78e16a0821c8/refseq2ucsc.tsv
ref=../seq/hg38.fa
mkdir -p variation
wget -c -O variation/dbsnp-$build-orig.vcf.gz $url
wget -c -O variation/dbsnp-$build-orig.vcf.gz.tbi $url.tbi
[[ -f variation/dbsnp-$build.vcf.gz ]] || bcftools annotate -Ou --rename-chrs $remap_url variation/dbsnp-$build-orig.vcf.gz |\
bcftools sort -m 1G -Oz -T . -o variation/dbsnp-$build.vcf.gz && \
tabix -f -p vcf -C variation/dbsnp-$build.vcf.gz
tabix -f -p vcf variation/dbsnp-$build.vcf.gz

Regards, Ngak Leng

naumenko-sa commented 4 years ago

Hi @gis-nlsim !

Sorry about the delay. The files look available:

Maybe it was a temporary Github glitch?

Were you able to run the recipe? What happens if you run ggd-run.sh by hand?
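One way to narrow this down, sketched under assumptions: the "Could not read" message in the log names the remap table URL, so it may help to download that table to a local file and sanity-check it before bcftools sees it. The helper check_remap_table and the local file name refseq2ucsc.tsv are hypothetical, and the check is only a rough one (each line must have exactly two whitespace-separated fields, which is the old-name/new-name layout that --rename-chrs expects):

```shell
#!/bin/bash
# Sketch only: check_remap_table is a hypothetical helper, not part of the ggd recipe.
# Succeeds only if FILE is non-empty and every line has exactly two fields.
check_remap_table() {
    awk 'NF != 2 { bad = 1 } END { if (NR == 0 || bad) exit 1 }' "$1"
}

# Possible manual use inside the txtmp directory (needs network access and bcftools):
#   wget -O refseq2ucsc.tsv "$remap_url"
#   check_remap_table refseq2ucsc.tsv || echo "remap table looks wrong" >&2
#   bash -x ggd-run.sh    # -x traces each command, showing which step fails
```

If the download returns an error page instead of the table, the check fails early with a clearer signal than the bcftools header error.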

Sergey

gis-nlsim commented 3 years ago

install.error.txt

Sorry, but I have yet to successfully install bcbio. Attached is the entire output from the bcbio installation process. I would appreciate your help with this. I have tried installing on 2 different servers (at 2 different institutions), both without success, so I don't think it's an issue with the servers on my side. Thank you.

naumenko-sa commented 3 years ago

Hi @gis-nlsim!

Sorry about the continuing issues! I just did two fresh installations of bcbio development instances successfully.

From your log:

Traceback (most recent call last):
  File "bcbio_nextgen_install.py", line 290, in <module>
    main(parser.parse_args(), sys.argv[1:])
  File "bcbio_nextgen_install.py", line 51, in main
    subprocess.check_call([bcbio, "upgrade"] + _clean_args(sys_argv, args))
  File "/home/projects/13001264/tools/**bcbio/v1.1.9**/anaconda/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)

Is it possible that you are mixing two bcbio installations? The old bcbio should not be in the PATH when installing the new one!
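A quick way to check for this and clean it up for the current shell session, as a sketch: the helper path_without is hypothetical, and the pattern bcbio/v1.1.9 is taken from the log above and should be adjusted to match your old installation.

```shell
#!/bin/bash
# Sketch only: path_without is a hypothetical helper.
# Usage: path_without PATTERN [PATH_STRING]
# Prints PATH_STRING (default: $PATH) without the entries that contain PATTERN.
path_without() {
    printf '%s\n' "${2-$PATH}" | tr ':' '\n' | grep -v -F -- "$1" | paste -s -d ':' -
}

# Possible use before starting the new installation:
#   export PATH="$(path_without 'bcbio/v1.1.9')"
#   command -v bcbio_nextgen.py    # should print nothing if the old install is gone
```

Any PATH changes made in shell startup files or module loads would still need to be removed separately.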

Sergey

gis-nlsim commented 3 years ago

Yes I have an older version that we’re using for production. I’ll remove that from the path and try installing again. Thanks!

naumenko-sa commented 3 years ago

Some users use environment modules to maintain several bcbio installations. This also helps with reproducibility: you can have modules for bcbio 1.2.0, 1.2.1, 1.2.2, etc., and just link the same data installation to each of them. https://www.admin-magazine.com/HPC/Articles/Environment-Modules

gis-nlsim commented 3 years ago

bcbio.error.report.txt

Sorry, I've removed the old bcbio paths, but I'm now encountering a different issue (please see the attached file). Thanks for your help.

naumenko-sa commented 3 years ago

I think it hit a network timeout when accessing anaconda.org:

    Traceback (most recent call last):
      File "/home/users/astar/gis/simngl/tools/anaconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 900, in exception_converter
        raise e
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 894, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 853, in _wrapped_main
        result = do_call(args, p)
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 741, in do_call
        exit_code = create(args, parser)
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 620, in create
        return install(args, parser, "create")
      File "/home/projects/13001702/tools/bcbio/v1.2.4/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 570, in install
        downloaded = transaction.prompt(PackageCacheData.first_writable().pkgs_dir, repos)
    RuntimeError: Download error (28) Timeout was reached [https://repo.anaconda.com/pkgs/main/linux-64/python-3.6.12-hcff3b4d_2.conda]

Could you try from a less busy server, or outside peak usage hours? You could also ask your network admins for advice.

gis-nlsim commented 3 years ago

bcbio.error.install.nopath.txt

Hi, I've removed any references to the old bcbio installation, but I'm still getting the same issue (please see attached file). Thanks.

My installation command: python bcbio_nextgen_install.py /home/projects/13001702/tools/bcbio/v1.2.4 --tooldir=/home/projects/13001702/tools/bcbio/v1.2.4 --nodata

naumenko-sa commented 3 years ago

Hi!

It is another network timeout this time, so it seems you have sporadic connection issues:

RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/bioconda/linux-64/bioconductor-bubbletree-2.6.0-0.tar.bz2]

Try talking to your network's sysadmins. SN
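If the timeouts really are sporadic, a retry wrapper with a growing delay is one possible stopgap while the network is being looked at. This is a sketch only: the retry function is hypothetical, and it assumes that simply re-running the failed command is acceptable in your setup.

```shell
#!/bin/bash
# Sketch only: retry is a hypothetical helper, not part of bcbio.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the delay after each failure.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Possible use (the installation path is a placeholder):
#   retry 5 60 python bcbio_nextgen_install.py /path/to/bcbio --tooldir=/path/to/bcbio --nodata
```

This does not fix the underlying connection problem, so talking to the sysadmins is still the real solution.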

naumenko-sa commented 3 years ago

Closing for now! Let us know if you are still having installation issues!