Closed: david-a-siegel closed this issue 3 years ago
Hi @david-a-siegel !
Thanks for reporting and sorry about the issues!
Bcbio uses its own anaconda instance, so you should not have another anaconda session activated.
Could you please try to set PATH to find bcbio_nextgen.py and tools? https://bcbio-nextgen.readthedocs.io/en/latest/contents/intro.html#run-the-analysis-distributed-on-8-local-cores-with
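Concretely, the PATH setup could look something like the sketch below (BCBIO_HOME is a placeholder for your actual bcbio install prefix, not a variable bcbio itself uses):

```shell
# Put bcbio's own anaconda and tools first on PATH, ahead of any other anaconda.
# BCBIO_HOME is a placeholder for your actual bcbio install prefix.
BCBIO_HOME=$HOME/tools/bcbio
export PATH=$BCBIO_HOME/anaconda/bin:$BCBIO_HOME/tools/bin:$PATH
# The first PATH entry should now be bcbio's anaconda:
echo "${PATH%%:*}"
```

After this, which bcbio_nextgen.py should resolve inside the bcbio prefix rather than in any personal anaconda install.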
Sergey
Ah, I see. I did this, and now "which bcbio_nextgen.py" and "bcbio_nextgen.py --version" both work.
But I get a new error when I try to install the data:
Upgrading bcbio
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
The problem seems to be that "system_installdir" is empty and shouldn't be. When I print(args.tooldir) in "upgrade_bcbio" in install.py, I get None. When I print(args) in "upgrade_bcbio" in install.py, I get:
Namespace(aligners=['bwa'], cores=1, cwl=False, datatarget=['variation', 'rnaseq', 'smallrna'], distribution='', genomes=['hg38'], install_data=True, isolate=False, revision='master', toolconf=None, tooldir=None, toolplus=[], tools=False, upgrade='skip')
It looks like the problem is in "_get_data_dir()" in install.py; I don't know what it is really doing.
When I type os.environ["PATH"] I get my PATH variable:
os.environ["PATH"] '/wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin:/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin'
Thanks,
David
Hi David @david-a-siegel !
A potential issue is anaconda2 in your path: /wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin:/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools
It should rather be something like: /wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin
Another potential issue is empty tooldir, maybe it has been missed during the initial install?
wget https://raw.github.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py [bcbio_installation_path] \
--tooldir=[tools_installation_path] \
--nodata
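If the tools step really was missed in the initial install, one option (a sketch; the tooldir path is an example, and bcbio_nextgen.py upgrade does accept --tools and --tooldir) is to re-run just the tools step rather than reinstalling from scratch:

```shell
# Sketch: re-run the tools step with an explicit tooldir (the path is an example).
# Guarded so this is a no-op on machines where bcbio is not on PATH yet.
if command -v bcbio_nextgen.py >/dev/null 2>&1; then
    bcbio_nextgen.py upgrade -u skip --tools --tooldir="$HOME/tools/bcbio/tools"
else
    echo "bcbio_nextgen.py not on PATH"
fi
```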
Sergey
I got the same issue after trying the installation
python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio/tools --genomes hg38
as described at https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html
My installation freezes here
Checking for problematic or migrated packages in default environment
Installing initial set of packages for default environment with mamba
# Installing into conda environment default: age-metasv, arriba, bamtools=2.4.0, bamutil, bbmap, bcbio-prioritize, [....],r-knitr, r-pheatmap, r-plyr, r-pscbs, r-reshape, r-rmarkdown, r-rsqlite, r-sleuth, r-snow, r-stringi, r-viridis>=0.5, r-wasabi, r=3.5.1, xorg-libxt
It passes the install check:
seqme@seqme-template:~$ which bcbio_nextgen.py
/home/seqme/bin/bcbio_nextgen.py
seqme@seqme-template:~$ bcbio_nextgen.py --version
1.2.7
However, it fails to download the genome:
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "/home/seqme/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/home/seqme/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/home/seqme/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 359, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/home/seqme/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
os.environ["PATH"] = "%s/bin:%s" % (os.path.join(system_installdir), os.environ["PATH"])
File "/home/seqme/bcbio/anaconda/lib/python3.7/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
There is no tool-data folder in bcbio/galaxy, which is mentioned as being necessary for genomes.
Hi @yavit1!
Thanks for reporting!
Sorry, I didn't catch the result of your mamba step - did it finish successfully, finish halfway, or not finish at all?
A conda solve can take a long time (hours) - make sure your terminal session is not freezing during that time - use nohup or a batch job. I just had a successful (well, with one issue) installation, described here: https://github.com/bcbio/bcbio-nextgen/issues/3459
Also, a conda solve can require a lot of RAM - I had installation issues in a 2G session; increasing it to 20G helped.
We rarely install the data, since we maintain and reuse it for years, but I'm trying to reproduce your issue with a fresh data installation.
Sergey
Hi Sergey @naumenko-sa
I apologize for the delay, I'm trying to fit this in between other things.
I changed my PATH variable. Now it looks like this:
echo $PATH: /wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin:/wynton/home/slee/dsiegel/anaconda2/bin:/wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin
Still doesn't work.
The second directory is the tools directory, which seems to be present. It just has a "bin" folder, and 4 python scripts inside: bcbio_fastq_umi_prep.py, bcbio_nextgen.py, bcbio_prepare_samples.py, bcbio_setup_genome.py.
It's possible that the initial install didn't finish completely; as I said, it froze or something at a certain point and I killed it after 24+ hours (I did run it with nohup using the --nodata flag), but the install checks worked. There are a couple of potentially temporary directories that didn't get cleaned up: "tmpbcbio-install" and "bcbiotx" are both present in addition to "bcbio".
It looks like the line that was causing problems was just looking for the tool directory, so I've hard-coded it in and it seems to be working now (it's currently downloading hg38.fa.gz with wget). I'll let you know if there are further problems. Thanks for all your help...
David
Greetings Sergey @naumenko-sa
Here's the next error:
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 rtg
Downloading genome from s3: hg38 rtg
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 rtg
Preparing genome hg38 with index rtg
Moving on to next genome prep method after trying raw
Command 'export RTG_JAVA_OPTS='-Xms1g' && export RTG_MEM=2g && rtg format -o rtg/hg38.sdf /wynton/home/slee/dsiegel/tools/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa' returned non-zero exit status 127.
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
Hi @david-a-siegel !
It looks like the reference genome has been downloaded: /wynton/home/slee/dsiegel/tools/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
but there is no rtg command available (conda installation has not finished?), so it can't make an rtg index for validation runs.
Can you check which rtg, conda list rtg, and rtg --version?
It should be installed as bcbio/anaconda/bin/rtg.
Sergey
Please advise - how do I increase the RAM allocation for the install?
It depends on how you are running the installation.
If you are working in an interactive session (srun in slurm), try srun --mem=20G --pty /bin/bash.
If you are submitting an sbatch job (slurm), try #SBATCH --mem=20G. Other batch systems have similar parameters.
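For example, a minimal slurm batch script wrapping the install command from this thread could look like the sketch below (slurm is assumed; the time limit and job name are illustrative, not required values):

```shell
# Write a minimal sbatch script requesting 20G of RAM for the bcbio install.
cat > install_bcbio.sbatch <<'EOF'
#!/bin/bash
#SBATCH --mem=20G
#SBATCH --time=48:00:00
#SBATCH --job-name=bcbio-install
python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio/tools --nodata
EOF
# Submit with: sbatch install_bcbio.sbatch
```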
@david-a-siegel if you think that your installation did not finish properly and you have killed the script - just restart it and conda should continue the installation. There are several environments to deploy.
Thanks @naumenko-sa. I did find that rtg was not installed -- there is no folder for it and your which/conda list/--version commands came up empty. I tried to reinstall it and just let it run yesterday.
Your suggestion is to re-run the command when it times out? I'm trying it now.
nohup python3 bcbio_nextgen_install.py bcbio --tooldir=bcbio/tools --nodata &
Thanks,
David
@naumenko-sa How can I figure out the RAM size requested by the installation process? Thank you
@yavit1
It depends on how you are running the installation. If you are working in an interactive session (srun in slurm), try srun --mem=20G --pty /bin/bash. If you are submitting an sbatch job (slurm), try #SBATCH --mem=20G. Other batch systems have similar parameters.
@david-a-siegel See https://github.com/bcbio/bcbio-nextgen/issues/3462 had a successful install. Sometimes it is conda server's timeouts.
@naumenko-sa Mine has failed even after reproducing #3462. I'm running it on Ubuntu 16.04 LTS, 64-bit, with 8 GB of memory and a 60.4 GB disk. Is it worth the effort to do it in such a compact setting?
[...]
tzdata 2021a he74cb21_0 conda-forge/noarch Cached
wheel 0.36.2 pyhd3deb0d_0 conda-forge/noarch Cached
xz 5.2.5 h516909a_1 conda-forge/linux-64 Cached
zlib 1.2.11 h516909a_1010 conda-forge/linux-64 Cached
Summary:
Install: 33 packages
Total download: 0 B
─────────────────────────────────────────────────────────────────────────────────────
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Out of memory allocating 1221361016 bytes!
Killed
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Thanks @naumenko-sa. I will try to re-run it a few times over the next couple days. The machine has 512 GiB of RAM so I don't think that's the issue for me. I can't run a batch job because our compute nodes don't connect to the internet, only the dev nodes. Do you have another way to install and run the software?
@david-a-siegel @yavit1 I installed 1.2.8 yesterday - conda sometimes freezes, but re-running helps it pick up from the last successful transaction. I don't see any errors; it seems that our environments have just become too heavy (too many packages, total size = 36G). It is time to clean them up, as suggested in https://github.com/chapmanb/cloudbiolinux/pull/341
@yavit1 Unfortunately, 8G RAM may not be enough for conda solves. Also, if you are going to run analyses on that node, there is not much you can do - you will need to run 1-threaded analyses (4G/core is a minimum). Still, if you are just experimenting with bcbio, you could try running variant2 bwa/vardict analyses using chr22. RNA-seq STAR needs 30G RAM minimum. The HDD limitation is also crucial: a --nodata installation is currently 36G, and the references add more GB: hg38/seq = 3.2G, hg38/snpeff = 1.8G, hg38/variation = 260G. Even if you fit bcbio onto this machine, you won't have the space for input data (10s of GB) and the work directory (tmp files can be 50GB-1T depending on the project size).
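A quick way to check whether a machine clears those RAM and disk bars before starting is to use standard Linux tools (nothing bcbio-specific; run df in the intended install directory):

```shell
# Report total RAM and free disk space in the current (install target) directory.
free -g | awk '/^Mem:/ {print "Total RAM (GB): " $2}'
df -h . | awk 'NR==2 {print "Free disk: " $4}'
```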
I was able to install bcbio-rnaseq and bcbio-vc from Docker. Then I went ahead to get the hg38 genome (I had to create the genomes directory manually). The genome was successfully downloaded; however, I ran into the same issue with rtg as @david-a-siegel:
subprocess.CalledProcessError: Command 'export PATH=/usr/local/bin:$PATH && export RTG_JAVA_OPTS='-Xms1g' && export RTG_MEM=2g && rtg format -o rtg/hg38.sdf /mnt/biodata/genomes/Hsapiens/hg38/seq/hg38.fa' returned non-zero exit status 127.
rtg is missing in /usr/local/share/bcbio-nextgen/anaconda/bin
@naumenko-sa An update: I actually got the first step of the installation to finish without hanging. It just took a very long time. Here's what I did:
default_threads: 4
Now I'm running bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
@david-a-siegel !
Thanks for debugging! Glad it worked. 35h is definitely too much.
I switched back to conda instead of mamba - the install took 2h without data and did not stall.
Those who prefer mamba can still use it with the --mamba option.
@yavit1 Docker images and bcbio_vm are turned off for now.
Hi @naumenko-sa
Back at it. I tried to upgrade bcbio using:
bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
At some point there was an error:
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 seq 1000g-20150219_1
Running GGD recipe: hg38 bwa 1000g-20150219
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
Do you know what this error might mean? It did download hg38, but some of the other files and directories don't appear to have finished.
Thanks!
Thanks for reporting @david-a-siegel !
I think I see, why: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/bwa.yaml#L14
Please try again!
SN
Hi @naumenko-sa
It says it completed without error, but it didn't download all the files in that script. It downloaded the hg38.fa.sa, .bwt, .alt, and .pac files, but not the .ann or .amb (it created a file with the *.amb name but downloaded zero bytes). I downloaded them manually -- I'll let you know if there's another error.
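One way to spot such truncated downloads is to look for zero-byte index files. A sketch (the IDX path is only an example of a typical bcbio genomes layout, not a guaranteed location):

```shell
# Flag zero-byte bwa index files under a genome directory.
# IDX is a placeholder; bcbio typically puts these under genomes/Hsapiens/hg38/bwa.
IDX=${IDX:-genomes/Hsapiens/hg38/bwa}
if [ -d "$IDX" ]; then
    find "$IDX" -name 'hg38.fa.*' -size 0   # any output here means a truncated file
else
    echo "no such directory: $IDX"
fi
```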
David
Greetings. Sorry this is turning into a headache.
I'm trying to install bcbio on a linux cluster. I'm having trouble getting the install working. After some messing around (I didn't realize at first that anaconda had to be completely deactivated) I deactivated anaconda then installed bcbio: python bcbio_nextgen_install.py /bcbio --tooldir=/bcbio/tools --nodata
I think everything installed, but it didn't quit, even after a couple days. Eventually I killed the process.
If Anaconda is deactivated and I run "which bcbio_nextgen.py", I get "no bcbio_nextgen.py in (various folders)"
If I activate Anaconda and use my base env, it finds the file and properly outputs a version number.
Then I tried to install data and got this error (substituting "[etc]" for the rest of the path):
(base) [dsiegel@dev2 tools]$ bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
Upgrading bcbio
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "[etc]/anaconda2/bin/bcbio_nextgen.py", line 221, in
install.upgrade_bcbio(kwargs["args"])
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
cbl_genomes = __import__("cloudbio.biodata.genomes", fromlist=["genomes"])
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 690
cmd= f"bismark_genome_preparation ."
^
SyntaxError: invalid syntax
So I went in and removed the f before the double-quote (here and in a few other places that gave errors). I'm guessing this is a python2 vs python3 issue. The same thing happens whether I run the first install command with python or python3.
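Since f-strings need Python >= 3.6, that SyntaxError means an older interpreter (here, anaconda2's python 2.7) ran the script. A quick check of which interpreter wins the PATH lookup, assuming python3 is available:

```shell
# Which python3 wins the PATH lookup, and can it parse f-strings?
command -v python3
python3 -c 'import sys; assert sys.version_info >= (3, 6), sys.version'
python3 -c 'print(f"f-strings parse: {3 + 3}")'
```

If an anaconda2 python comes first on PATH, reordering PATH (as suggested earlier in the thread) is less invasive than stripping the f prefixes out of genomes.py.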
Next error:
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (hg38) full', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'dbkey': 'hg38', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2']}], 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 bwa 1000g-20150219
Traceback (most recent call last):
File "[etc]/anaconda2/bin/bcbio_nextgen.py", line 221, in
install.upgrade_bcbio(kwargs["args"])
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 348, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "[etc]/anaconda2/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '[etc]/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 8
I'm really not sure what's going on here, any advice would be welcome.
Thanks,
David