bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines
MIT License

CalledProcessError: Command '['bash', '/mnt/biodata/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1 #174

Closed: iJasonZHOU closed this issue 5 years ago

iJasonZHOU commented 5 years ago

I used the following command to install dockerized bcbio-nextgen:

bcbio_vm.py --datadir=~/install/bcbio_vm/data install --data --tools --genomes GRCh37 --aligners bwa

I have googled similar problems but still cannot solve this one. Does anyone have a solution to this error? I look forward to your reply. Thank you very much.

The following is the complete error log:

[2019-02-19T09:50Z] DEBUG: Upgrading bcbio
[2019-02-19T09:50Z] DEBUG: Upgrading bcbio-nextgen data files
[2019-02-19T09:50Z] DEBUG: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (GRCh37)', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'mirbase'], 'dbkey': 'GRCh37', 'indexes': ['seq', 'twobit']}], 'install_uniref': False}'): Human (GRCh37)
[2019-02-19T09:50Z] DEBUG: Running GGD recipe: GRCh37 GA4GH_problem_regions 20181016
[2019-02-19T09:50Z] DEBUG: Traceback (most recent call last):
[2019-02-19T09:50Z] DEBUG: File "/usr/local/bin/bcbio_nextgen.py", line 221, in
[2019-02-19T09:50Z] DEBUG: install.upgrade_bcbio(kwargs["args"])
[2019-02-19T09:50Z] DEBUG: File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
[2019-02-19T09:50Z] DEBUG: upgrade_bcbio_data(args, REMOTES)
[2019-02-19T09:50Z] DEBUG: File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
[2019-02-19T09:50Z] DEBUG: args.cores, ["ggd", "s3", "raw"])
[2019-02-19T09:50Z] DEBUG: File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
[2019-02-19T09:50Z] DEBUG: _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
[2019-02-19T09:50Z] DEBUG: File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 475, in _prep_genomes
[2019-02-19T09:50Z] DEBUG: retrieve_fn(env, manager, gid, idx)
[2019-02-19T09:50Z] DEBUG: File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 845, in _install_with_ggd
[2019-02-19T09:50Z] DEBUG: ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
[2019-02-19T09:50Z] DEBUG: File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
[2019-02-19T09:50Z] DEBUG: recipe["recipe"]["full"]["recipe_type"], system_install)
[2019-02-19T09:50Z] DEBUG: File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
[2019-02-19T09:50Z] DEBUG: subprocess.check_output(["bash", run_file])
[2019-02-19T09:50Z] DEBUG: File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 223, in check_output
[2019-02-19T09:50Z] DEBUG: raise CalledProcessError(retcode, cmd, output=output)
[2019-02-19T09:50Z] DEBUG: subprocess.CalledProcessError: Command '['bash', '/mnt/biodata/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
[2019-02-19T09:50Z] ERROR: Uncaught exception occurred
Traceback (most recent call last):
  File "/home/zhoubiao/miniconda3/lib/python2.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/zhoubiao/miniconda3/lib/python2.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'docker attach --no-stdin eeb0f63216531bdf8189daa915fdc2c5be9a434cbc4da80c070b30076e28d3e8
--2019-02-19 09:49:39-- https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com... 13.250.177.223, 52.74.223.119, 13.229.188.59
Connecting to github.com|13.250.177.223|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master [following]
--2019-02-19 09:49:41-- https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master
Resolving codeload.github.com... 13.229.189.0, 54.251.140.56, 13.250.162.133
Connecting to codeload.github.com|13.229.189.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: 'STDOUT'
0K ........ ........ ........ ........ ........ ........ 195K
3072K ........ ........ ........ ....... 192K=26s
2019-02-19 09:50:08 (194 KB/s) - written to stdout [5178641]
--2019-02-19 09:50:08-- http://bcbio_nextgen.s3.amazonaws.com/GA4GH_problem_regions.zip
Resolving bcbio_nextgen.s3.amazonaws.com... 52.216.129.179
Connecting to bcbio_nextgen.s3.amazonaws.com|52.216.129.179|:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
[bgzip] can't create bad_promoter.bed.gz: File exists
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (GRCh37)', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'mirbase'], 'dbkey': 'GRCh37', 'indexes': ['seq', 'twobit']}], 'install_uniref': False}'): Human (GRCh37)
Running GGD recipe: GRCh37 GA4GH_problem_regions 20181016
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 221, in
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 475, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 845, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/mnt/biodata/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1
Stopping docker container
Traceback (most recent call last):
  File "/home/zhoubiao/miniconda3/bin/bcbio_vm.py", line 354, in
    args.func(args)
  File "/home/zhoubiao/miniconda3/bin/bcbio_vm.py", line 40, in cmd_install
    install.full(args, devel.DOCKER)
  File "/home/zhoubiao/miniconda3/lib/python2.7/site-packages/bcbiovm/docker/install.py", line 40, in full
    manage.run_bcbio_cmd(args.image, dmounts, _get_cl(args))
  File "/home/zhoubiao/miniconda3/lib/python2.7/site-packages/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
    raise e
subprocess.CalledProcessError: Command 'docker attach --no-stdin eeb0f63216531bdf8189daa915fdc2c5be9a434cbc4da80c070b30076e28d3e8
--2019-02-19 09:49:39-- https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com... 13.250.177.223, 52.74.223.119, 13.229.188.59
Connecting to github.com|13.250.177.223|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master [following]
--2019-02-19 09:49:41-- https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master
Resolving codeload.github.com... 13.229.189.0, 54.251.140.56, 13.250.162.133
Connecting to codeload.github.com|13.229.189.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: 'STDOUT'
0K ........ ........ ........ ........ ........ ........ 195K
3072K ........ ........ ........ ....... 192K=26s
2019-02-19 09:50:08 (194 KB/s) - written to stdout [5178641]
--2019-02-19 09:50:08-- http://bcbio_nextgen.s3.amazonaws.com/GA4GH_problem_regions.zip
Resolving bcbio_nextgen.s3.amazonaws.com... 52.216.129.179
Connecting to bcbio_nextgen.s3.amazonaws.com|52.216.129.179|:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
[bgzip] can't create bad_promoter.bed.gz: File exists
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (GRCh37)', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'mirbase'], 'dbkey': 'GRCh37', 'indexes': ['seq', 'twobit']}], 'install_uniref': False}'): Human (GRCh37)
Running GGD recipe: GRCh37 GA4GH_problem_regions 20181016
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 221, in
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 475, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 845, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/mnt/biodata/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1

chapmanb commented 5 years ago

Thanks for the report and apologies about the issue. It looks like a previous install run left a partial output directory, and the restart then fails when it re-retrieves files that are already there. Have you been seeing network failures or other issues during this run? Practically, running the install inside of Docker makes it a bit trickier to debug, but if you remove the temporary directory:

rm -rf /mnt/biodata/genomes/Hsapiens/GRCh37/txtmp

and re-try, does this help avoid the issue? Hope this helps get your install finished.
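The cleanup step can also be scripted so that it runs before every retry. This is only a minimal sketch: `clear_stale_tmp` is a hypothetical helper, and it assumes the host directory passed to `--datadir` holds the same `genomes/Hsapiens/GRCh37/txtmp` layout that the container sees under `/mnt/biodata`, which is an assumption about the Docker mount rather than something confirmed in this thread.

```python
import shutil
from pathlib import Path


def clear_stale_tmp(data_dir, genome_subdir="genomes/Hsapiens/GRCh37", tmp_name="txtmp"):
    """Delete a leftover temporary directory from an interrupted install.

    data_dir is the host-side data directory (assumed to map to /mnt/biodata
    inside the container). Returns True if a directory was removed and False
    if there was nothing to remove.
    """
    tmp_dir = Path(data_dir).expanduser() / genome_subdir / tmp_name
    if tmp_dir.is_dir():
        # Remove the partial downloads so the recipe starts from a clean state.
        shutil.rmtree(tmp_dir)
        return True
    return False
```

Calling `clear_stale_tmp("~/install/bcbio_vm/data")` before re-running the install would remove only the temporary directory and leave any finished genome data in place.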

iJasonZHOU commented 5 years ago

@chapmanb Thank you for your timely reply. Because of the poor network, connections to amazonaws always time out. After each disconnection, the script retried several times and then reported the above error. I have removed the temporary directory under my local path, not this path: /mnt/biodata/genomes/Hsapiens/GRCh37/txtmp. Now the script is downloading the files. By the way, is there a faster way to download the files? Also, I downloaded some variation files, such as 1000G, for a previous assay. Can I put those files in a specific directory to save some time? Thank you.

chapmanb commented 5 years ago

Thanks for the follow up, and glad to hear that removing the directory and re-starting can fix the issue. Unfortunately we don't have a download mirror other than the current locations. If you've pre-downloaded inputs you can put them in the right place in your biodata directory and then edit the versions.csv file to add the name and version of the file you've manually placed there. That does require understanding what each of the recipes downloads and needs, so it isn't a great overall solution. Sorry to not have a better general fix but hopefully retrying will get things working cleanly for you.
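For anyone trying the versions.csv route, the edit might be scripted roughly as follows. This is a hedged sketch, not bcbio's own code: `record_manual_install` is a hypothetical helper, and the two-column `name,version` layout is an assumption based only on the description above, so check an existing versions.csv in your genome directory before writing to it.

```python
import csv
from pathlib import Path


def record_manual_install(versions_csv, name, version):
    """Add or update a (name, version) row in a versions.csv-style file.

    Assumes each row is `name,version`; returns the rows that were written.
    """
    path = Path(versions_csv)
    rows = []
    if path.exists():
        with path.open(newline="") as handle:
            rows = [row for row in csv.reader(handle) if row]
    updated = False
    for row in rows:
        if row[0] == name:
            # Overwrite the recorded version for an entry that already exists.
            row[1:] = [version]
            updated = True
    if not updated:
        rows.append([name, version])
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

The version string has to match what the corresponding recipe expects (the log above shows pairs such as `GA4GH_problem_regions 20181016`), otherwise the installer may still try to download the data again.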

iJasonZHOU commented 5 years ago

@chapmanb Thanks for your reply. I have run into other problems in the subsequent downloads. They seem to be caused by network failures, but when I fetch the data file directly with wget, it is available. I tried removing the txtmp directory, but it didn't work; the error still occurs. By the way, how can I get the source URLs of these data files? I'd like to download them manually. Thank you.
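On the question of source URLs: wget prints every URL it requests on a line that starts with a `--date time--` marker, so the install output in this thread already lists them. A minimal sketch for pulling them out of a saved log follows; `requested_urls` is a hypothetical helper and the regular expression assumes the timestamp format seen in these logs.

```python
import re

# wget announces each request as "--YYYY-MM-DD HH:MM:SS--  URL".
WGET_REQUEST = re.compile(r"--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")


def requested_urls(log_text):
    """Return the URLs wget tried to fetch, in order, without duplicates."""
    seen = []
    for url in WGET_REQUEST.findall(log_text):
        if url not in seen:
            seen.append(url)
    return seen
```

This only recovers URLs for recipes that have already run, so files the installer has not reached yet will not appear in the list.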

The following is the error info:

--2019-02-22 01:27:47-- http://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh37//HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
Resolving ftp-trace.ncbi.nlm.nih.gov... 130.14.250.12, 2607:f220:41e:250::11
Connecting to ftp-trace.ncbi.nlm.nih.gov|130.14.250.12|:80... failed: Connection timed out.
Connecting to ftp-trace.ncbi.nlm.nih.gov|2607:f220:41e:250::11|:80... failed: Network is unreachable.
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (GRCh37)', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'mirbase'], 'dbkey': 'GRCh37', 'indexes': ['seq', 'twobit']}], 'install_uniref': False}'): Human (GRCh37)
Running GGD recipe: GRCh37 giab-NA12878 v3_3_2
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 221, in
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 475, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 845, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/home/zhoubiao/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/mnt/biodata/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 4
' returned non-zero exit status 1

chapmanb commented 5 years ago

Thanks for following up and continuing to work on this. I'm sorry to not have a better answer but these all look like intermittent connectivity failures that will be fixed by restarts. Unfortunately we're reliant on a lot of data sources and hosting them all ourselves for a single manual download isn't something we're able to do. My only practical suggestion would be to explore using a cloud service provider like AWS, GCP or Azure where you might have more consistent connectivity. Hopefully either that or working through the download timeouts will get you an install to use.
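Since restarts are the suggested remedy, the retry loop can be automated instead of re-run by hand. The sketch below is a generic wrapper, not part of bcbio: `run_with_retries` and its parameters are hypothetical names, and the increasing wait between attempts is simply one reasonable choice for a flaky connection.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=5, delay=60.0, runner=subprocess.call, sleep=time.sleep):
    """Re-run cmd until it exits with status 0 or the attempts run out.

    Returns the 1-based number of the successful attempt, or 0 if every
    attempt failed. runner and sleep are injectable so the loop can be
    exercised without launching a real install.
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < attempts:
            # Wait longer after each failure before trying again.
            sleep(delay * attempt)
    return 0
```

The install command from the first comment could be passed as a list of arguments, for example `["bcbio_vm.py", "--datadir=...", "install", "--data", "--genomes", "GRCh37", "--aligners", "bwa"]` with the data directory filled in. If a stale temporary directory caused the earlier failure, it would still need to be cleared between attempts.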

iJasonZHOU commented 5 years ago

@chapmanb Thanks for the reply. I have tried many times to restart the process, but the error still occurs. I don't know how to solve this.