Closed DolapoA closed 4 years ago
Hi @DolapoA !
This would be a huge installation.
You may try to install with `--nodata` first, and then install datasets one by one with `bcbio_nextgen.py upgrade -u skip`.
https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html
S
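The staged approach suggested above could be scripted roughly as follows. This is a minimal sketch, not part of bcbio: the helper names `staged_upgrade_commands` and `run_staged` are hypothetical, and the list of data targets is only an illustration built from the ones named in this issue (gnomad, vep, battenberg). The flags themselves are the ones quoted in this thread.

```python
import subprocess


def staged_upgrade_commands(genome, datatargets):
    """Build one `bcbio_nextgen.py upgrade` command per data target."""
    return [
        ["bcbio_nextgen.py", "upgrade", "-u", "skip",
         "--genomes", genome, "--datatarget", target]
        for target in datatargets
    ]


def run_staged(commands, runner=subprocess.check_call):
    """Run commands in order and stop at the first failure.

    Stopping early makes it clear which data target broke the install.
    """
    completed = []
    for cmd in commands:
        runner(cmd)  # check_call raises CalledProcessError on non-zero exit
        completed.append(cmd)
    return completed


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in staged_upgrade_commands("GRCh37", ["gnomad", "vep", "battenberg"]):
        print(" ".join(cmd))
```

Installing one target per command keeps each failure isolated, which is the point of skipping the data in the first pass.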
Thanks Naumenko, I will try this and get back to you.
D
After running the following standard install with `--nodata`:
python ${PIPEDIR}/bcbio_nextgen_install.py $PIPEDIR --tooldir ${PIPEDIR}/tools \
--nodata \
I encountered the following error:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/cli/main.py", line 84, in _main
exit_code = do_call(args, p)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/cli/conda_argparse.py", line 82, in do_call
return getattr(module, func_name)(args, parser)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/cli/main_install.py", line 20, in execute
install(args, parser, 'install')
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/cli/install.py", line 265, in install
should_retry_solve=(_should_retry_unfrozen or repodata_fn != repodata_fns[-1]),
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/solve.py", line 117, in solve_for_transaction
should_retry_solve)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/solve.py", line 158, in solve_for_diff
force_remove, should_retry_solve)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/solve.py", line 262, in solve_final_state
ssc = self._collect_all_metadata(ssc)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/common/io.py", line 88, in decorated
return f(*args, **kwds)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/solve.py", line 415, in _collect_all_metadata
index, r = self._prepare(prepared_specs)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/solve.py", line 1011, in _prepare
self.subdirs, prepared_specs, self._repodata_fn)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/index.py", line 228, in get_reduced_index
repodata_fn=repodata_fn)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/core/subdir_data.py", line 105, in query_all
result = tuple(concat(executor.map(subdir_query, channel_urls)))
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/concurrent/futures/_base.py", line 575, in map
fs = [self.submit(fn, *args) for args in zip(*iterables)]
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/conda/common/io.py", line 560, in submit
self._adjust_thread_count()
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/concurrent/futures/thread.py", line 142, in _adjust_thread_count
t.start()
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.6/threading.py", line 846, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
There was also a subprocess error at the end of the log:
Upload successful.
Checking required dependencies
Installing isolated base python installation
Installing mamba
Installing conda-build
Installing bcbio-nextgen
Traceback (most recent call last):
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/bcbio_nextgen_install.py", line 290, in <module>
main(parser.parse_args(), sys.argv[1:])
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/bcbio_nextgen_install.py", line 46, in main
bcbio = install_conda_pkgs(anaconda, args)
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/bcbio_nextgen_install.py", line 106, in install_conda_pkgs
"--file", os.path.basename(REMOTES["requirements"])], env=env)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/bin/conda', 'install', '--yes', '--file', 'requirements-conda.txt']' returned non-zero exit status 1
But I'm guessing this always shows up when there's a prior error.
Uploaded log file bcbio_pipeline_installation2.txt
I've decided to start the installation from scratch, as simply as possible and bit by bit. I will keep you updated on the progress.
D.
Simple run command:
PIPEDIR="/SAN/colcc/lab-software/bcbio-pipeline"
python ${PIPEDIR}/bcbio_nextgen_install.py $PIPEDIR --tooldir ${PIPEDIR}/tools \
--nodata \
Error encountered:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 809, in exception_converter
raise e
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 803, in exception_converter
exit_code = _wrapped_main(*args, **kwargs)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 769, in _wrapped_main
exit_code = do_call(args, p)
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 659, in do_call
exit_code = install(args, parser, 'install')
File "/SAN/colcc/lab-software/bcbio-pipeline/anaconda/lib/python3.7/site-packages/mamba/mamba.py", line 529, in install
downloaded = transaction.prompt(PackageCacheData.first_writable().pkgs_dir, repos)
RuntimeError: Resource temporarily unavailable
`$ /SAN/colcc/lab-software/bcbio-pipeline/anaconda/bin/mamba install --yes --only-deps bcbio-nextgen`
Mamba's implicated.
Installation log: bcbio_pipeline_installation3.txt
Hi @DolapoA !
I can only think that there is a connection error: https://github.com/TheSnakePit/mamba/blob/master/mamba/mamba.py#L532
From your log I see that you may be running the bcbio installation as an SGE job:
SGE_STDERR_PATH=/home/dajayi/general_output/bcbio_pipeline_installation.o2261298
SGE_STDIN_PATH=/dev/null
SGE_STDOUT_PATH=/home/dajayi/general_output/bcbio_pipeline_installation.o2261298
If the job runs on a compute node, that node might have limited internet access, or a proxy server might be required. Could you try installing from a login or transfer node, or ask your sysadmins about the connection?
Sergey
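One way to test the connection hypothesis above is to try a plain TCP connection from the node where the install runs. This is a minimal sketch under stated assumptions: the helper `can_reach` is hypothetical, and the host names in the usage comment are guesses at what the installer contacts, not a list taken from bcbio. A direct socket test also ignores any proxy settings, so a failure here on a proxied cluster does not by itself prove the install cannot work.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


# Example usage (assumed hosts; adjust to the channels your install uses):
#   for host in ("conda.anaconda.org", "github.com"):
#       print(host, "reachable" if can_reach(host) else "NOT reachable")
```

Comparing the result on a login node with the result inside an SGE job would show whether the compute nodes are the ones without outside access.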
Hi @naumenko-sa,
The simple installation worked well. However, when trying to add data I came across an error, and I'm not sure what the cause is:
Script:
bcbio_nextgen.py upgrade -u skip --datatarget gnomad --genomes GRCh37
Error:
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<I"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<I
[E::bcf_hdr_parse] Could not parse the header, sample line not found
Failed to open -: could not parse header
Failed to open -: unknown file type
[bcf_ordered_reader.cpp:49 BCFOrderedReader] Not a VCF/BCF file: -
[E:bcf_synced_reader.cpp:87 BCFSyncedReader] - not a VCF or BCF file
Traceback (most recent call last):
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/tools/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 377, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/SAN/colcc/pillaylab-software/bcbio-pipeline/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1.
Dolapo.
Hi @DolapoA!
Sorry about the delay!
It is a valid concern - we had to update our grch37 gnomad recipe.
I've updated it in https://github.com/chapmanb/cloudbiolinux/pull/361. Could you please help test it? Some hints are here: https://github.com/chapmanb/cloudbiolinux/blob/master/doc/hacking.md#testing-a-ggd-recipe
Sergey
Hi @naumenko-sa
Running the new gnomad recipe, I get an error in ggd-run.sh, line 14: vcf_prefix: unbound variable. Looking at how ggd-run.sh works, I think you meant to use url_prefix instead of vcf_prefix for line 14.
Thanks, Ivan
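The `unbound variable` message is what bash prints when a script running with `set -u` (nounset) expands a variable that was never assigned. A rough way to catch this kind of typo before running a recipe is to scan the script text for variables that are used but never set. The sketch below is a heuristic, not a shell parser, and the sample script is invented for illustration; it is not the contents of the real `ggd-run.sh`.

```python
import re

# Plain assignments such as `name=value` or `export name=value` at line start.
ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=", re.MULTILINE)
# Loop variables introduced by `for name in ...`.
FOR_LOOP = re.compile(r"\bfor\s+([A-Za-z_][A-Za-z0-9_]*)\s+in\b")
# Expansions such as `$name` or `${name}`; positional parameters are skipped.
USE = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def unassigned_variables(script_text, known=()):
    """Return variables expanded in the script but never assigned in it.

    `known` lists names expected from the environment (for example HOME).
    """
    assigned = set(ASSIGN.findall(script_text))
    assigned |= set(FOR_LOOP.findall(script_text))
    assigned |= set(known)
    missing = []
    for name in USE.findall(script_text):
        if name not in assigned and name not in missing:
            missing.append(name)
    return missing


# Hypothetical script fragment showing the same kind of mismatch.
EXAMPLE = """\
set -eo pipefail -u
url_prefix=https://example.org/data
for chrom in 1 2
do
  wget ${vcf_prefix}/chr${chrom}.vcf.gz
done
"""

if __name__ == "__main__":
    print(unassigned_variables(EXAMPLE))
```

Variables set by `read`, command-line arguments, or sourced files are not detected, so any name it reports still needs a manual look.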
Hi @naumenko-sa
I encountered the same error as @IvantheDugtrio. As he mentioned, I think that typo could be part of the problem.
Regards, Dolapo.
Dear @DolapoA,
The latest cloudbiolinux includes a fix for that specific issue.
Cheers, -- Paulo
The error I mentioned on 11/08/20 seems to be specifically related to downloading the GRCh37 genome, as opposed to gnomad. Or are you saying they're the same?
Hi @DolapoA,
Yup, the problem was in grabbing gnomAD as part of the GRCh37 genome installation. You should be all set now, feel free to reopen if this didn't end up fixing your issue though. Thanks so much!
The command I ran:
bcbio_nextgen.py upgrade -u skip --genomes GRCh37
I modified the gnomad recipe with the latest gnomad script, as instructed in "In bcbio the alternative instruction is to". However, I got the error below when I ran the upgrade command above.
Please bear in mind that I previously ran it with `--datatarget gnomad`, and it seemed to complete the gnomad part, which is why I've left that option out:
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'GRCh37', 'name': 'Human (GRCh37)', 'indexes': ['seq', 'twobit'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (GRCh37)
Running GGD recipe: GRCh37 srnaseq 20180710
2020-09-30 12:30:31 URL: ftp://mirbase.org/pub/mirbase/20/genomes/hsa.gff3 [519390] -> "hsahg19.gff3" [1]
gzip: refGene.txt.gz: decompression OK, trailing garbage ignored
Traceback (most recent call last):
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/tools/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 377, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "/home/dajayi/scripts/bcbio_pipeline/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/SAN/colcc/pillaylab-software/bcbio-pipeline/anaconda/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/SAN/colcc/pillaylab-software/bcbio-pipeline/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 2.
Thanks, it looks like the mirbase installation script isn't working correctly. I'm looking at it now.
Hi @DolapoA,
I think this is due to `zcat` not being the same as `gunzip -c` on your system. We haven't run into this before, but it looks like that is a thing on some UNIX systems (see https://en.wikibooks.org/wiki/Guide_to_Unix/Commands/File_Compression#zcat). If you nuke the `tmpbcbio-install` directory where you were running the upgrades, you should get an updated recipe that fixes this. Unfortunately, some of the data you have already installed might be corrupted, sorry about that. To be safe, I'd nuke the install and start over.
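Whether `zcat` and `gunzip -c` behave the same on a given machine can be checked directly. The sketch below writes a small gzip file, decompresses it with both commands, and compares the results with the original bytes. The helper names and the sample payload are hypothetical, and a missing command is simply reported as not working.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def decompress_with(cmd, path):
    """Run cmd on path and return its stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    proc = subprocess.run(cmd + [path], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    return proc.stdout if proc.returncode == 0 else None


def compare_zcat_and_gunzip(payload=b"chr1\t100\tA\tT\n"):
    """Report whether each tool reproduces the payload from a .gz file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")
        with gzip.open(path, "wb") as handle:
            handle.write(payload)
        via_zcat = decompress_with(["zcat"], path)
        via_gunzip = decompress_with(["gunzip", "-c"], path)
    return {"zcat_ok": via_zcat == payload, "gunzip_ok": via_gunzip == payload}


if __name__ == "__main__":
    print(compare_zcat_and_gunzip())
```

If `gunzip_ok` is true while `zcat_ok` is false, the local `zcat` is likely the variant that does not handle `.gz` files the way the recipe expects.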
Thanks I'll try that.
Let me know if this doesn't fix it. The other reason this might not be working is that wget is not able to download the files. If that is the case, I think I have a fix for that as well, and it won't require you to re-install.
Sorry, I've reverted it: there is no gzcat on the 3 Linux systems I checked (CentOS, CentOS, Fedora). I have only found it on macOS.
I have just run this recipe, and it runs OK: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg19/mirbase.yaml
Could you please try to re-run it? Maybe it was a mirbase server issue?
Thank you!
Version info
bcbio version (`bcbio_nextgen.py --version`): 1.2.3
OS name and version (`lsb_release -ds`): "CentOS Linux release 7.6.1810 (Core)"
To Reproduce
Exact bcbio command you have used:
Observed behavior
Error message or bcbio output:
Expected behavior
Completed installation, including battenberg, vep and gnomad.
Log files
Please attach (10MB max): bcbio_pipeline_installation.txt