Closed: david-a-siegel closed this issue 3 years ago
Hi @david-a-siegel !
Thanks for reporting and sorry about the issues!
Bcbio uses its own anaconda instance, so you should not have another anaconda session activated.
Could you please try to set PATH to find bcbio_nextgen.py and tools? https://bcbio-nextgen.readthedocs.io/en/latest/contents/intro.html#run-the-analysis-distributed-on-8-local-cores-with
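Concretely, the PATH setup could look something like the sketch below (BCBIO_HOME is a placeholder for your actual bcbio install prefix, not a variable bcbio itself uses):

```shell
# Put bcbio's own anaconda and tools first on PATH, ahead of any other anaconda.
# BCBIO_HOME is a placeholder for your actual bcbio install prefix.
BCBIO_HOME=$HOME/tools/bcbio
export PATH=$BCBIO_HOME/anaconda/bin:$BCBIO_HOME/tools/bin:$PATH
# The first PATH entry should now be bcbio's anaconda:
echo "${PATH%%:*}"
```

After this, which bcbio_nextgen.py should resolve inside the bcbio prefix rather than in any personal anaconda install.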
Sergey
Ah, I see. I did this, and now "which bcbio_nextgen.py" and "bcbio_nextgen.py --version" both work.
But I get a new error when I try to install the data:
Upgrading bcbio
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
The problem seems to be that "system_installdir" is empty and shouldn't be. When I print(args.tooldir) in "upgrade_bcbio" in install.py, I get None. When I print(args) in "upgrade_bcbio" in install.py, I get:
Namespace(aligners=['bwa'], cores=1, cwl=False, datatarget=['variation', 'rnaseq', 'smallrna'], distribution='', genomes=['hg38'], install_data=True, isolate=False, revision='master', toolconf=None, tooldir=None, toolplus=[], tools=False, upgrade='skip')
It looks like the problem is in "_get_data_dir()" in install.py; I don't know what it is really doing.
When I type os.environ["PATH"] I get my PATH variable:
os.environ["PATH"] '/wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin:/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin'
Thanks,
David
Hi David @david-a-siegel !
A potential issue is anaconda2 in your path: /wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin:/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools
It should rather be something like: /wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin
Another potential issue is empty tooldir, maybe it has been missed during the initial install?
wget https://raw.github.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py [bcbio_installation_path] \
--tooldir=[tools_installation_path] \
--nodata
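If the tools step really was missed in the initial install, one option (a sketch; the tooldir path is an example, and bcbio_nextgen.py upgrade does accept --tools and --tooldir) is to re-run just the tools step rather than reinstalling from scratch:

```shell
# Sketch: re-run the tools step with an explicit tooldir (the path is an example).
# Guarded so this is a no-op on machines where bcbio is not on PATH yet.
if command -v bcbio_nextgen.py >/dev/null 2>&1; then
    bcbio_nextgen.py upgrade -u skip --tools --tooldir="$HOME/tools/bcbio/tools"
else
    echo "bcbio_nextgen.py not on PATH"
fi
```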
Sergey
I got the same issue after trying the installation
python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio/tools --genomes hg38
as described at https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html
My installation freezes here
Checking for problematic or migrated packages in default environment
Installing initial set of packages for default environment with mamba
# Installing into conda environment default: age-metasv, arriba, bamtools=2.4.0, bamutil, bbmap, bcbio-prioritize, [....],r-knitr, r-pheatmap, r-plyr, r-pscbs, r-reshape, r-rmarkdown, r-rsqlite, r-sleuth, r-snow, r-stringi, r-viridis>=0.5, r-wasabi, r=3.5.1, xorg-libxt
It passes the install check:
seqme@seqme-template:~$ which bcbio_nextgen.py
/home/seqme/bin/bcbio_nextgen.py
seqme@seqme-template:~$ bcbio_nextgen.py --version
1.2.7
However, it fails to download the genome:
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "/home/seqme/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/home/seqme/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 107, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/home/seqme/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 359, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/home/seqme/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 349, in install_data_local
os.environ["PATH"] = "%s/bin:%s" % (os.path.join(system_installdir), os.environ["PATH"])
File "/home/seqme/bcbio/anaconda/lib/python3.7/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
There is no tool-data folder in bcbio/galaxy, which is mentioned as being necessary for genomes.
Hi @yavit1!
Thanks for reporting!
Sorry, I didn't catch the result of your mamba step - did it finish successfully, finish halfway, or not finish at all?
A conda solve can take a long time (hours) - make sure your terminal session is not freezing during that time - use nohup or a batch job. I just had a successful (well, with one issue) installation, described here: https://github.com/bcbio/bcbio-nextgen/issues/3459
Also, a conda solve can require a lot of RAM - I had installation issues in a 2G session; increasing it to 20G helped.
We rarely install the data, since we maintain and reuse it for years, but I'm trying to reproduce your issue with a fresh data installation.
Sergey
Hi Sergey @naumenko-sa
I apologize for the delay, I'm trying to fit this in between other things.
I changed my PATH variable. Now it looks like this:
echo $PATH: /wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin:/wynton/home/slee/dsiegel/tools/bcbio/tools/bin:/wynton/home/slee/dsiegel/anaconda2/bin:/wynton/home/slee/dsiegel/anaconda2/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/slee/dsiegel/bin
Still doesn't work.
The second directory is the tools directory, which seems to be present. It just has a "bin" folder, and 4 python scripts inside: bcbio_fastq_umi_prep.py, bcbio_nextgen.py, bcbio_prepare_samples.py, bcbio_setup_genome.py.
It's possible that the initial install didn't finish completely; as I said, it froze or something at a certain point and I killed it after 24+ hours (I did run it with nohup using the --nodata flag), but the install checks worked. There are a couple of potentially temporary directories that didn't get cleaned up: "tmpbcbio-install" and "bcbiotx" are both present in addition to "bcbio".
It looks like the line that was causing problems was just looking for the tool directory, so I've hard-coded it in and it seems to be working now (it's currently downloading hg38.fa.gz with wget). I'll let you know if there are further problems. Thanks for all your help...
David
Greetings Sergey @naumenko-sa
Here's the next error:
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 rtg
Downloading genome from s3: hg38 rtg
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 rtg
Preparing genome hg38 with index rtg
Moving on to next genome prep method after trying raw
Command 'export RTG_JAVA_OPTS='-Xms1g' && export RTG_MEM=2g && rtg format -o rtg/hg38.sdf /wynton/home/slee/dsiegel/tools/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa' returned non-zero exit status 127.
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
Hi @david-a-siegel !
It looks like the reference genome has been downloaded: /wynton/home/slee/dsiegel/tools/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
but there is no rtg command available (conda installation has not finished?), so it can't make an rtg index for validation runs.
Can you check which rtg, conda list rtg, and rtg --version?
It should be installed as bcbio/anaconda/bin/rtg.
Sergey
Please advise - how do I increase the RAM allocation for the install?
It depends on how you are running the installation.
If you are working in an interactive session (srun in slurm), try srun --mem=20G --pty /bin/bash.
If you are submitting an sbatch job (slurm), try #SBATCH --mem=20G. Other batch systems have similar parameters.
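For example, a minimal slurm batch script wrapping the install command from this thread could look like the sketch below (slurm is assumed; the time limit and job name are illustrative, not required values):

```shell
# Write a minimal sbatch script requesting 20G of RAM for the bcbio install.
cat > install_bcbio.sbatch <<'EOF'
#!/bin/bash
#SBATCH --mem=20G
#SBATCH --time=48:00:00
#SBATCH --job-name=bcbio-install
python3 bcbio_nextgen_install.py ./bcbio --tooldir=./bcbio/tools --nodata
EOF
# Submit with: sbatch install_bcbio.sbatch
```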
@david-a-siegel if you think that your installation did not finish properly and you have killed the script - just restart it and conda should continue the installation. There are several environments to deploy.
Thanks @naumenko-sa. I did find that rtg was not installed -- there is no folder for it and your which/conda list/--version commands came up empty. I tried to reinstall it and just let it run yesterday.
Your suggestion is to re-run the command when it times out? I'm trying it now.
nohup python3 bcbio_nextgen_install.py bcbio --tooldir=bcbio/tools --nodata &
Thanks,
David
@naumenko-sa How can I figure out the RAM size requested by the installation process? Thank you
@yavit1
It depends on how you are running the installation. If you are working in an interactive session (srun in slurm), try srun --mem=20G --pty /bin/bash. If you are submitting an sbatch job (slurm), try #SBATCH --mem=20G. Other batch systems have similar parameters.
@david-a-siegel See https://github.com/bcbio/bcbio-nextgen/issues/3462 had a successful install. Sometimes it is conda server's timeouts.
@naumenko-sa Mine has failed even after reproducing #3462. I'm running it on Ubuntu 16.04 LTS, 64-bit, with 8 GB of memory and a 60.4 GB disk. Is it worth the effort to do it in such a compact setting?
[...]
tzdata 2021a he74cb21_0 conda-forge/noarch Cached
wheel 0.36.2 pyhd3deb0d_0 conda-forge/noarch Cached
xz 5.2.5 h516909a_1 conda-forge/linux-64 Cached
zlib 1.2.11 h516909a_1010 conda-forge/linux-64 Cached
Summary:
Install: 33 packages
Total download: 0 B
─────────────────────────────────────────────────────────────────────────────────────
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Out of memory allocating 1221361016 bytes!
Killed
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Thanks @naumenko-sa. I will try to re-run it a few times over the next couple days. The machine has 512 GiB of RAM so I don't think that's the issue for me. I can't run a batch job because our compute nodes don't connect to the internet, only the dev nodes. Do you have another way to install and run the software?
@david-a-siegel @yavit1 I installed 1.2.8 yesterday - conda sometimes freezes, but re-running helps it pick up from the last successful transaction. I don't see any errors; it seems that our environments have just become too heavy (too many packages, total size = 36G). It is time to clean them up, as suggested in https://github.com/chapmanb/cloudbiolinux/pull/341
@yavit1 Unfortunately, 8G RAM may not be enough for conda solves. Also, if you are going to run analyses on that node, there is not much you can do - you will need to run 1-threaded analyses (4G/core is a minimum). Still, if you are just experimenting with bcbio, you could try running variant2 bwa/vardict analyses using chr22. RNA-seq STAR needs 30G RAM minimum. The HDD limitation is also crucial: a --nodata installation is currently 36G, and the references add more GB: hg38/seq = 3.2G, hg38/snpeff = 1.8G, hg38/variation = 260G. Even if you fit bcbio onto this machine, you won't have the space for input data (10s of GB) and the work directory (tmp files can be 50GB-1T depending on the project size).
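A quick way to check whether a machine clears those RAM and disk bars before starting is to use standard Linux tools (nothing bcbio-specific; run df in the intended install directory):

```shell
# Report total RAM and free disk space in the current (install target) directory.
free -g | awk '/^Mem:/ {print "Total RAM (GB): " $2}'
df -h . | awk 'NR==2 {print "Free disk: " $4}'
```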
I was able to install bcbio-rnaseq and bcbio-vc from Docker. Then I went ahead to get the hg38 genome (I had to create the genomes directory manually). The genome was successfully downloaded; however, I ran into the same issue with rtg as @david-a-siegel:
subprocess.CalledProcessError: Command 'export PATH=/usr/local/bin:$PATH && export RTG_JAVA_OPTS='-Xms1g' && export RTG_MEM=2g && rtg format -o rtg/hg38.sdf /mnt/biodata/genomes/Hsapiens/hg38/seq/hg38.fa' returned non-zero exit status 127.
rtg is missing in /usr/local/share/bcbio-nextgen/anaconda/bin
@naumenko-sa An update: I actually got the first step of the installation to finish without hanging. It just took a very long time. Here's what I did:
default_threads: 4
Now I'm running bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
@david-a-siegel !
Thanks for debugging! Glad it worked. 35h is definitely too much.
I switched back to conda instead of mamba - the install took 2h without data and did not stall.
Those who prefer mamba can still use it with the --mamba option.
@yavit1 Docker images and bcbio_vm are turned off for now.
Hi @naumenko-sa
Back at it. I tried to upgrade bcbio using:
bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
At some point there was an error:
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 seq 1000g-20150219_1
Running GGD recipe: hg38 bwa 1000g-20150219
Traceback (most recent call last):
File "/wynton/home/slee/dsiegel/tools/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in
Do you know what this error might mean? It did download hg38, but some of the other files and directories don't appear to have finished.
Thanks!
Thanks for reporting @david-a-siegel !
I think I see, why: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/bwa.yaml#L14
Please try again!
SN
Hi @naumenko-sa
It says it completed without error, but it didn't download all the files in that script. It downloaded the hg38.fa.sa, .bwt, .alt, and .pac files, but not the .ann or .amb (it created a file with the *.amb name but downloaded zero bytes). I downloaded them manually -- I'll let you know if there's another error.
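One way to spot such truncated downloads is to look for zero-byte index files. A sketch (the IDX path is only an example of a typical bcbio genomes layout, not a guaranteed location):

```shell
# Flag zero-byte bwa index files under a genome directory.
# IDX is a placeholder; bcbio typically puts these under genomes/Hsapiens/hg38/bwa.
IDX=${IDX:-genomes/Hsapiens/hg38/bwa}
if [ -d "$IDX" ]; then
    find "$IDX" -name 'hg38.fa.*' -size 0   # any output here means a truncated file
else
    echo "no such directory: $IDX"
fi
```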
David
Greetings. Sorry this is turning into a headache.
I'm trying to install bcbio on a linux cluster. I'm having trouble getting the install working. After some messing around (I didn't realize at first that anaconda had to be completely deactivated) I deactivated anaconda then installed bcbio: python bcbio_nextgen_install.py /bcbio --tooldir=/bcbio/tools --nodata
I think everything installed, but it didn't quit, even after a couple days. Eventually I killed the process.
If Anaconda is deactivated and I run "which bcbio_nextgen.py", I get "no bcbio_nextgen.py in (various folders)"
If I activate Anaconda and use my base env, it finds the file and properly outputs a version number.
Then I tried to install data and got this error (substituting "[etc]" for the rest of the path):
(base) [dsiegel@dev2 tools]$ bcbio_nextgen.py upgrade -u skip --genomes hg38 --aligners bwa
Upgrading bcbio
Upgrading bcbio-nextgen data files
Traceback (most recent call last):
File "[etc]/anaconda2/bin/bcbio_nextgen.py", line 221, in
install.upgrade_bcbio(kwargs["args"])
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 346, in upgrade_bcbio_data
cbl_genomes = __import__("cloudbio.biodata.genomes", fromlist=["genomes"])
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 690
cmd= f"bismark_genome_preparation ."
^
SyntaxError: invalid syntax
So I went in and removed the f before the double-quote (here and in a few other places that gave errors). I'm guessing this is a python2 vs python3 issue. The same thing happens whether I run the first install command with python or python3.
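Since f-strings need Python >= 3.6, that SyntaxError means an older interpreter (here, anaconda2's python 2.7) ran the script. A quick check of which interpreter wins the PATH lookup, assuming python3 is available:

```shell
# Which python3 wins the PATH lookup, and can it parse f-strings?
command -v python3
python3 -c 'import sys; assert sys.version_info >= (3, 6), sys.version'
python3 -c 'print(f"f-strings parse: {3 + 3}")'
```

If an anaconda2 python comes first on PATH, reordering PATH (as suggested earlier in the thread) is less invasive than stripping the f prefixes out of genomes.py.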
Next error:
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'name': 'Human (hg38) full', 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'dbkey': 'hg38', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2']}], 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 bwa 1000g-20150219
Traceback (most recent call last):
File "[etc]/anaconda2/bin/bcbio_nextgen.py", line 221, in
install.upgrade_bcbio(kwargs["args"])
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "[etc]/anaconda2/lib/python2.7/site-packages/bcbio/install.py", line 348, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "[etc]/tools/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "[etc]/anaconda2/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '[etc]/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 8
I'm really not sure what's going on here, any advice would be welcome.
Thanks,
David