Closed drlaurenwasson closed 4 years ago
Hello Lauren @drlaurenwasson !
Yes, bcbio needs to install data: https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html#install-data
In your case it could be achieved by running (with bcbio and bcbio tools in PATH):
# install reference genome for mm10
bcbio_nextgen.py -u skip --genomes mm10 --aligners bwa --cores 10
# install RNA-annotation for mm10
bcbio_nextgen.py -u skip --genomes mm10 --datatarget rnaseq
# build STAR index for mm10
bcbio_nextgen.py -u skip --genomes mm10 --aligners star --cores 10
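After the downloads finish, a quick sanity check is to look for the expected mm10 directories under the bcbio data root. The layout below (genomes/Mmusculus/mm10 with per-target subdirectories) follows the standard bcbio data tree; `BCBIO_ROOT` is a placeholder you would point at your actual install path.

```shell
# Hedged sketch: verify the mm10 data targets landed where bcbio keeps them.
# BCBIO_ROOT is a placeholder -- set it to your bcbio installation directory.
BCBIO_ROOT="${BCBIO_ROOT:-/path/to/bcbio}"
status=0
for sub in seq bwa star rnaseq; do
    dir="$BCBIO_ROOT/genomes/Mmusculus/mm10/$sub"
    if [ -d "$dir" ]; then
        echo "OK: $dir"
    else
        echo "MISSING: $dir"
        status=1
    fi
done
```

If any line prints MISSING, rerun the corresponding `bcbio_nextgen.py -u skip` command above for that target.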
If the admins could follow up here with a particular installation error they see, we may try to resolve it.
Sergey
Thank you for your prompt reply @naumenko-sa
I sent them an email and got this response: "A couple of thoughts.
• Is this something you need to install/download once (and then you'll always have it already installed/downloaded)? If so, you can add something to your path for just one session (so not in your .bashrc to load every time):
export PATH=
It doesn't sound like this is one of those."
I wonder if I could install the data in my personal directory (keeping in mind I only have 50 GB) after loading the module, without having to go through them. I pointed them to this GitHub page to troubleshoot the error, but we shall see.
I'm also a relative newbie in Python. Can you explain what I would need to do to get "with bcbio and bcbio tools in PATH"?
Thank you for your help
You can't really analyze NGS data with 50GB. Some clusters provide extended disk space for users outside of /home/user. Maybe you need to apply for that. Once you have space, you can install bcbio on your own. mm10 files require 20-50G depending on what exactly you are installing as data targets.
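To make the space question concrete, here is a small sketch (plain `df`/`awk`, no bcbio required) that reports free space on whatever filesystem a target directory lives on, so you can check a scratch or lab partition before committing to a data install. The 50G threshold simply mirrors the estimate above; `TARGET` is a placeholder.

```shell
# Check free space (in GB) on the filesystem holding the install target.
# TARGET is a placeholder; point it at the directory you plan to install into.
TARGET="${TARGET:-$HOME}"
avail_kb=$(df -Pk "$TARGET" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "Available on $TARGET: ${avail_gb}G"
if [ "$avail_gb" -lt 50 ]; then
    echo "Warning: under 50G free; mm10 data targets may not fit."
fi
```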
In a cluster setting, a shared bcbio installation makes the most sense, because bcbio installs hundreds of bioinformatics packages via conda and most of the databases required for germline/somatic/RNA-seq NGS analyses and many other analyses, see all user stories supported in the docs. Many users could benefit from that one big shared bcbio instance.
By default, the bcbio installation script modifies ~/.bashrc (the PATH variable).
When installing with
python bcbio_nextgen_install.py [bcbio_path] --tooldir=[bcbio_tools_path] --nodata --isolate
you have to add two directories to your PATH (~/.bashrc):
export PATH=[bcbio_path]/anaconda/bin:[bcbio_tools_path]/bin:$PATH
See more here:
https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html#installation-parameters
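As a concrete illustration of the PATH edit (both directory names are placeholders for wherever you put `[bcbio_path]` and `[bcbio_tools_path]`):

```shell
# Hedged sketch: prepend the bcbio conda and tools bin directories to PATH.
# Both paths are placeholders for your actual install locations.
BCBIO_PATH="/path/to/bcbio"
BCBIO_TOOLS="/path/to/bcbio_tools"
export PATH="$BCBIO_PATH/anaconda/bin:$BCBIO_TOOLS/bin:$PATH"
# The shell should now resolve bcbio_nextgen.py from the conda directory:
command -v bcbio_nextgen.py || echo "bcbio_nextgen.py not found (install incomplete?)"
```

Putting the `export PATH=...` line in `~/.bashrc` makes it persist across sessions.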
SN
Version info
To Reproduce
#!/bin/bash
#SBATCH -p general
#SBATCH --job-name=bcbionextgen
#SBATCH -c 1
#SBATCH -t 12:00:00
#SBATCH --mem-per-cpu=10G
#SBATCH -e bcbionextgen.err

bcbio_nextgen.py ../config/submission.yaml -n 64 -t ipython -s slurm -q general '-rW=100:00' --retries 3 --timeout 5000
details:
Observed behavior

2020-06-10 10:04:51.649 [IPClusterStart] Loaded config file: /nas/longleaf/home/lwaldron/RNA-seq/submission/work/log/ipython/ipengine_config.py
2020-06-10 10:04:51.649 [IPClusterStart] Looking for ipengine_config in /nas/longleaf/home/lwaldron/RNA-seq/submission/work
2020-06-10 10:04:51.650 [IPClusterStart] Attempting to load config file: ipcluster_6efe6e16_b020_4507_8783_9feb2389b159_config.py
2020-06-10 10:04:51.651 [IPClusterStart] Looking for ipcluster_config in /etc/ipython
2020-06-10 10:04:51.651 [IPClusterStart] Looking for ipcluster_config in /usr/local/etc/ipython
2020-06-10 10:04:51.651 [IPClusterStart] Looking for ipcluster_config in /nas/longleaf/apps/bcbio-nextgen/1.2.0/venv/anaconda/etc/ipython
2020-06-10 10:04:51.651 [IPClusterStart] Looking for ipcluster_config in /nas/longleaf/home/lwaldron/RNA-seq/submission/work/log/ipython
2020-06-10 10:04:51.652 [IPClusterStart] Loaded config file: /nas/longleaf/home/lwaldron/RNA-seq/submission/work/log/ipython/ipcluster_config.py
2020-06-10 10:04:51.652 [IPClusterStart] Looking for ipcluster_config in /nas/longleaf/home/lwaldron/RNA-seq/submission/work
Expected behavior

Hello, my job has been stalled at this spot for 4 hours. If I look at the job scheduler I see this:

61199325 bcbionext+ general rc_fconlo+ 1 RUNNING 0:0
61199325.ba+ batch rc_fconlo+ 1 RUNNING 0:0
61199325.ex+ extern rc_fconlo+ 1 RUNNING 0:0
There is a SLURM_controller file which when opened looks like this:
#!/bin/sh
#SBATCH -p general
#SBATCH -J bcbio-c
#SBATCH -o bcbio-ipcontroller.out.%A_%a
#SBATCH -e bcbio-ipcontroller.err.%A_%a
#SBATCH -t 01-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH -A rc_fconlon_pi
#SBATCH --mem=4000
#SBATCH --W=100:00

export IPYTHONDIR=/nas/longleaf/home/lwaldron/RNA-seq/submission/work/log/ipython
/nas/longleaf/apps/bcbio-nextgen/1.2.0/venv/anaconda/bin/python -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from cluster_helper.cluster import VMFixIPControllerApp; VMFixIPControllerApp.launch_instance()' --ip=* --log-to-file --profile-dir="/nas/longleaf/home/lwaldron/RNA-seq/submission/work/log/ipython" --cluster-id="6efe6e16-b020-4507-8783-9feb2389b159" --nodb --hwm=1 --scheme=leastload --HeartMonitor.max_heartmonitor_misses=720 --HeartMonitor.period=5000
Log files

Please attach (10MB max): bcbio-nextgen.log, bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.

Additional context

This is the first time I am trying bcbio-nextgen on the UNC computing cluster. I brought it over from HMS, where I learned how to use it. The module was installed by UNC ITS, and I did get this email when I asked them to install it:
"The command in the installation doc for downloading genome data didn't work. None of the options specified in the doc are valid options for that command. Not sure if the user already has the necessary data. If they need this, they should probably contact the authors of this software to see how to get it."