bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

HPC Sun Grid Engine Bcbio installation #1378

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi,

I tried to install bcbio_nextgen.py on our HPC. I installed bcbio into a folder called bcbio in my scratch directory (not in /usr/local/share as described in the documentation). After installation, we couldn't get the system to run after we changed some of the Python paths from /anaconda1anaconda2anaconda3/ to our bcbio annotation/python. We also ran into the following problems:

  1. When we enter bcbio_nextgen.py on the command line, the response comes after 2-3 minutes, whereas on Amazon the same action finishes in 5-10 seconds. How can we debug this?
  2. Is there a way to set the Python path automatically for the install, or do we have to edit the Python paths in the files manually? If so, could you point out which files I should change?
  3. We decided to use your chr 6 teaching example as a control for the system, so we need hg38, which differs from the canonical installation. In the --genomes part, how should I state that I need both GRCh37 and hg38 installed?

Thank you very much,

Best,

Tunc.

chapmanb commented 8 years ago

Tunc; Sorry about the issues. Trying to tackle the points one at a time:

  1. It sounds like the shared filesystem where you installed bcbio is running slowly. I've seen this slow startup time on heavily utilized systems where it took a long time to load all the python files and libraries used by bcbio. Moving to a more responsive shared filesystem on your HPC will hopefully improve this.
  2. Why did you manually edit python paths? This will, as you experienced, break the ability of the installation to find the installed python. bcbio installs its own isolated python and doesn't need a system python. We don't recommend manual editing of files.
  3. You pass multiple flags to the installation: bcbio_nextgen.py upgrade --genomes GRCh37 --genomes hg38
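
The three points above can be checked from a shell. The sketch below is not part of bcbio: `time_cmd`, `script_interpreter`, and `build_upgrade_cmd` are hypothetical helper names, and the live checks assume `bcbio_nextgen.py` is on `PATH` (they are skipped otherwise). It times startup, prints the interpreter named on the script's first line so you can confirm it points at bcbio's own isolated python, and builds the upgrade command with one `--genomes` flag per genome build.

```shell
#!/bin/sh
# Hypothetical helpers for the three points above; not part of bcbio.

# Point 1: wall-clock seconds a command takes (its output is discarded).
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Point 2: interpreter named on a script's first (shebang) line.
script_interpreter() {
    head -n 1 "$1" | sed 's/^#! *//'
}

# Point 3: upgrade command with one --genomes flag per genome build.
build_upgrade_cmd() {
    cmd="bcbio_nextgen.py upgrade"
    for genome in "$@"; do
        cmd="$cmd --genomes $genome"
    done
    printf '%s\n' "$cmd"
}

# Only run the live checks when bcbio is actually on PATH.
if command -v bcbio_nextgen.py >/dev/null 2>&1; then
    echo "startup seconds: $(time_cmd bcbio_nextgen.py -h)"
    echo "interpreter: $(script_interpreter "$(command -v bcbio_nextgen.py)")"
fi

# Dry run: print the command rather than executing it.
build_upgrade_cmd GRCh37 hg38
```

The last line only prints the command from point 3. Running the same timing on a different filesystem gives a rough comparison of how much of the delay comes from loading files off the shared disk.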

Hope this helps.

mortunco commented 8 years ago

Dear @chapmanb;

I ran into a file problem on the FTP server during installation. I think the file location on the FTP server is wrong.

I am also sharing the environment variables from my .bashrc. Are these enough, or do I have to add anything else to my .bashrc?

My next challenge is to bring bcbio to my school's server. I believe I can introduce bcbio to my school.

Best regards, Tunc.

export PATH=/share/apps/python-2.7.2/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/python-2.7.2/lib:$LD_LIBRARY_PATH

export PATH=/share/apps/gcc/gcc-4.6.2/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/gcc/gcc-4.6.2/lib64:/share/apps/gcc/gcc-4.6.2/lib:/usr/lib:/usr/lib64:$LD_LIBRARY_PATH
export LC_ALL=en_US.UTF-8
export PYTHONPATH=/mnt/scratch/tmorova15/bcbio/anaconda/bin

#CloudBioLinux PATH updates
export PATH=/mnt/scratch/tmorova15/bcbio/bin:$PATH

Running GGD recipe: dbsnp
--2016-05-03 10:03:03--  ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/VCF/All_20150603.vcf.gz
           => `variation/dbsnp-144-orig.vcf.gz'
Resolving ftp.ncbi.nih.gov... 130.14.250.13, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nih.gov|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /snp/organisms/human_9606_b144_GRCh38p2/VCF ...
No such directory `snp/organisms/human_9606_b144_GRCh38p2/VCF'.

Traceback (most recent call last):
  File "/mnt/kufs/scratch/tmorova15/bcbio/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==0.9.7', 'bcbio_nextgen.py')
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script

  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script

  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-0.9.7-py2.7.egg-info/scripts/bcbio_nextgen.py", line 207, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
    ggd.install_recipe(env.cwd, recipe_file)
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"])
  File "/mnt/kufs/scratch/tmorova15/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/mnt/kufs/scratch/tmorova15/bcbio/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 1

chapmanb commented 8 years ago

Tunc; Sorry about the download issues. It looks like NCBI removed the references to dbSNP 144. I updated to the latest dbSNP 147 and things should now run cleanly if you remove the temporary CloudBioLinux directory:

rm -rf tmpbcbio-install

and re-run the install/update procedure. Thanks much for the report.
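
As a sketch of the cleanup step, assuming the install root is the bcbio directory seen in the traceback paths (`BCBIO_DIR` and `clean_tmp_install` are hypothetical names, not bcbio options), the temporary CloudBioLinux checkout can be removed like this before re-running the upgrade:

```shell
#!/bin/sh
# Assumption: BCBIO_DIR is the bcbio install root; adjust to your system.
BCBIO_DIR="${BCBIO_DIR:-/mnt/kufs/scratch/tmorova15/bcbio}"

# Remove the temporary CloudBioLinux checkout under the given install root.
clean_tmp_install() {
    # Refuse an empty argument so rm -rf never targets "/tmpbcbio-install".
    [ -n "$1" ] || return 1
    tmp_dir="$1/tmpbcbio-install"
    if [ -d "$tmp_dir" ]; then
        rm -rf "$tmp_dir"
        echo "removed $tmp_dir"
    else
        echo "nothing to remove at $tmp_dir"
    fi
}

clean_tmp_install "$BCBIO_DIR"

# Then re-run the upgrade, for example with the genomes requested earlier:
# bcbio_nextgen.py upgrade --genomes GRCh37 --genomes hg38
```

The upgrade line is left commented out so the script only performs the cleanup; run it by hand once the directory is gone.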