genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0

Setting up GGMS at Southwood #65

Closed. obigriffith closed this issue 10 years ago

obigriffith commented 10 years ago

This issue records notes on installing the GMS on a completely external box with consumer hardware. Specifically: a Dell T7600, 64 GB RAM, 64-bit, dual quad-core (Xeon E5-2609) @ 2.4 GHz, one 2 TB 7200 rpm 6 Gb/s HD (sda), one 3 TB 7200 rpm 6 Gb/s HD (sdb), and one RAID 1 (2 x 2 TB) 7200 rpm 6 Gb/s HD (sdc).

Ubuntu 12.04.3 LTS (PRECISE) was installed from disk. http://releases.ubuntu.com/precise/

For 64-bit processor (recommended) use: http://releases.ubuntu.com/precise/ubuntu-12.04.3-desktop-amd64.iso
For 32-bit processor use: http://releases.ubuntu.com/precise/ubuntu-12.04.3-desktop-i386.iso

Installation options:
1) Download updates while installing (optional)
2) Install third-party software (optional but unnecessary)
3) Installation type - choose "Something else"

If installing for the first time, create a 'New Partition Table'. Otherwise change/edit/format as needed. Drives were configured as follows:
sda - single partition:
  sda1 ext4 /tmp [format]
sdb - single partition:
  sdb1 ext4 /opt/gms [do not format to avoid re-download of data]
sdc - three partitions:
  sdc1 EFI boot (200MB)
  sdc2 swap (64GB)
  sdc3 ext4 / (ubuntu installed here) [format]
sdc = Device for boot loader installation

Set name, computer name, username, and password. Do not encrypt home folder. For convenience, choose a username that together with the local domain name is a valid email address.
hostname: GGMS

git, ssh, make, etc. were installed as per the install instructions here: https://github.com/genome/gms
sudo apt-get install git ssh make vim byobu

Home router settings: port 22 was forwarded to the internal IP of this box to allow ssh connection through external IP

obigriffith commented 10 years ago

In order to clone and be able to commit to GMS repo: Login to github.com, go to profile, SSH keys, Add SSH key. Follow instructions at: https://help.github.com/articles/generating-ssh-keys

Clone git repo: git clone git@github.com:genome/gms.git

Install system:
byobu
cd gms
make

If planning to run the small test datasets and physical resources are limited (e.g., only 8 CPUs and 64 GB RAM):
1) Set 'WF_LOW_RESOURCES=1' and GENOME_USER_EMAIL in /etc/genome.conf
2) Set the MXJ value for the 'default' host to a larger number (e.g., 48) instead of '!' in /opt/openlava-2.2/etc/lsb.hosts
3) Set $job->resource->mem_request (e.g., to total_mem/48) in /Workflow/Dispatcher/Lsf.pm
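As a rough illustration of the total_mem/48 arithmetic in step 3, here is a minimal sketch. The helper name `mem_per_slot_kb` is hypothetical, and the use of kilobytes is an assumption; check which unit mem_request expects in Lsf.pm before copying a value over.

```shell
#!/bin/sh
# Hypothetical helper (not part of GMS): integer share of memory per job slot.
# $1 = total memory in KB (assumed unit), $2 = number of slots (the MXJ value).
mem_per_slot_kb() {
    total_kb=$1
    slots=$2
    echo $(( total_kb / slots ))
}

# Example: 64 GB expressed in KB, split across the 48 slots used above.
mem_per_slot_kb $(( 64 * 1024 * 1024 )) 48

# On a live Linux box the total could be read from /proc/meminfo instead:
#   mem_per_slot_kb "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" 48
```

The division is integer division, so any remainder is dropped rather than rounded up.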

After make finishes successfully, and you have made the changes above, reboot or send reboot signal (if working remotely): sudo reboot

obigriffith commented 10 years ago

Next, perform the 'Initial Sanity Checks' outlined in the install instructions: https://github.com/genome/gms

Similarly, follow 'Usage' instructions to download, import, and test importation of meta-data.

If you used the downsampled data in a previous installation, run: /home/ogriffit/gms/setup/restore_original_tst1_data.pl

Finally, download the GMS1 data with: genome sys gateway attach GMS1 --protocol ftp --rsync

The first time you do this on a home internet connection, it will likely take days. Another option is to transfer this data to /opt/gms by hard drive. Once it has been downloaded, we will want to keep the drive holding /opt/gms and just remount it if re-installation of the OS becomes necessary. The command above should then be run again to check for any new or changed data files; rsync will skip all previously downloaded, unchanged files.

To reset the database to original set of model data: make db-rebuild

To run on the smaller set of subsampled (1/100th or 1/1000th) data run: /home/ogriffit/gms/setup/use_sampled_tst1_data.pl --ds=100

Finally, you can start attempting to build the test models. The microarray models are relatively fast and can be run concurrently. Depending on the specs of your box, all other jobs will probably need to run one at a time, although both exome refaligns can possibly run concurrently. RNAseq can be run before refaligns or vice versa. Differential expression requires successful tumor and normal RNAseq builds first. Similarly, somatic variation builds require the relevant tumor and normal refaligns to complete first. Until further configuration is made possible, it may be necessary to use bmod to lower the requested resources to get some jobs to complete.
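The ordering constraints above can be sketched with the standard `tsort` utility, which prints a list in which every prerequisite comes before the build that depends on it. The build labels below are hypothetical placeholders, not GMS model names, and the sketch simplifies to a single tumor/normal pair of each type.

```shell
#!/bin/sh
# Each input line reads "prerequisite dependent"; tsort emits one valid order.
# Labels are illustrative only and do not correspond to real GMS model IDs.
build_order() {
    tsort <<'EOF'
tumor-rnaseq differential-expression
normal-rnaseq differential-expression
tumor-refalign somatic-variation
normal-refalign somatic-variation
EOF
}

build_order
```

Several orders satisfy these constraints, so the exact output may differ between systems; only the relative position of each prerequisite and its dependent is guaranteed.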

obigriffith commented 10 years ago

It is a good idea to run in a screen. If you forget and want to transfer a running job to a screen session, this blog has useful instructions on how to do so. http://www.jaredlog.com/?p=1981

obigriffith commented 10 years ago

For RNAseq builds, only one (tumor or normal) can be run at a time, and the 'per-lane-tophat_2' step will stall with insufficient resources. 'bjobs' will show a pending job and 'bjobs -l' will show the pending reason. Use bmod to reset ncpus from 12 to 8, which is all that is available on the box. This should be made a configurable option that a user sets at install.

obig@GGMS ~> bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME    SUBMIT_TIME
432     obig    RUN   normal     GGMS        GGMS        42a2/logs   Jan 13 22:07
433     obig    RUN   normal     GGMS        GGMS        Alignment   Jan 13 22:07
435     obig    PEND  alignment  GGMS                    *-tophat_2  Jan 14 01:15
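To pick the pending job IDs out of a listing like this one, a small awk filter on the STAT column is enough. This is a sketch: the function name `pending_job_ids` is hypothetical, and it assumes the default column layout shown here, with STAT as the third field.

```shell
#!/bin/sh
# Hypothetical helper: read bjobs-style output on stdin and print the JOBID
# of every row whose third column (STAT) is PEND. The header row is skipped.
pending_job_ids() {
    awk 'NR > 1 && $3 == "PEND" { print $1 }'
}

# Demonstration on the listing from this thread (whitespace-separated fields).
pending_job_ids <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
432 obig RUN normal GGMS GGMS 42a2/logs Jan 13 22:07
433 obig RUN normal GGMS GGMS Alignment Jan 13 22:07
435 obig PEND alignment GGMS *-tophat_2 Jan 14 01:15
EOF
```

On a live system the same function could be fed directly, as in `bjobs | pending_job_ids`, and each ID then inspected with `bjobs -l`.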

bjobs -l 435

Job <435>, Job Name , User , Project <buildd038161f003f4c3d9da5d34fe42b42a2>, Status , Queue , Command <prefix-and-tee-output /opt/gms/LNXE927/fs/LNXE927/info/model_data/92c6c18f373842589b59c7b942476d7f/buildd038161f003f4c3d9da5d34fe42b42a2/logs/4255235fba5f481987eb366a6fc23d82/3e877f92cc72482a91c0be56cd654206/0bff6233f7c9420384e80fb1950a15a0.out /opt/gms/L>
Tue Jan 14 01:15:47: Submitted from host , CWD <$HOME>, Output File </opt/gms/LNXE927/fs/LNXE927/info/model_data/92c6c18f373842589b59c7b942476d7f/buildd038161f003f4c3d9da5d34fe42b42a2/logs/4255235fba5f481987eb366a6fc23d82/3e877f92cc72482a91c0be56cd654206/0bff6233f7c9420384e80fb1950a15a0.out>, Error File </opt/gms/LNXE927/fs/LNXE927/info/model_data/92c6c18f373842589b59c7b942476d7f/buildd038161f003f4c3d9da5d34fe42b42a2/logs/4255235fba5f481987eb366a6fc23d82/3e877f92cc72482a91c0be56cd654206/0bff6233f7c9420384e80fb1950a15a0.err>, Requested Resources <select[ncpus>=12 && mem>=28672] span[hosts=1] rusage[mem=28672]>;

MEMLIMIT 29360128 K
PENDING REASONS:
Job's resource requirements not satisfied;
Job's resource requirements not satisfied;

obig@GGMS ~> bmod -R 'select[ncpus>=8 && mem>=28672] span[hosts=1] rusage[mem=28672]' 435
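Rather than retyping the whole requirement string, the ncpus threshold can be rewritten with sed and the result passed to bmod. This is a sketch only: `lower_ncpus` is a hypothetical helper, and it assumes the requirement string has the `ncpus>=N` form seen in the `bjobs -l` output above.

```shell
#!/bin/sh
# Hypothetical helper: replace the ncpus threshold in an LSF/OpenLava
# resource requirement string. $1 = original string, $2 = new ncpus value.
lower_ncpus() {
    printf '%s\n' "$1" | sed "s/ncpus>=[0-9][0-9]*/ncpus>=$2/"
}

# The requirement reported for job 435, lowered to the 8 cores on this box.
req='select[ncpus>=12 && mem>=28672] span[hosts=1] rusage[mem=28672]'
lower_ncpus "$req" 8

# The rewritten string could then be applied to the pending job, e.g.:
#   bmod -R "$(lower_ncpus "$req" 8)" 435
```

Only the ncpus value changes; the memory selection and rusage terms are passed through untouched.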

malachig commented 10 years ago

These tests are still underway. Currently we are testing the latest iteration of the small test data set (a strategically selected subset of the HCC1395 data).

malachig commented 10 years ago

This testing was successful. Complete analyses were run on the GMS installed at a residential home on consumer hardware where all data was downloaded over a personal cable internet service. This is a proof-of-principle that a citizen scientist could install the GMS and use it for their own whole genome, exome and transcriptome data analysis.

Closing this issue.