obigriffith closed this issue 10 years ago
To clone and be able to commit to the GMS repo: log in to github.com, go to your profile, then SSH keys, then Add SSH key. Follow the instructions at: https://help.github.com/articles/generating-ssh-keys
Clone git repo: git clone git@github.com:genome/gms.git
Install system:
byobu
cd gms
make
If you plan to run the small test datasets and physical resources are limited (e.g., only 8 CPUs and 64 GB RAM):
1) Set 'WF_LOW_RESOURCES=1' and GENOME_USER_EMAIL in /etc/genome.conf
2) Set the MXJ value for the 'default' host to a larger number (e.g., 48) instead of '!' in /opt/openlava-2.2/etc/lsb.hosts
3) Set $job->resource->mem_request (e.g., to total_mem/48) in /Workflow/Dispatcher/Lsf.pm
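As a rough illustration of the total_mem/48 arithmetic in step 3, the sketch below divides total RAM evenly across the MXJ job slots. The function names `per_slot_mem_mb` and `total_mem_kb` are made up for this example, and the result is in MB only by assumption: these notes do not say which units Lsf.pm expects for mem_request, so check that before using the number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide total memory (given in kB) evenly across job slots.
# Usage: per_slot_mem_mb TOTAL_KB SLOTS  -> prints MB per slot (integer division).
per_slot_mem_mb() {
    local total_kb=$1 slots=$2
    echo $(( total_kb / 1024 / slots ))
}

# Hypothetical helper: read MemTotal (reported in kB) from /proc/meminfo.
# The optional argument is an alternative file to read, for testing.
total_mem_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Example (not run automatically):
#   per_slot_mem_mb "$(total_mem_kb)" 48
```

With exactly 64 GB (67108864 kB) and 48 slots this works out to 1365 MB per slot. A real 64 GB machine reports slightly less than that in /proc/meminfo, so the actual value will be a little lower.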
After make finishes successfully, and you have made the changes above, reboot or send reboot signal (if working remotely): sudo reboot
Next, perform the 'Initial Sanity Checks' outlined in the install instructions: https://github.com/genome/gms
Similarly, follow 'Usage' instructions to download, import, and test importation of meta-data.
If you used the downsampled data in a previous installation, run: /home/ogriffit/gms/setup/restore_original_tst1_data.pl
Finally, download the GMS1 data with: genome sys gateway attach GMS1 --protocol ftp --rsync
The first time you do this on a home internet connection, it will likely take days. Another option is to transfer the data by hard drive to /opt/gms. Once it has been downloaded, keep the drive holding /opt/gms and just remount it if a re-installation of the OS becomes necessary. The command above should then be run again to check for any new or changed data files; rsync will skip all previously downloaded and unchanged files.
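Since the downloaded data is meant to live on a separate drive that gets remounted after an OS re-install, it may be worth confirming that /opt/gms is actually mounted before re-running the attach command, so the download does not land on the root filesystem by mistake. The following is a minimal sketch of such a guard; `is_mounted` is a hypothetical helper, not part of the GMS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if TARGET is listed as a mount point.
# The optional second argument is a mounts table to read (default /proc/mounts),
# which makes the function easy to test.
is_mounted() {
    local target=$1 mounts=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts"
}

# Example (not run automatically):
#   if is_mounted /opt/gms; then
#       genome sys gateway attach GMS1 --protocol ftp --rsync
#   else
#       echo "/opt/gms is not mounted; mount the data drive first" >&2
#   fi
```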
To reset the database to original set of model data: make db-rebuild
To run on the smaller set of subsampled (1/100th or 1/1000th) data run: /home/ogriffit/gms/setup/use_sampled_tst1_data.pl --ds=100
Finally, you can start attempting to build the test models. The microarray models are relatively fast and can be run concurrently. Depending on the specs of your box, all other jobs will probably need to run one at a time. Possibly both exome refaligns can run concurrently. RNAseq can be run before refaligns or vice versa. Differential expression requires successful tumor and normal RNAseq builds first. Similarly, somatic variation builds require the relevant tumor and normal refaligns to complete first. Until further configuration is made possible, it may be necessary to use bmod to lower a job's resource requests to get some jobs to complete.
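The ordering constraints described above can be written down as "A before B" pairs and fed to tsort to get one valid build order. This is only a sketch: the model names are illustrative labels chosen for this example, not GMS identifiers, and the microarray models are left out because they have no stated dependencies.

```shell
#!/usr/bin/env bash
# Print one valid build order, one model per line.
# Each input pair "A B" means "A must finish before B".
# The names are illustrative labels, not GMS model identifiers.
build_order() {
    tsort <<'EOF'
tumor_rnaseq differential_expression
normal_rnaseq differential_expression
tumor_refalign somatic_variation
normal_refalign somatic_variation
EOF
}

# Example (not run automatically):
#   build_order
```

tsort only guarantees that each prerequisite appears before the build that needs it; the relative order of unrelated builds (for example RNAseq versus refalign) is arbitrary, which matches the note that either can be run first.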
It is a good idea to run in a screen. If you forget and want to transfer a running job to a screen session, this blog has useful instructions on how to do so. http://www.jaredlog.com/?p=1981
For RNAseq builds, only one (tumor or normal) can be run at a time, and the 'per-lane-tophat_2' step will stall with insufficient resources. 'bjobs' will show a pending job and 'bjobs -l' will show the pending reason. Use bmod to reset ncpus from 12 to 8, which is all that is available on the box. This should be made a configurable option that a user sets at install.
obig@GGMS ~> bjobs
JOBID USER STAT QUEUE     FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
432   obig RUN  normal    GGMS      GGMS      42a2/logs  Jan 13 22:07
433   obig RUN  normal    GGMS      GGMS      Alignment  Jan 13 22:07
435   obig PEND alignment GGMS                *-tophat_2 Jan 14 01:15
bjobs -l 435
Job <435>, Job Name
MEMLIMIT 29360128 K PENDING REASONS: Job's resource requirements not satisfied; Job's resource requirements not satisfied;
obig@GGMS ~> bmod -R 'select[ncpus>=8 && mem>=28672] span[hosts=1] rusage[mem=28672]' 435
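The manual steps above (find the pending job, then rewrite its resource request) could be scripted along these lines. This is a sketch under assumptions: `pending_job_ids` and `resource_string` are hypothetical helpers, and the parser assumes the default bjobs column layout with JOBID first and STAT third.

```shell
#!/usr/bin/env bash
# Print the IDs of pending jobs from `bjobs` output read on stdin.
# Assumes the default column layout (JOBID first, STAT third) and a header line.
pending_job_ids() {
    awk 'NR > 1 && $3 == "PEND" { print $1 }'
}

# Build the -R resource string for NCPUS cores and MEM_MB of memory,
# in the same form as the bmod command used in these notes.
resource_string() {
    local ncpus=$1 mem_mb=$2
    printf 'select[ncpus>=%s && mem>=%s] span[hosts=1] rusage[mem=%s]\n' \
        "$ncpus" "$mem_mb" "$mem_mb"
}

# Example (not run automatically):
#   bjobs | pending_job_ids
#   bmod -R "$(resource_string 8 28672)" 435
```

The 28672 used here equals the MEMLIMIT of 29360128 K divided by 1024, so the memory request appears to be that limit expressed in MB.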
These tests are still underway. Currently we are testing the latest iteration of the small test data set (a strategically selected subset of the HCC1395 data).
This testing was successful. Complete analyses were run on the GMS installed at a residential home on consumer hardware where all data was downloaded over a personal cable internet service. This is a proof-of-principle that a citizen scientist could install the GMS and use it for their own whole genome, exome and transcriptome data analysis.
Closing this issue.
This issue will retain notes for installation of GMS on a completely external box with consumer hardware. Specifically: a DELL T7600, 64GB ram, 64-bit, dual quad core (XEON E5-2609) @ 2.4GHz, 1 2TB 7200rpm 6Gb/s hd (sda), 1 3TB 7200rpm 6Gb/s hd (sdb), 1 raid1 (2x2TB) 7200rpm 6Gb/s hd (sdc).
Ubuntu 12.04.3 LTS (PRECISE) was installed from disk. http://releases.ubuntu.com/precise/
For a 64-bit processor (recommended) use: http://releases.ubuntu.com/precise/ubuntu-12.04.3-desktop-amd64.iso
For a 32-bit processor use: http://releases.ubuntu.com/precise/ubuntu-12.04.3-desktop-i386.iso
Installation options:
1) Download updates while installing (optional)
2) Install third-party software (optional but unnecessary)
3) Installation type - choose "Something else"
If installing for the first time, create a 'New Partition Table'. Otherwise change/edit/format as needed. Drives were configured as follows:
sda - single partition:
  sda1 ext4 /tmp [format]
sdb - single partition:
  sdb1 ext4 /opt/gms [do not format to avoid re-download of data]
sdc - three partitions:
  sdc1 EFI boot (200MB)
  sdc2 swap (64GB)
  sdc3 ext4 / (ubuntu installed here) [format]
sdc = Device for boot loader installation
Set name, computer name, username, and password. Do not encrypt home folder. For convenience, choose a username that together with the local domain name is a valid email address.
hostname: GGMS
git, ssh, make, etc. were installed as per the install instructions here: https://github.com/genome/gms
sudo apt-get install git ssh make vim byobu
Home router settings: port 22 was forwarded to the internal IP of this box to allow ssh connections through the external IP.