Closed caddymob closed 10 years ago
Jason;
Thanks much for bringing this up. I added more general support for Torque parameters and also translate `-r account=youracct` into the `-A` flag, so hopefully it will do the right thing now when passed resources with the `-r` flag. Thanks again and please let us know if you run into any problems.
https://bcbio-nextgen.readthedocs.org/en/latest/contents/parallel.html#ipython-parallel
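The translation described above (a generic `-r key=value` resource becoming Torque's `-A` directive) could look roughly like this sketch. This is a hypothetical illustration, not the actual ipython-cluster-helper code:

```python
# Hypothetical sketch (not the real ipython-cluster-helper implementation)
# of translating generic "-r key=value" resources into #PBS directives,
# with "account" mapped to Torque's -A flag.
def resources_to_pbs_directives(resources):
    """Convert a 'key=value;key2=value2' resource string into #PBS lines."""
    directives = []
    for spec in resources.split(";"):
        if not spec.strip():
            continue
        key, _, value = spec.partition("=")
        key, value = key.strip(), value.strip()
        if key == "account":
            # Torque charges a job to an account via the -A flag
            directives.append("#PBS -A %s" % value)
        else:
            # pass other resources through as generic -l requests
            directives.append("#PBS -l %s=%s" % (key, value))
    return directives

print(resources_to_pbs_directives("account=youracctnotmine"))
# → ['#PBS -A youracctnotmine']
```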
Awesome Brad! I gave it a spin today and it doesn't seem to be quite working... I launched a job like this:
bcbio_nextgen.py ../config/nuke_gatk.yaml -t ipython -n 192 -s torque -q batch --tag nuke_gatk -r account=youracctnotmine
The launcher seems to know the command:
2014-02-04 19:10:49.101 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher':
{'mem': '40.2', 'cores': 16, 'tag': 'nuke_gatk', 'resources': 'account= youracctnotmine'},
'IPClusterEngines': {'early_shutdown': 240},
'Application': {'log_level': 10},
'ProfileDir': {'location': u'/scratch/jcorneveaux/LETS_ROK/BAMS/NUKE/nuke_gatk_work/log/ipython'},
'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'ba135043-7222-4957-af20-8ad4866c790d'},
'TORQUELauncher': {'queue': 'batch'},
'BcbioTORQUEControllerLauncher': {'mem': '40.2', 'tag': 'nuke_gatk', 'resources': 'account= youracctnotmine'},
'IPClusterStart': {'delay': 10, 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher', 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'n': 12}}
When I look at the jobs with checkjob
they are still charged to my default account, and the torque_engines and torque_controller PBS scripts have no `-A youracctnotmine` - they have:
#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -N bcbio-nuke_gatk-ipcontroller
#PBS -j oe
#PBS -l walltime=239:00:00
and
#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-nuke_gatk-ipengine
#PBS -t 1-12
#PBS -l nodes=1:ppn=16
#PBS -l mem=41164mb
#PBS -l walltime=239:00:00
Jason; That's strange, I can't replicate this. Doing:
bcbio_nextgen.py input.yaml -t ipython -s torque -q batch -r account=youracctnotmine
Gives me:
#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-ipengine
#PBS -t 1-1
#PBS -l nodes=1:ppn=1
#PBS -l mem=1228mb
#PBS -A youracctnotmine
#PBS -l walltime=239:00:00
Is it possible your version of ipython-cluster-helper did not get upgraded? This needs 0.2.9. What does:
ls -lhd /path/to/bcbio/anaconda/lib/python2.7/site-packages/*cluster*
give? Perhaps you have a manually installed version that is overriding 0.2.9? Hope this clears it up and gets things working.
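A quick way to see which copy actually wins at import time (a generic sketch; run it with bcbio's own interpreter, e.g. `/path/to/bcbio/anaconda/bin/python`):

```python
# Show which file a package would be imported from -- useful when a
# manually installed copy may be shadowing the pip-installed one.
import importlib

def where_is(module_name):
    """Return the path a module is imported from, or None if not found."""
    try:
        mod = importlib.import_module(module_name)
        return getattr(mod, "__file__", None)
    except ImportError:
        return None

# e.g. where_is("cluster_helper") under bcbio's anaconda python;
# the stdlib "json" module is used here just to demonstrate:
print(where_is("json"))
```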
Hmmm... still on 0.2.8?
ls -lhd /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/*cluster*
drwxr-xr-x 3 jpeden domainuser 106 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/cluster_helper
drwxr-xr-x 2 jpeden domainuser 4.0K 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7.egg-info
-rw-r--r-- 1 jpeden domainuser 331 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7-nspkg.pth
cat provenance/programs.txt
bcbio-nextgen,0.7.7a-94393bc
htseq,0.5.4p5
bamtofastq,0.0.107
bamtools,2.3.0
bcftools,0.2.0-rc5
bedtools,2.18.1
bowtie2,2.1.0
bwa,0.7.5a
cufflinks,v2.1.1
cutadapt,1.2.1
fastqc,0.10.1
freebayes,v0.9.10-3-g47a713e-dirty
gemini,0.6.4
novosort,V1.02.01
novoalign,3.02.00
samtools,0.1.19
sambamba,0.4.4
qualimap,0.7.1
tophat,v2.0.9
vcflib,2013-12-18
bcbio.variation,0.1.2
gatk,2.8-1-g932cd3a
mutect,1.1.5
picard,1.96
rnaseqc,_v1.1.7
snpeff,3.4e
varscan,v2.3.6
oncofuse,
Jason; Looks like it. I'm not sure how you upgraded, but if you do:
/packages/bcbio/0.7.4/anaconda/bin/pip install --upgrade ipython-cluster-helper
You should get the latest with the fixes. Hope this does it for you.
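The required minimum can be checked programmatically as well; a small sketch of the version comparison (the 0.2.8/0.2.9 numbers follow the thread):

```python
# Sketch: confirm the installed ipython-cluster-helper meets the 0.2.9
# minimum before launching jobs.
def version_tuple(v):
    """'0.2.9' -> (0, 2, 9) for simple numeric comparison."""
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(installed, required="0.2.9"):
    return version_tuple(installed) < version_tuple(required)

print(needs_upgrade("0.2.8"))  # → True: still on the old helper
print(needs_upgrade("0.2.9"))  # → False
```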
We got the latest code this AM (2 days after your commit) whilst rm -rf'ing the rat rnaseq dir and running an upgrade per our morning chats on pull #282. We ran several updates today; we were (are?) already on the development branch, but here are the updates we did, in this order:
rm -rf genomes/Rnorvegicus/rn5/rnaseq*
bcbio_nextgen.py upgrade --data
bcbio_nextgen.py upgrade --data --genomes rn5 --aligners star
rm -rf genomes/rn5/star
bcbio_nextgen.py upgrade --data --genomes rn5 --aligners star
#STAR currently indexing rn5 with clever implementation of ':(){ :|:& };:' on headnode # I tease but yea RAM HAWG
Perhaps you can clarify: if we are already on the development branch, when running `bcbio_nextgen.py upgrade`, do we need to specify `-u development`? Taking this as an example, it seems we got the part of the code to handle the `account=` string, but somehow ipython-cluster-helper 0.2.9 didn't come along for the ride. Did we need a `--tools` in there?
Assuming my amigo doesn't crash the headnode, I will ask for the `pip install` =) and let you know how she goes. I have 384 cores humming right now that I wanted to put on youracctnotmine, but I know we'll get it :+1:
Jason; Right, you would need a:
bcbio_nextgen.py upgrade -u development
The other upgrades handle the data but the code is not automatically upgraded. It will only grab that when you explicitly ask for it (to prevent upgrading code when you only want to grab new data). Sorry, I know it can get a bit confusing with all of the parts. Either a pip install
or, preferably, the bcbio_nextgen.py upgrade -u development
should get you there. Thanks much.
Thanks Brad for the clarification. While on that note: if jobs are currently running with whatever version of bcbio we have, is `bcbio_nextgen.py upgrade -u development` a bad idea? I suppose it's best to let them finish, but what about just stopping them by killing the ipcontrollers and ipengines, running `bcbio_nextgen.py upgrade -u development`, and then restarting...?
FYI re #282 & #284: the rn5 STAR index looks like it's done and ready! (Built on the cluster headnode with only 24GB RAM; indexing used ~20GB real / ~30GB virtual, 1 CPU, and took about 5 hours. FWIW...)
~> grep -A5 Feb ./genomes/Rnorvegicus/rn5/star/Log.out
Feb 04 14:29:58 ... Starting to generate Genome files
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 0 "chr1" chrStart: 0
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 1 "chr2" chrStart: 290193408
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 2 "chr3" chrStart: 575406080
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 3 "chr4" chrStart: 759169024
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 4 "chr5" chrStart: 1007681536
--
Feb 04 14:34:28 ... finished processing splice junctions database ...
Writing genome to disk... done.
Number of SA indices: 5220679560
SA size in bytes: 21535303186
Feb 04 14:34:56 ... starting to sort Suffix Array. This may take a long time...
Number of chunks: 3; chunks size limit: 14213638872 bytes
Feb 04 14:35:27 ... sorting Suffix Array chunks and saving them to disk...
Feb 04 18:38:17 ... loading chunks from disk, packing SA...
Feb 04 18:41:17 ... writing Suffix Array to disk ...
Feb 04 19:04:27 ... Finished generating suffix array
Feb 04 19:04:28 ... starting to generate Suffix Array index...
0% 1% 3% 5% 7% 9% 11% 13% 15% 17% 19% 21% 22% 24% 26% 28% 30% 32% 34% 36% 38% 40% 42% 44% 45% 47% 49% 51% 53% 55% 57% 59% 61% 63% 65% 67% 68% 70% 72% 74% 76% 78% 80% 82% 84% 86% 88% 90% 91% 93% 95% 97% 99% done
Feb 04 19:39:35 ... writing SAindex to disk
Feb 04 19:39:40 ..... Finished successfully
DONE: Genome generation, EXITING
~> du -hc genomes/Rnorvegicus/rn5/star/*
16K star/chrLength.txt
72K star/chrNameLength.txt
60K star/chrName.txt
32K star/chrStart.txt
3.5G star/Genome
4.0K star/genomeParameters.txt
320K star/Log.out
21G star/SA
1.5G star/SAindex
5.5M star/sjdbInfo.txt
5.0M star/sjdbList.out.tab
25G total
Jason; The best approach is to stop them, do the upgrade, then restart. Just upgrading should work too, but depending on when the upgrade happens it could cause some jobs to fail. It's always safe to just qdel the controllers and engines and do the necessary upgrades. Fingers crossed all will work cleanly.
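The stop-before-upgrade step can be scripted: find the bcbio ipcontroller/ipengine job ids in `qstat` output and qdel them. A sketch, where the sample qstat output below is illustrative and not from this thread:

```python
# Sketch: pick out bcbio ipcontroller/ipengine job ids from qstat-style
# output so they can be qdel'ed before an upgrade.
def bcbio_job_ids(qstat_output):
    """Return job ids whose name looks like an ipcontroller/ipengine job."""
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and ("ipcontroller" in fields[1]
                                 or "ipengine" in fields[1]):
            ids.append(fields[0])
    return ids

# Illustrative qstat output (not real cluster data):
sample = """12345.master  bcbio-nuke_gatk-ipcontroller  jpeden  00:01:02 R batch
12346.master  bcbio-nuke_gatk-ipengine      jpeden  00:01:02 R batch
12347.master  unrelated_job                 other   00:09:00 R batch"""

for job in bcbio_job_ids(sample):
    print("qdel %s" % job)
# → qdel 12345.master
# → qdel 12346.master
```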
Just to update: it's still not taking the account right for me... The parameter is still seen, just not getting written to the PBS scripts...
2014-02-06 18:45:09.995 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher': {'mem': '40.2', 'cores': 16, 'tag': 'MORE_gatk', 'resources': 'account=YourAcctNotMine'}, 'IPClusterEngines': {'early_shutdown': 240}, 'Application': {'log_level': 10}, 'ProfileDir': {'location': u'/scratch/jcorneveaux/LETS_LOH/BAMS/MORE/gatk_std/log/ipython'}, 'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'd7548197-8b29-4184-9c93-9da67b4e3139'}, 'TORQUELauncher': {'queue': 'batch'}, 'BcbioTORQUEControllerLauncher': {'mem': '40.2', 'tag': 'MORE_gatk', 'resources': 'account=YourAcctNotMine'}, 'IPClusterStart': {'delay': 10, 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher', 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'n': 4}}
and the PBS script -
#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-MORE_gatk-ipengine
#PBS -t 1-4
#PBS -l nodes=1:ppn=16
#PBS -l mem=41164mb
#PBS -l walltime=239:00:00
cd $PBS_O_WORKDIR
/packages/bcbio/0.7.4/anaconda/bin/python -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 50000); resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 50000); resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()' --timeout=60 --IPEngineApp.wait_for_url_file=960 --EngineFactory.max_heartbeat_misses=100 --profile-dir="/scratch/jcorneveaux/LETS_ROK/BAMS/MORE/gatk_std/log/ipython" --cluster-id="d7548197-8b29-4184-9c93-9da67b4e3139"
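For readability, the one-liner at the top of that generated script raises the process-count and open-file soft limits toward 50000 (capped at the hard limit) before starting the IPython engine. Unpacked into a sketch:

```python
# The rlimit-raising one-liner from the generated PBS script, unpacked:
# raise a soft resource limit toward `target` without exceeding the hard
# limit (which an unprivileged process cannot do).
import resource

def raise_soft_limit(which, target=50000):
    """Raise the soft rlimit toward `target`, capped at the hard limit."""
    soft, hard = resource.getrlimit(which)
    # hard may be RLIM_INFINITY; only cap against it when it is a real bound
    capped = target if hard == resource.RLIM_INFINITY else min(hard, target)
    resource.setrlimit(which, (max(soft, capped), hard))
    return resource.getrlimit(which)[0]

raise_soft_limit(resource.RLIMIT_NOFILE)   # max open file handles
if hasattr(resource, "RLIMIT_NPROC"):      # Unix-only; absent on some platforms
    raise_soft_limit(resource.RLIMIT_NPROC)
```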
bcbio-nextgen,0.7.7a-e91123c
and ...
ls -ldh /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/*cluster*
drwxr-xr-x. 3 jpeden galaxy 106 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/cluster_helper
drwxr-xr-x. 2 jpeden galaxy 4.0K 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7.egg-info
-rw-r--r--. 1 jpeden galaxy 331 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7-nspkg.pth
Jason; You'll need ipython-cluster-helper 0.2.9 for this to work. Did you try:
/packages/bcbio/0.7.4/anaconda/bin/pip install --upgrade ipython-cluster-helper
My apologies Brad, I had my version numbers mixed and thought we got what we needed with the e91123c upgrade. Will ping mi amigo with write perms and start charging YourAcctNotMine
;)
We did it!! Thanks again and apologies for my mixup.
Nice one. Thanks for testing and confirming this. Glad it's rolling now.
Another torque request :+1:
Our informaticians are going to need to be able to submit bcbio-nextgen jobs against other accounts... Currently it takes my default account, but I can debit another with `qsub -A youracctnotmine ...` or `#PBS -A youracctnotmine` in a script. Similar to the `--tags` option, can we add this functionality, perhaps as `-A account_string` / `--acct account_string`? Our cluster has several dozen accounts (each specific to a lab/grant/project), and a single bcbio wizard may run analyses for a variety of accounts. By default (I am unclear exactly how this is set, but probably simply by userID), if said submitter leaves out the `-A account_string` to qsub, their default (i.e. the PI's) account is debited the time. The default account gets used regardless of submission account. In other words, if I check out an interactive node against a given account:
qsub -I -A youracctnotmine ...
and from there launch a
bcbio_nextgen.py -t ipython -s torque ...
from my non-default account, the ipcontroller & ipengine jobs launch against my default account rather than youracctnotmine. I tried this with the `-r` option, but this seems to be a nuanced difference between SGE & Torque. Many thanks for even reading my verbosity =) Happy to clarify or help if I can!