bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

torque account string | #PBS -A account_string #280

Closed — caddymob closed this issue 10 years ago

caddymob commented 10 years ago

Another torque request :+1:

Our informaticians are going to need to be able to submit bcbio-nextgen jobs against other accounts... Currently it takes my default account, but I can debit another with qsub -A youracctnotmine ... or #PBS -A youracctnotmine in a script.

Similar to the --tags option, can we add this functionality, perhaps an --acct account_string that maps to -A account_string? Our cluster has several dozen accounts (each specific to a lab/grant/project), and a single bcbio wizard may run analyses for a variety of accounts. By default (I'm unclear exactly how it's set, but probably simply by user ID), if the submitter leaves the -A account_string off the qsub call, their default (i.e. PI's) account is debited for the time.

The default account gets used regardless of submission account. In other words, if I check out an interactive node against a given account: qsub -I -A youracctnotmine ... and from there launch bcbio_nextgen.py -t ipython -s torque ... from my non-default account, the ipcontroller & ipengine jobs still launch against my default account rather than youracctnotmine.

I tried this with the -r option, but this seems to be a nuanced difference between SGE & Torque.

Many thanks for even reading my verbosity =) Happy to clarify or help if I can!

chapmanb commented 10 years ago

Jason; Thanks much for bringing this up. I added more general support for Torque parameters and also translate -r account=youracct into the -A flag, so hopefully it will do the right thing now when passed resources with the -r flag. Thanks again and please let us know if you run into any problems.

https://bcbio-nextgen.readthedocs.org/en/latest/contents/parallel.html#ipython-parallel
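As a rough illustration of the translation described above — this is a hypothetical sketch, not bcbio's or ipython-cluster-helper's actual implementation, and `resources_to_pbs` is an invented name — a "-r key=value" resource could be mapped to Torque/PBS directives, with "account" special-cased to the -A flag and everything else passed through as a -l request:

```python
# Hypothetical sketch (not the real bcbio code) of translating
# "-r key=value" resource specifications into #PBS directives.

def resources_to_pbs(resources):
    """Turn specs like ['account=myacct', 'walltime=24:00:00'] into #PBS lines."""
    directives = []
    for spec in resources:
        key, _, value = spec.partition("=")
        key, value = key.strip(), value.strip()
        if key == "account":
            # "account" maps to the Torque accounting flag, not a resource request
            directives.append("#PBS -A %s" % value)
        else:
            # other resources become generic -l resource requests
            directives.append("#PBS -l %s=%s" % (key, value))
    return directives

print(resources_to_pbs(["account=youracctnotmine", "walltime=239:00:00"]))
# → ['#PBS -A youracctnotmine', '#PBS -l walltime=239:00:00']
```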

caddymob commented 10 years ago

Awesome Brad! I gave it a spin today and it doesn't seem to be quite working... I launched a job like this:

bcbio_nextgen.py ../config/nuke_gatk.yaml -t ipython -n 192 -s torque -q batch --tag nuke_gatk -r account=youracctnotmine

The launcher seems to know the command:

2014-02-04 19:10:49.101 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher':
{'mem': '40.2', 'cores': 16, 'tag': 'nuke_gatk', 'resources': 'account=youracctnotmine'},
'IPClusterEngines': {'early_shutdown': 240},
'Application': {'log_level': 10},
'ProfileDir': {'location': u'/scratch/jcorneveaux/LETS_ROK/BAMS/NUKE/nuke_gatk_work/log/ipython'},
'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'ba135043-7222-4957-af20-8ad4866c790d'},
'TORQUELauncher': {'queue': 'batch'},
'BcbioTORQUEControllerLauncher': {'mem': '40.2', 'tag': 'nuke_gatk', 'resources': 'account=youracctnotmine'},
'IPClusterStart': {'delay': 10, 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher', 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'n': 12}}

When I look at the jobs with checkjob they are still charged to my default account, and the torque_engines and torque_controller PBS scripts have no -A youracctnotmine. They have:

#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -N bcbio-nuke_gatk-ipcontroller
#PBS -j oe
#PBS -l walltime=239:00:00

and

#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-nuke_gatk-ipengine
#PBS -t 1-12
#PBS -l nodes=1:ppn=16
#PBS -l mem=41164mb
#PBS -l walltime=239:00:00
chapmanb commented 10 years ago

Jason; That's strange, I can't replicate this. Doing:

bcbio_nextgen.py input.yaml -t ipython -s torque -q batch -r account=youracctnotmine

Gives me:

#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-ipengine
#PBS -t 1-1
#PBS -l nodes=1:ppn=1
#PBS -l mem=1228mb
#PBS -A youracctnotmine
#PBS -l walltime=239:00:00

Is it possible your version of ipython-cluster-helper did not get upgraded? This needs 0.2.9. What does:

ls -lhd /path/to/bcbio/anaconda/lib/python2.7/site-packages/*cluster*

give? Perhaps you have a manually installed version that is overriding 0.2.9? Hope this clears it up and gets things working.

caddymob commented 10 years ago

Hmmm... still on 0.2.8?

ls -lhd /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/*cluster*
drwxr-xr-x 3 jpeden domainuser  106 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/cluster_helper
drwxr-xr-x 2 jpeden domainuser 4.0K 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7.egg-info
-rw-r--r-- 1 jpeden domainuser  331 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7-nspkg.pth
cat provenance/programs.txt
bcbio-nextgen,0.7.7a-94393bc
htseq,0.5.4p5
bamtofastq,0.0.107
bamtools,2.3.0
bcftools,0.2.0-rc5
bedtools,2.18.1
bowtie2,2.1.0
bwa,0.7.5a
cufflinks,v2.1.1
cutadapt,1.2.1
fastqc,0.10.1
freebayes,v0.9.10-3-g47a713e-dirty
gemini,0.6.4
novosort,V1.02.01
novoalign,3.02.00
samtools,0.1.19
sambamba,0.4.4
qualimap,0.7.1
tophat,v2.0.9
vcflib,2013-12-18
bcbio.variation,0.1.2
gatk,2.8-1-g932cd3a
mutect,1.1.5
picard,1.96
rnaseqc,_v1.1.7
snpeff,3.4e
varscan,v2.3.6
oncofuse,
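The 0.2.8 visible in the egg-info directory name above can be checked against the required 0.2.9 numerically rather than as a string. A small sketch (not part of bcbio; the directory-name parsing and both helper names are assumptions for illustration):

```python
# Hypothetical helpers (not bcbio code) for reading a version out of an
# egg-info directory name and comparing it numerically to a minimum.

def version_from_egg_info(dirname):
    """Extract '0.2.8' from 'ipython_cluster_helper-0.2.8-py2.7.egg-info'."""
    return dirname.split("-")[1]

def meets_minimum(installed, required):
    """Compare dotted versions as int tuples, so '0.2.10' > '0.2.9'."""
    as_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

installed = version_from_egg_info("ipython_cluster_helper-0.2.8-py2.7.egg-info")
print(installed, meets_minimum(installed, "0.2.9"))
# → 0.2.8 False
```

Tuple comparison avoids the lexicographic trap where "0.2.10" < "0.2.9" as plain strings.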
chapmanb commented 10 years ago

Jason; Looks like it. I'm not sure how you upgraded, but if you do:

/packages/bcbio/0.7.4/anaconda/bin/pip install --upgrade ipython-cluster-helper

You should get the latest with the fixes. Hope this does it for you.

caddymob commented 10 years ago

We got the latest code this AM (2 days after your commit) whilst rm -rf'ing the rat RNA-seq dir and running an upgrade as per our morning chats on pull #282. We ran several updates today and were (are?) already on the development branch, but here were the updates we did, in this order:

Perhaps you can clarify: if we are already on the development branch, when running bcbio_nextgen.py upgrade do we need to specify -u development? Taking this as an example, it seems we got the part of the code that handles the account=string, but somehow ipython-cluster-helper 0.2.9 didn't come along for the ride. Did we need a --tools in there?

Assuming my amigo doesn't crash the headnode, I will ask for the pip install =) and let you know how she goes. I have 384 cores humming right now that I wanted to put on youracctnotmine, but I know we'll get it :+1:

chapmanb commented 10 years ago

Jason; Right, you would need a:

bcbio_nextgen.py upgrade -u development

The other upgrades handle the data but the code is not automatically upgraded. It will only grab that when you explicitly ask for it (to prevent upgrading code when you only want to grab new data). Sorry, I know it can get a bit confusing with all of the parts. Either a pip install or, preferably, the bcbio_nextgen.py upgrade -u development should get you there. Thanks much.

caddymob commented 10 years ago

Thanks Brad for the clarification. While on that note, if jobs are currently running with whatever version of bcbio we have, is bcbio_nextgen.py upgrade -u development a bad idea? I suppose it's best to let them finish, but what about just stopping them by killing the ipcontrollers and ipengines, running bcbio_nextgen.py upgrade -u development and then re-starting...?

FYI re #282 & #284: rn5 STAR looks like it's done and ready! (Built on the cluster headnode with only 24GB RAM; indexing used ~20GB real / ~30GB virtual and 1 CPU, and took about 5 hours. FWIW...)

~> grep -A5 Feb  ./genomes/Rnorvegicus/rn5/star/Log.out
Feb 04 14:29:58 ... Starting to generate Genome files
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 0  "chr1" chrStart: 0
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 1  "chr2" chrStart: 290193408
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 2  "chr3" chrStart: 575406080
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 3  "chr4" chrStart: 759169024
/packages/bcbio/0.7.4/genomes/Rnorvegicus/rn5/seq/rn5.fa : chr # 4  "chr5" chrStart: 1007681536
--
Feb 04 14:34:28 ... finished processing splice junctions database ...
Writing genome to disk... done.
Number of SA indices: 5220679560
SA size in bytes: 21535303186
Feb 04 14:34:56 ... starting to sort  Suffix Array. This may take a long time...
Number of chunks: 3;   chunks size limit: 14213638872 bytes
Feb 04 14:35:27 ... sorting Suffix Array chunks and saving them to disk...
Feb 04 18:38:17 ... loading chunks from disk, packing SA...
Feb 04 18:41:17 ... writing Suffix Array to disk ...
Feb 04 19:04:27 ... Finished generating suffix array
Feb 04 19:04:28 ... starting to generate Suffix Array index...
0% 1% 3% 5% 7% 9% 11% 13% 15% 17% 19% 21% 22% 24% 26% 28% 30% 32% 34% 36% 38% 40% 42% 44% 45% 47% 49% 51% 53% 55% 57% 59% 61% 63% 65% 67% 68% 70% 72% 74% 76% 78% 80% 82% 84% 86% 88% 90% 91% 93% 95% 97% 99%  done
Feb 04 19:39:35 ... writing SAindex to disk
Feb 04 19:39:40 ..... Finished successfully
DONE: Genome generation, EXITING

~>du -hc genomes/Rnorvegicus/rn5/star/*
16K star/chrLength.txt
72K star/chrNameLength.txt
60K star/chrName.txt
32K star/chrStart.txt
3.5G    star/Genome
4.0K    star/genomeParameters.txt
320K    star/Log.out
21G star/SA
1.5G    star/SAindex
5.5M    star/sjdbInfo.txt
5.0M    star/sjdbList.out.tab
25G total
chapmanb commented 10 years ago

Jason; The best approach is to stop them, do the upgrade, then restart. Just upgrading should work too, but depending on when the upgrade happens it could cause some jobs to fail. It's always safe to just qdel the controllers and engines and do the necessary upgrades. Fingers crossed all will work cleanly.

caddymob commented 10 years ago

Just to update, it's still not taking the account for me... The parameter is still seen, just not getting written to the PBS scripts...

2014-02-06 18:45:09.995 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher': {'mem': '40.2', 'cores': 16, 'tag': 'MORE_gatk', 'resources': 'account=YourAcctNotMine'}, 'IPClusterEngines': {'early_shutdown': 240}, 'Application': {'log_level': 10}, 'ProfileDir': {'location': u'/scratch/jcorneveaux/LETS_LOH/BAMS/MORE/gatk_std/log/ipython'}, 'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'd7548197-8b29-4184-9c93-9da67b4e3139'}, 'TORQUELauncher': {'queue': 'batch'}, 'BcbioTORQUEControllerLauncher': {'mem': '40.2', 'tag': 'MORE_gatk', 'resources': 'account=YourAcctNotMine'}, 'IPClusterStart': {'delay': 10, 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher', 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'n': 4}}

and the PBS -

#!/bin/sh
#PBS -q batch
#PBS -V
#PBS -j oe
#PBS -N bcbio-MORE_gatk-ipengine
#PBS -t 1-4
#PBS -l nodes=1:ppn=16
#PBS -l mem=41164mb
#PBS -l walltime=239:00:00
cd $PBS_O_WORKDIR
/packages/bcbio/0.7.4/anaconda/bin/python -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 50000); resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 50000); resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()' --timeout=60 --IPEngineApp.wait_for_url_file=960 --EngineFactory.max_heartbeat_misses=100 --profile-dir="/scratch/jcorneveaux/LETS_ROK/BAMS/MORE/gatk_std/log/ipython" --cluster-id="d7548197-8b29-4184-9c93-9da67b4e3139"
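For readability, the inlined python -c one-liner at the bottom of that script (which raises the process and open-file limits before launching the engine) unpacks to roughly the following; this is just the logic already shown above reformatted as a function (`raise_limits` is an invented name), not new bcbio code:

```python
import resource

def raise_limits(target=50000):
    """Raise the soft process and open-file limits toward `target`,
    capped at the hard limit and never lowered (mirrors the one-liner)."""
    for limit in (resource.RLIMIT_NPROC, resource.RLIMIT_NOFILE):
        soft, hard = resource.getrlimit(limit)
        wanted = min(hard, target)  # stay within the hard limit
        # keep the current soft limit if it is already higher
        resource.setrlimit(limit, (max(soft, wanted), hard))

raise_limits()
```

Raising these limits up front helps the ipengine processes survive workloads that open many files or spawn many subprocesses.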

bcbio-nextgen,0.7.7a-e91123c and ...

ls -ldh /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/*cluster*
drwxr-xr-x. 3 jpeden galaxy  106 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/cluster_helper
drwxr-xr-x. 2 jpeden galaxy 4.0K 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7.egg-info
-rw-r--r--. 1 jpeden galaxy  331 20140124.13.09 /packages/bcbio/0.7.4/anaconda/lib/python2.7/site-packages/ipython_cluster_helper-0.2.8-py2.7-nspkg.pth
chapmanb commented 10 years ago

Jason; You'll need ipython-cluster-helper 0.2.9 for this to work. Did you try:

/packages/bcbio/0.7.4/anaconda/bin/pip install --upgrade ipython-cluster-helper
caddymob commented 10 years ago

My apologies Brad, I had my version numbers mixed and thought we got what we needed with the e91123c upgrade. Will ping mi amigo with write perms and start charging YourAcctNotMine ;)

caddymob commented 10 years ago

We did it!! Thanks again and apologies for my mixup.

chapmanb commented 10 years ago

Nice one. Thanks for testing and confirming this. Glad it's rolling now.