mr-eyes opened 2 years ago
thanks Mo! It would be really great to provide an example snakemake rule where you set `time`, `partition`, etc. within the snakemake rule, so folks can see how that happens :)
I have some examples over here if you want to just swipe: http://bluegenes.github.io/hpc-snakemake-tips/ e.g. -
```
rule quality_trim:
    input:
        reads="rnaseq/raw_data/{sample}.fq.gz",
        adapters="TruSeq2-SE.fa",
    output: "rnaseq/quality/{sample}.qc.fq.gz"
    threads: 1
    resources:
        mem_mb=1000,
        runtime=10
    shell:
        """
        trimmomatic SE {input.reads} {output} \
            ILLUMINACLIP:{input.adapters}:2:0:15 \
            LEADING:2 TRAILING:2 SLIDINGWINDOW:4:2 MINLEN:25
        """
```
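To address the original ask directly, a rule can also pin its own partition and time limit through `resources`. This is a minimal sketch, assuming your cluster profile forwards `runtime` and `partition` to the scheduler; the rule name, paths, and numbers here are placeholders, not from the thread:

```
rule map_reads:
    input: "rnaseq/quality/{sample}.qc.fq.gz"
    output: "rnaseq/mapped/{sample}.bam"
    threads: 4
    resources:
        mem_mb=8000,
        runtime=120,         # minutes; forwarded as the job's time limit
        partition="med2"     # assumes the profile passes this to sbatch
    shell:
        "echo 'map {input} -> {output}'"
```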
One more thought -- I see the default `jobs` is 100 and the default `partition` is `med2` -- can we change these to follow our recommended queue usage? Options: default `low2` to keep default jobs at 100, or default jobs <= 30 on `med2`.
Alternatively (or in addition), you can add `resources: [cpus=30, mem_mb=350000]` to limit cpu and memory allocation. The one caveat is that we don't need these limits for `low2` or `bml`, so they may be annoying to have in the cluster profile when running on those queues.
A little trick that worked for me is using `cpus_med2` and `cpus_bmm` to separate resource use on different partitions. Then I only set resource limits for the `med2` and `bmm` partitions using `resources: [cpus_med2=30, cpus_bmm=30]`. This way snakemake will limit resource usage on medium-priority partitions but won't restrict low-partition usage. Of course, you will have to set `cpus_med2` or `cpus_low2` in your resources keyword for each rule instead of the default parameter `cpus`.
As a bonus, you can use this function to automate which partition snakemake should submit your job to:
```python
def getPartition(wildcards, resources):
    # Determine partition for each rule based on resources requested
    for key in resources.keys():
        if 'bmm' in key and int(resources['cpus_bmm']) > 0:
            return 'bmm'
        elif 'med' in key and int(resources['cpus_med2']) > 0:
            return 'med2'
    # Otherwise, pick a low-priority partition based on memory per CPU
    if int(resources['mem_mb']) / int(resources['cpus']) > 4000:
        return 'bml'
    else:
        return 'low2'
```
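To sanity-check the routing logic outside snakemake, you can call the function with plain dicts standing in for snakemake's resources object (which supports `.keys()` and `[]` access). The resource values below are illustrative, not from the thread:

```python
def getPartition(wildcards, resources):
    # Determine partition for each rule based on resources requested
    for key in resources.keys():
        if 'bmm' in key and int(resources['cpus_bmm']) > 0:
            return 'bmm'
        elif 'med' in key and int(resources['cpus_med2']) > 0:
            return 'med2'
    # Otherwise, pick a low-priority partition based on memory per CPU
    if int(resources['mem_mb']) / int(resources['cpus']) > 4000:
        return 'bml'
    else:
        return 'low2'

# A subset of the profile's default resources
defaults = dict(cpus_bmm=0, cpus_med2=0, cpus=1, mem_mb=2000)

# No partition-specific cpus requested, low memory-per-cpu: low2
assert getPartition(None, defaults) == 'low2'

# Requesting cpus_med2 routes the job to med2
assert getPartition(None, {**defaults, 'cpus_med2': 8}) == 'med2'

# High memory-per-cpu jobs without partition-specific cpus go to bml
assert getPartition(None, {**defaults, 'mem_mb': 16000}) == 'bml'
```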
And then in the rule definition:
```
...
params: partition=getPartition
...
```
In my profile, I set the following default resources:
```
default-resources: [cpus_bmm=0, cpus_med2=0, cpus=1, mem_mb_bmm=0, mem_mb_med2=0, mem_mb=2000, time_min=120, node=1, task=1, download=0]
```
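With these defaults, every rule lands on a low-priority partition unless it opts in. A hypothetical rule requesting `med2` capacity might look like the following (the rule name, command, and numbers are illustrative):

```
rule big_assembly:
    output: "assembly/contigs.fa"
    resources:
        cpus_med2=16,   # counted against the profile's cpus_med2=30 limit
        mem_mb=64000
    params:
        partition=getPartition   # routes to med2 since cpus_med2 > 0
    shell:
        "echo 'assemble into {output}'"
```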
> One more thought -- I see the default `jobs` is 100 and the default `partition` is `med2` -- can we change these to follow our recommended queue usage? Options: default `low2` to keep default jobs at 100, or default jobs <= 30 on `med2`. Alternatively (or in addition), you can add `resources: [cpus=30, mem_mb=350000]` to limit cpu and memory allocation. The one caveat is that we don't need these limits for `low2` or `bml`, so they may be annoying to have in the cluster profile when running on those queues.
Thanks, @bluegenes for the suggestions. I have edited the default parameters for `partition`. I don't think setting the default `mem_mb` to 350GB is a good idea because that will consume a lot of memory for the total running jobs on default parameters. Same with the `cpu`. What do you think?
> A little trick that worked for me is using `cpus_med2` and `cpus_bmm` to separate resource use on different partitions.
That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.
> I don't think setting the default mem_mb to 350GB is a good idea because that will consume a lot of memory for the total running job on default parameters. Same with the cpu. What do you think?
As I've used it, `resources` at the top level doesn't actually allocate that memory (or cpu/etc.); it just limits the total amount you can allocate at once. The `resources` within each rule does try to allocate that particular amount of memory/etc., as does `default-resources`, which is used to fill in `resources` for rules missing any of the default resource parameters.
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
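For example, a top-level limit set on the command line caps the total that concurrently running jobs can claim, without allocating anything itself (the numbers here are the ones suggested above, used illustratively):

```
# Cap the sum of per-rule mem_mb/cpus claims across all running jobs;
# each rule still requests its own amounts via its resources block.
snakemake --jobs 100 --resources cpus=30 mem_mb=350000
```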
> That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.
This sounds like an excellent workaround. If we can set limits for the med and high partitions by default and no limits for low, that would be really helpful. Of course, for rare cases (deadlines, huge jobs, etc.), users can override the limits by setting different ones on the command line with, e.g., `--resources mem_mb=XX`.
this is all greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...
> this is all greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...
😂 I ran an ILLO on farm/snakemake (w/profiles and resource limitation hacks!) back in Aug 2020, but we could do another/up-to-date one? @mr-eyes, interested in doing this with me? Partition-specific allocation using this profile is already making my life better! @SichongP, I would also love your feedback on what we come up with if you have time, in case you have more/different tricks you use.
Back when profiles were newer, the hard part was figuring out how to introduce them without leaving folks behind who are newer to snakemake. But now I think profile setup is something we should just help everyone do as soon as possible, since it makes so many things easier (and doesn't add much complication, aside from setup).
ILLO from 8/24/2020 - http://bluegenes.github.io/hpc-snakemake-tips/ My practices have changed a little since then, but not a ton. I think for the next one, I would start with profiles and assume snakemake conda environment management :)
> @mr-eyes, interested in doing this with me?
Sure!
Resolves #32