Snakemake-Profiles / slurm

Cookiecutter for snakemake slurm profile
MIT License

How do I make mem_mb_per_cpu work? #112

Open hepcat72 opened 1 year ago

hepcat72 commented 1 year ago

I am just starting out trying to use snakemake to run on our slurm cluster. Based on this snakemake documentation, I was trying to set:

resources:
    mem_mb_per_cpu=4000

That should have been plenty, but I kept getting OUT_OF_MEMORY job states. What confused me is that the memory reported under MaxRSS seemed to be only a handful of megabytes (though maybe I'm not interpreting the sacct output correctly; I don't run on the cluster often).

Based on trial and error, though, I get the sense that the profile isn't respecting the mem_mb_per_cpu setting. How do I incorporate it?

mbhall88 commented 1 year ago

Are you passing the --slurm option to snakemake in addition to using this profile? If so, I think the two are going to clash. You could confirm this by running without this profile and seeing whether you still have the same issue. If you do, that's a snakemake issue and not a problem with this profile.

hepcat72 commented 1 year ago

Oh, whoops. Maybe I am. I didn't think I was, but it's in my notes. I probably tried it once while debugging a separate issue and forgot to remove it from what I've been pasting on the command line. Thanks. I'll try running without it tomorrow; I've already quit for the day.

johnstonmj commented 1 year ago

I'm guessing that this is a name mismatch. If I look at slurm_submit.py, it shows:

RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    "partition": ("partition", "queue"),
}

There is no mem_mb_per_cpu entry.

Could you try changing mem_mb_per_cpu to mem_per_cpu (or another supported term)?
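A minimal sketch of the kind of lookup this mapping implies, assuming the profile scans each sbatch option's aliases against a rule's resources. The helper names `to_sbatch_options` and `unmapped` are hypothetical, and the first-alias-wins precedence is an assumption rather than something taken from slurm_submit.py. Under that assumption, a resource named mem_mb_per_cpu matches no alias and is silently dropped, so sbatch never receives a per-CPU memory request:

```python
# Mapping as quoted from slurm_submit.py above.
RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    "partition": ("partition", "queue"),
}


def to_sbatch_options(resources, mapping=RESOURCE_MAPPING):
    """Hypothetical: translate a rule's resources into sbatch options.

    For each sbatch option, the first alias present in `resources` wins
    (an assumed precedence). Names matching no alias are ignored.
    """
    options = {}
    for sbatch_option, aliases in mapping.items():
        for alias in aliases:
            if alias in resources:
                options[sbatch_option] = resources[alias]
                break
    return options


def unmapped(resources, mapping=RESOURCE_MAPPING):
    """Hypothetical: list resource names that no alias in `mapping` covers."""
    known = {alias for aliases in mapping.values() for alias in aliases}
    return sorted(name for name in resources if name not in known)


print(to_sbatch_options({"mem_mb_per_cpu": 4000}))  # {}
print(unmapped({"mem_mb_per_cpu": 4000}))           # ['mem_mb_per_cpu']
print(to_sbatch_options({"mem_per_cpu": 4000}))     # {'mem-per-cpu': 4000}
```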

hepcat72 commented 1 year ago

I noticed that as well, but I didn't know how to reconcile that with the snakemake documentation and I didn't know why this profile differed from snakemake's terms...


That said, I know very little about using snakemake profiles. It didn't make sense to me that the keys appeared to be slurm options (without the --) while the values were variations on what I assume are reserved terms of snakemake's resources directive. I would have guessed it would be the other way around: the keys would be snakemake resource names, and the values would be lists of the corresponding options used on different cluster systems.
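Both layouts carry the same information, and one can be derived from the other. A minimal sketch using the mapping quoted above (the `invert` helper is hypothetical, not part of the profile): keying by sbatch option, as the profile does, groups every alias for one flag together, while the inverted dict directly answers "which sbatch option does this resource name become?".

```python
# Mapping as quoted from slurm_submit.py: sbatch option -> resource-name aliases.
RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    "partition": ("partition", "queue"),
}


def invert(mapping):
    """Hypothetical: build a resource-name -> sbatch option lookup."""
    inverted = {}
    for sbatch_option, aliases in mapping.items():
        for alias in aliases:
            # An alias listed under two options would make the lookup ambiguous.
            if alias in inverted:
                raise ValueError(f"alias {alias!r} is listed under two options")
            inverted[alias] = sbatch_option
    return inverted


ALIAS_TO_OPTION = invert(RESOURCE_MAPPING)
print(ALIAS_TO_OPTION["mem_mb"])              # mem
print(ALIAS_TO_OPTION["queue"])               # partition
print(ALIAS_TO_OPTION.get("mem_mb_per_cpu"))  # None
```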

So I'm glad you chimed in about that. I'm just about to get back to my workflow cluster testing.

hepcat72 commented 1 year ago

And actually, now that I'm logged back into the cluster and connected to my screen session, I see that I was not in fact supplying --slurm, and that omission was what was causing the problem. According to my notes, I had used it at one point, when I was having trouble with mem_mb_per_cpu for one of the early rules. Somewhere along the way I stopped using --slurm (and was not regenerating the output of the rule that had mem_mb_per_cpu set).

When I started working with new data yesterday (without --slurm), that rule began failing with state OUT_OF_MEMORY, which confused me because it had worked before. Throughout all of this, I was also using --profile.

Now that I've re-added --slurm, the rule works again without the OUT_OF_MEMORY error.

So... it seems to me that, to stay compatible with possible use of --slurm, I would want --profile to accept keywords that are compatible with --slurm. That suggests adding mem_mb_per_cpu to the mapping in the profile, like:

RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread", "mem_mb_per_cpu"),
    "nodes": ("nodes", "nnodes"),
    "partition": ("partition", "queue"),
}

Does that make sense or have I got something wrong?
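A minimal sketch of what the proposed change would do, assuming the profile turns each matched entry into a `--option=value` sbatch argument. The `sbatch_args` helper is hypothetical and only approximates the profile's submit script; the first-alias-wins precedence is likewise assumed. Slurm interprets a bare `--mem-per-cpu` value in megabytes, so 4000 here means 4000 MB:

```python
# Mapping with mem_mb_per_cpu added, as proposed above.
PATCHED_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread", "mem_mb_per_cpu"),
    "nodes": ("nodes", "nnodes"),
    "partition": ("partition", "queue"),
}


def sbatch_args(resources, mapping=PATCHED_MAPPING):
    """Hypothetical: render matched resources as sbatch command-line flags.

    Flags follow the mapping's order; for each option the first alias
    present in `resources` supplies the value (an assumed precedence).
    """
    args = []
    for option, aliases in mapping.items():
        for alias in aliases:
            if alias in resources:
                args.append(f"--{option}={resources[alias]}")
                break
    return args


print(sbatch_args({"mem_mb_per_cpu": 4000}))           # ['--mem-per-cpu=4000']
print(sbatch_args({"runtime": 30, "queue": "short"}))  # ['--time=30', '--partition=short']
```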

mbhall88 commented 1 year ago

"I noticed that as well, but I didn't know how to reconcile that with the snakemake documentation and I didn't know why this profile differed from snakemake's terms..."

This profile existed before snakemake's --slurm option, so the docs on the snakemake website don't necessarily align with the options in this profile. For supplying that option with this profile, I would use @johnstonmj's suggestion above.

hepcat72 commented 1 year ago

So, to clarify his suggestion, does

"Could you try changing mem_mb_per_cpu to mem_per_cpu (or another supported term)?"

mean changing the keyword in my rules' resources directive or in that RESOURCE_MAPPING variable? If I change it in the profile scripts, I could still explore using --slurm. Or can I set both in the rules?

johnstonmj commented 1 year ago

I was suggesting that you update the keyword to be mem_per_cpu within the resources directive of each of your rules. This requires you to update each rule, but avoids modifying the profile itself.
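Snakemake accepts arbitrarily named resources, so a rule could also declare both spellings: mem_per_cpu for this profile and mem_mb_per_cpu for --slurm. A minimal sketch of a sanity check for that setup (the `check_mem_per_cpu` helper is hypothetical; the alias tuple is copied from the RESOURCE_MAPPING quoted above). It returns the value the profile's mem-per-cpu aliases would pick up and raises if the two spellings disagree:

```python
# mem-per-cpu aliases copied from the profile's RESOURCE_MAPPING.
PROFILE_MEM_PER_CPU_ALIASES = ("mem-per-cpu", "mem_per_cpu", "mem_per_thread")

# Spelling used by Snakemake's --slurm support, per the docs cited in this thread.
SLURM_NAME = "mem_mb_per_cpu"


def check_mem_per_cpu(resources):
    """Hypothetical check for a rule that declares both spellings.

    Returns the value the profile's aliases would see (None if absent),
    and raises ValueError if the --slurm spelling disagrees with it.
    """
    profile_value = next(
        (resources[a] for a in PROFILE_MEM_PER_CPU_ALIASES if a in resources), None
    )
    slurm_value = resources.get(SLURM_NAME)
    if None not in (profile_value, slurm_value) and profile_value != slurm_value:
        raise ValueError(f"mem_per_cpu={profile_value} but {SLURM_NAME}={slurm_value}")
    return profile_value


# Resources as a rule might declare them, e.g.
#   resources:
#       mem_per_cpu=4000,
#       mem_mb_per_cpu=4000
print(check_mem_per_cpu({"mem_per_cpu": 4000, "mem_mb_per_cpu": 4000}))  # 4000
```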

I believe your suggested edits to RESOURCE_MAPPING would work too. If mem_mb_per_cpu is specified by the Snakemake docs, this might be a welcome update to Snakemake-Profiles/slurm, but I would prefer to make the edit via a pull request to the profile. If you only edit it on your personal machine, any future users would need to make the same modification, which makes it less reproducible and portable.

I don't personally use the --slurm option with Snakemake as I prefer to make all necessary modifications / setup options within a Slurm profile that can be reused, and then use --profile slurm with Snakemake.