Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License

Make queue configurable per rule and attempt #60

Closed by leoisl 4 months ago

leoisl commented 1 year ago

Sometimes we want to try running a job with 100 GB of RAM on the 1st attempt, 300 GB on the 2nd, and 1 TB on the 3rd. The third attempt might need to be submitted to a big-mem queue, while the first two could go to the standard queue. The user should be able to specify which queue the profile uses at each attempt.
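
For reference, the memory escalation itself can already be expressed with a callable resource in the Snakefile; what's missing is an equivalent way to pick the queue per attempt. A minimal sketch (rule name, paths, and values are placeholders):

rule big_memory_rule:
    output:
        "results/big.txt"
    resources:
        # hypothetical escalation: 100 GB, 300 GB, 1 TB across attempts
        mem_mb=lambda wildcards, attempt: [100_000, 300_000, 1_000_000][attempt - 1]
    shell:
        "touch {output}"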

mbhall88 commented 1 year ago

One option would be to copy what we do in the slurm profile: https://github.com/Snakemake-Profiles/slurm#rule-specific-resource-configuration
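
If I remember that README correctly, the per-rule configuration there is a cluster config YAML along these lines (quoted from memory, so check the link for the exact keys):

__default__:
  partition: regular

big_memory_rule:
  partition: big_mem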

mike2vandy commented 4 months ago

My question seems relevant here; I hope I'm not asking something that's already been answered. I just want to configure the lsf profile to send a job out to one of three queues. How can I do that?

dlaehnemann commented 4 months ago

I have just gotten this entry in my yaml profile to work, but only with the latest snakemake (8.11.3) and the latest snakemake-executor-plugin-lsf (0.2.4):

set-resources:
  test_rule:
    lsf_queue: '( "medium" if True else "long" )'

So my general recommendation would be to switch to snakemake 8 and use the latest versions of everything. This should then translate to the situation from the original issue with something like:

set-resources:
  big_memory_rule:
    lsf_queue: '( "regular" if attempt <= 2 else "big_mem" )'

But I'm not sure I understand what you mean by "send a job out to one of three queues", @mike2vandy. Should the job be randomly assigned to one of the queues? I guess you could have a statement that randomly chooses one of three queues (see the sketch below), but I'm not sure where this would be useful, so I'm probably misunderstanding this one...
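
If random assignment really is what you're after, a callable resource in the Snakefile should do it. An untested sketch, with placeholder rule and queue names:

import random

rule some_rule:
    output:
        "results/out.txt"
    resources:
        # hypothetical: pick one of three queues at job submission time
        lsf_queue=lambda wildcards: random.choice(["short", "normal", "long"])
    shell:
        "touch {output}"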

I'll close for now, in the hope that this nevertheless solves the problem for all the posters here. But feel free to reopen (or to file another issue).

dlaehnemann commented 4 months ago

P.S.: To also document the syntax for specifying this directly within a rule, here's a working example:

rule test_rule:
    [...]
    resources:
          lsf_queue="medium" if True else "long"

So fewer quotes and parentheses are required in this place for the value to end up correctly in the eventual submission.

And one note: even though specifying something as cluster-system-specific as lsf_queue doesn't really make sense in a generic rule that should be runnable anywhere, you might need similar dynamic expressions for other, more generic resource types. It's just what I could quickly test just now.

Edit: Syntax error lsf_queue: "medium" [...] corrected to lsf_queue="medium" [...].

mike2vandy commented 4 months ago

So my institution's LSF is configured to send jobs to a queue based on the requested resources, and they advise against specifying a single/default queue (some queues are better configured for single-processor jobs, others for > 1 processor, etc.). However, my workflow includes singularity containers, and some nodes/queues aren't configured for singularity. I don't know if it's achievable, but I'd like to limit which queues my jobs can be sent to (the ones I know will work). Also, my pipeline is currently written in snakemake 7; sorry, I should have mentioned that. There are about 100 rules in this workflow, and I'd like to avoid specifying which rules go to which queues, so I'm just asking the experts whether that's even possible.

dlaehnemann commented 4 months ago

I think that, optimally, your institution's LSF system should take care of sending jobs that require singularity to queues that have that capability. The admins should be able to configure some kind of boolean resource called singularity that you can request in a bsub command, so that your job only gets sent to a queue satisfying that criterion. You should then be able to annotate the rules that use a singularity container to request that resource.
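
Outside of snakemake, the bsub side of this would look something like the following (assuming the admins define a boolean resource named singularity; queue names are placeholders):

# request only hosts that advertise the boolean resource "singularity"
bsub -R "select[singularity]" ./run_job.sh

# alternatively, restrict the job to an explicit set of queues;
# LSF then picks a suitable one among them
bsub -q "short normal long" ./run_job.sh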

But I'm also still not sure whether this will work with the profile here and snakemake <= 7. What's holding you back from migrating to snakemake 8 and using the executor plugin? The plugin seems to work nicely for us, and snakemake 8 fixes some bugs in dynamic resource specification, so the examples above work.
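
For reference, the executor switch itself is mostly a profile change. A minimal snakemake 8 profile sketch, assuming snakemake-executor-plugin-lsf is installed (queue name is a placeholder):

# config.yaml of the profile
executor: lsf
jobs: 100
default-resources:
  lsf_queue: "regular"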

mike2vandy commented 4 months ago

It's not my workflow, but I know a big reason it hasn't been upgraded is S3RemoteProvider() and the effort of porting all of the S3.remote() lines to however snakemake 8 handles S3 storage/retrieval. Thanks for thinking about all of this.

dlaehnemann commented 4 months ago

Yeah, these kinds of things always end up being some kind of corner case, and it's always tricky to get your head around so many moving parts. And while the new plugin system solves quite a few things nicely, my experience is also that it sometimes makes it harder to track down where something is actually defined / where the code that does a particular thing lives (snakemake, the interface, or the plugin). So I can definitely see the mental hurdle of migrating, especially if there's also a storage plugin involved.

But, with the caveat that I have not worked with remote storages and the respective plugins, maybe you can point the workflow developers to this: https://snakemake.github.io/snakemake-plugin-catalog/plugins/storage/s3.html

And probably the migration guide, as well: https://snakemake.readthedocs.io/en/latest/getting_started/migration.html
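
In case it helps with scoping the effort, here is my rough understanding of what the migration of those lines would look like. This is a sketch based on the docs linked above, not something I have tested myself; bucket, paths, and rule name are placeholders:

# snakemake <= 7 used the remote provider:
#   from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider
#   S3 = S3RemoteProvider()
#   ... S3.remote("mybucket/data.txt") ...
# snakemake 8 replaces this with the s3 storage plugin
# (requires snakemake-storage-plugin-s3):

storage:
    provider="s3"

rule fetch:
    input:
        storage.s3("s3://mybucket/data.txt")
    output:
        "results/data.txt"
    shell:
        "cp {input} {output}"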