Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License

Allow configuration of per-rule cluster settings #7

Closed mbhall88 closed 4 years ago

mbhall88 commented 4 years ago

Issue

Profiles are a better general solution for cluster configuration. However, I still feel there is a need to allow for configuring special cluster settings for specific rules. For example, on my cluster, if I want to use GPUs for a job I need to add a collection of parameters to the bsub call that are not relevant to normal jobs. It would be great if there was a way of specifying this.

Given that the cluster configuration is now deprecated, it seems that a better way of doing per-rule configuration is a profile-specific one, as each cluster will likely have a different way of configuring the "same" thing, e.g. GPU usage.

Proposal

Allow for an LSF-specific config file that effectively has much the same functionality as the deprecated cluster configuration YAML file. To ensure that it is obvious that this file is LSF-specific it will be required to be named lsf.yaml. Then, within the job submission process of this profile, we will look for the presence of this file. If there are specific settings for the current rule being submitted, then those settings will be applied.

~To make this mirror LSF as much as possible, I will endeavour to make the configuration mirror the LSF bsub options as closely as possible, and of course, produce thorough documentation on usage.~

EDIT: I think the best way to allow for the full suite of bsub options is to just require the user to provide them as strings; otherwise I would end up implementing an entire API for bsub (which I am not keen on). So it would look something like:

__default__: 
  - "-P project"
  - "-q queue"

my_rule:
  - "-P gpu"
  - "-m gpu-host"

other_rule: "-q special-queue -gpu 'num=2'"

Which allows either a list or a single string. I prefer the list as it is "neater" but I will support both.
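
A minimal sketch of how a profile could turn such a file into extra bsub arguments, assuming the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load). The helper name params_for_rule and the choice to put __default__ first and append the rule's own settings are illustrative assumptions, not the profile's actual implementation.

```python
from typing import Dict, List, Union

Params = Union[str, List[str]]


def _as_string(params: Params) -> str:
    """Accept either a single string or a list of strings."""
    if isinstance(params, str):
        return params.strip()
    return " ".join(str(p).strip() for p in params)


def params_for_rule(config: Dict[str, Params], rule: str) -> str:
    """Return the extra bsub arguments for `rule` (hypothetical helper).

    Assumption: defaults come first, rule-specific settings are appended.
    """
    default = _as_string(config.get("__default__", []))
    specific = _as_string(config.get(rule, []))
    return " ".join(part for part in (default, specific) if part)


# The example configuration above, as it would look once parsed.
config = {
    "__default__": ["-P project", "-q queue"],
    "my_rule": ["-P gpu", "-m gpu-host"],
    "other_rule": "-q special-queue -gpu 'num=2'",
}

print(params_for_rule(config, "my_rule"))
print(params_for_rule(config, "other_rule"))
```

A rule with no entry in the file simply gets the defaults. Whether a repeated option such as -P should replace the default rather than be appended after it is a design decision this sketch leaves open.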

@johanneskoester do you have any problems with this or any points you would like to raise/discuss?

funnell commented 4 years ago

This makes me think that the cluster configuration YAML file shouldn't be deprecated. I've seen a number of cases where it's still very nice to have, and I think that using profiles supplements the use of the cluster config rather than replaces it.

mbhall88 commented 4 years ago

The problem I had with cluster configs, though, was that if I wanted to define a cluster argument that wasn't relevant to any other jobs, it was kind of impossible, e.g. if just one job was for a GPU node and the others weren't.

I have edited the original comment to reflect my idea.

funnell commented 4 years ago

@mbhall88 I wonder if you could just have an "extra" entry in the cluster-config, and include "{cluster.extra}" in the cluster command. I suppose my feeling is that it's better to have cluster specific parameters kept out of the snakemake files and in a separate config file. You could then put everything in lsf.yaml but then why not put it all into cluster-config? Assuming the "cluster.extra" idea addresses your use case, that is!
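
The "{cluster.extra}" suggestion can be illustrated with a small sketch that mimics how such a placeholder would be filled in. This is not Snakemake's own code: render_cluster_command is a hypothetical helper, the queue name "normal" and the rule names are made up, and the only behaviour assumed from cluster configs is that rule-specific entries override __default__.

```python
from types import SimpleNamespace

# Hypothetical cluster-config contents, already parsed into a dict.
cluster_config = {
    "__default__": {"queue": "normal", "extra": ""},
    "gpu_rule": {"extra": "-gpu 'num=2'"},
}


def render_cluster_command(template: str, cluster_config: dict, rule: str) -> str:
    """Fill a cluster command template for one rule (illustrative only)."""
    # Start from the defaults, then let the rule's own entries override them.
    settings = dict(cluster_config.get("__default__", {}))
    settings.update(cluster_config.get(rule, {}))
    # str.format resolves "{cluster.extra}" via attribute access.
    return template.format(cluster=SimpleNamespace(**settings)).strip()


template = "bsub -q {cluster.queue} {cluster.extra}"
print(render_cluster_command(template, cluster_config, "gpu_rule"))
print(render_cluster_command(template, cluster_config, "cpu_rule"))
```

Because "extra" defaults to an empty string, rules that need nothing special produce a plain command, while the GPU rule gets its additional arguments.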

mbhall88 commented 4 years ago

  1. Because cluster config is deprecated
  2. I want to stress that this type of specification is LSF-specific (which I think is part of the reason why cluster config was deprecated)

Who knows, maybe cluster config will be "un-deprecated" in the future, but I would rather plan ahead for now. It won't be too hard to move the codebase to your suggested method if it does.

funnell commented 4 years ago

Fair enough. Although I've been griping a bit, I do really appreciate you taking the time to develop this profile and I think it's quite nice!