cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

slurm --clusters support #2504

Open — hjoliver opened this issue 6 years ago

hjoliver commented 6 years ago

The latest release of slurm apparently supports a single unique job ID across a federated cluster. For the moment though, if you submit a job to slurm on host X with #SBATCH --clusters=Y to make it run on host Y, any subsequent job interaction via the resulting job ID has to be done on host Y or else with --clusters=Y on the command line (i.e. the job ID is not recognized on the original submission host).

This way of submitting remote jobs without ssh is fine with Cylc, if hosts X and Y see the same filesystem (i.e. the job looks local to Cylc, even though it technically isn't). But with slurm, subsequent job poll or kill fails because the job ID is not recognized locally, and Cylc does not know to add --clusters=Y to the squeue and scancel command lines.

We should either make this work in Cylc or else document the problem and recommend using remote mode for the moment.

hjoliver commented 6 years ago

It is easy to make a custom batch system handler hard-wired to a specific cluster. E.g. for cluster foo, make slurm_cluster_foo.py:

from cylc.batch_sys_handlers.slurm import SLURMHandler

class SLURMClusterFooHandler(SLURMHandler):
    """SLURM job submission and manipulation for --clusters=foo."""
    KILL_CMD_TMPL = "scancel --clusters=foo '%(job_id)s'"
    POLL_CMD = "squeue -h --clusters=foo"
    SUBMIT_CMD_TMPL = "sbatch --clusters=foo '%(job)s'"  # --clusters optional here as it's a job directive

BATCH_SYS_HANDLER = SLURMClusterFooHandler()

Obviously it would be better to have a single slurm handler that extracts the cluster name, if present, from the job directives, since different jobs could potentially use different clusters.

Without altering the core of Cylc, we could have the job submit, poll, and kill commands search the job script for the cluster name every time they are invoked. This seems a bit perverse though.

Another option would be to have the task proxy remember the cluster name, if provided, and pass it to the job submit, poll, and kill methods each time, at the cost of one new task proxy attribute that would only ever be used by slurm jobs. [actually, duh - no cost, all directives are already remembered]
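
For concreteness, a rough sketch of that idea (the helper names and the plain-string command building below are invented for illustration; this is not the real batch system handler interface):

import re

# Illustrative only: pull "#SBATCH --clusters=NAME" out of the job
# script (or the stored directives) and add a matching --clusters
# option to the poll and kill command lines.
CLUSTERS_RE = re.compile(r"^#SBATCH\s+--clusters=(\S+)", re.MULTILINE)

def get_cluster(job_script_text):
    """Return the cluster name from the job script text, or None."""
    match = CLUSTERS_RE.search(job_script_text)
    return match.group(1) if match else None

def poll_cmd(cluster=None):
    """Base squeue command; job IDs are matched against its output."""
    return "squeue -h" + (" --clusters=%s" % cluster if cluster else "")

def kill_cmd(job_id, cluster=None):
    """scancel command line for a single job ID."""
    opt = " --clusters=%s" % cluster if cluster else ""
    return "scancel%s '%s'" % (opt, job_id)

Whether the cluster name is re-parsed from the job script or taken from the directives the task proxy already holds, the same value just needs to reach the poll and kill calls.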

@cylc/core - any strong opinions on this? Or other ideas?

arjclark commented 6 years ago

@hjoliver - Thinking about this the other way round, could we maybe provide a config option under [job] that would be inserted into the slurm directives (and ignored otherwise) and used in the commands accordingly? (Similar to the execution time limit entry.)
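
Something along these lines, say (the task name and the "batch system cluster" setting are purely hypothetical, invented here just to illustrate the idea):

[runtime]
    [[my_task]]
        [[[job]]]
            batch system = slurm
            execution time limit = PT1H  # existing entry, for comparison
            batch system cluster = foo   # hypothetical new setting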

matthewrmshin commented 6 years ago

This issue is likely to become a lot easier once we solve #2199. (We'll schedule that work to commence after #2468 is merged.)

hjoliver commented 6 years ago

Trouble is, we already need this on the new HPC at NIWA (although we could make do with the nasty hard-wired kludge above).

hjoliver commented 6 years ago

Plans were made on the assumption that slurm worked like other batch systems in this respect.

matthewrmshin commented 6 years ago

@hjoliver Understood. I think it is best to use a custom batch system handler for now. I had really wanted to work on #2199 before the end of this year, but that hasn't happened.

hjoliver commented 4 years ago

From a local HPC engineer:

Federation doesn't look like an appropriate way forward:

“A job is submitted to the local cluster (the cluster defined in the slurm.conf) and is then replicated across the clusters in the federation. Each cluster then independently attempts to schedule the job based off of its own scheduling policies.”