cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Single-user resource management #5762

Open hjoliver opened 11 months ago

hjoliver commented 11 months ago

For single-user installations with no batch system, it would be useful to have an admin-free way to limit activity across multiple workflows.

Cylc internal queues help, but they don't see other workflows.

In this sort of situation you don't really need sophisticated resource management; a simple job queuing system would probably suffice, perhaps with limiting based on simple server load metrics.
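To make the load-metric idea concrete, here is a minimal sketch of holding a job until the host is quiet. It is not an existing Cylc feature: the function names, the threshold, and the polling interval are all hypothetical, and it assumes a Unix-like host where `os.getloadavg()` is available.

```python
import os
import subprocess
import time
from typing import Callable, Optional, Sequence


def wait_for_load(
    max_load: float,
    poll_seconds: float = 5.0,
    timeout: Optional[float] = None,
    get_load: Callable[[], float] = lambda: os.getloadavg()[0],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until the 1-minute load average is <= max_load.

    Returns True once the load is low enough, or False if `timeout`
    seconds of polling pass first. `get_load` and `sleep` are
    injectable so the logic can be exercised without a busy host.
    """
    waited = 0.0
    while get_load() > max_load:
        if timeout is not None and waited >= timeout:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True


def run_when_quiet(cmd: Sequence[str], max_load: float) -> int:
    """Hypothetical job wrapper: wait for a quiet host, then run cmd."""
    wait_for_load(max_load)
    return subprocess.run(cmd).returncode
```

Because load average is a host-wide number, every workflow using such a wrapper would back off together, which is the cross-workflow behaviour that internal queues lack. Something like `run_when_quiet(["./my-job.sh"], max_load=4.0)` would be the intended usage (the script name is made up).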

Cylc has long supported the basic `at` scheduler, but only for instant job submission via `at now`, which is now no better than our built-in background job management.

Ideas:

atd batch

The `at` scheduler has a `batch` command that releases jobs only when the server load is below some limit. It would be trivial for Cylc to support this. However:

GNU Task Spooler

An old project that has been resurrected recently-ish. Might be worth considering.
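Independent of any particular tool, the core requirement here is a cap on concurrent jobs that every workflow on the host can see. A rough sketch of that idea using lock files follows. This is not how Task Spooler works internally, and not a Cylc API: `job_slot`, the lock directory layout, and the slot count are hypothetical, and `fcntl.flock` limits it to Unix-like systems. It caps concurrency only; it does not preserve submission order.

```python
import fcntl
import os
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def job_slot(
    lock_dir: str, max_slots: int, poll_seconds: float = 1.0
) -> Iterator[int]:
    """Hold one of `max_slots` host-wide slots while the block runs.

    Each slot is a lock file. Any process (from any workflow) that
    uses the same `lock_dir` competes for the same slots, and the
    kernel releases a lock automatically if its holder dies.
    """
    os.makedirs(lock_dir, exist_ok=True)
    while True:
        for slot in range(max_slots):
            path = os.path.join(lock_dir, f"slot-{slot}.lock")
            fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
            try:
                # Non-blocking: fail immediately if the slot is taken.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                os.close(fd)
                continue
            try:
                yield slot
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
                os.close(fd)
            return
        # Every slot is busy: wait, then scan again.
        time.sleep(poll_seconds)
```

A job wrapper would then run its command inside `with job_slot("/tmp/my-slots", max_slots=4):` (path and limit made up), so that at most four jobs run at once across all workflows that share the directory.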

other??

oliver-sanders commented 11 months ago

Conventional batch systems such as Slurm or PBS, though typically deployed at scale, could also be used to manage local instances.

The complexity of setup is not necessarily as bad as you might think; here are a couple of containers I found on Docker Hub:

oliver-sanders commented 11 months ago

See also this issue: https://github.com/cylc/cylc-flow/issues/3800

hjoliver commented 11 months ago

If you use Cylc on a laptop or workstation (i.e., not even a simple cluster), then I think installing a full-blown batch system is a big ask. Even if it is "not necessarily as bad as you might think" (which doesn't sound great, to be honest 😁) figuring out how to use it in such a minimal way might not be easy.

oliver-sanders commented 11 months ago

I think #3800 is as good as we're able to get working out of the box, i.e., for this case, use Cylc's ability to gather host metrics to prevent the host from becoming overwhelmed.

After that, no matter what we do, the user is going to have to install a batch system and start a daemon.

(Note that even `at` isn't necessarily installed; if it is, the daemon isn't necessarily running; and even then it might still require additional configuration, e.g. macOS deliberately disables the `at` service for security reasons.)

Installing and starting a docker slurm cluster might be as simple as docker up slurm!

Another popular choice would be Kubernetes, which can be installed locally to develop cloud deployments.