cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io

remote: zero installation remote client #6003

Open oliver-sanders opened 9 months ago

oliver-sanders commented 9 months ago

One of the challenges facing containerised deployment of Cylc is the requirement for Cylc to be installed in the job environment.

This problem (and the Cylc networking requirements around it) has proven to be a pain point in cloud deployments, especially for those not familiar with Cylc.

This is also a bit awkward because you can't use off-the-shelf containers to run your jobs in; Cylc must first be installed into the container. That is a lot of extra work (e.g. install mamba, create the environment, install it, install the wrapper script) and it bloats the container image, which isn't great. Ideally the Cylc comms mechanics would be separate from the job environments, so that workflow writers can focus on the execution environments and the sys admins can focus on the Cylc infrastructure.

Two possible solutions to address this problem:

  1. Develop a lightweight remote Cylc client that is easier to install.
    • E.g. a statically linked ZMQ binary with a thin Cylc layer on top (for locating the certificates, etc).
    • A single-file, zero-dependency binary would be a lot easier (and lighter) to layer onto existing Dockerfiles.
    • Would require alternate implementations of a subset of the Cylc CLI (e.g. cylc remote-init, cylc message and cylc broadcast).
  2. Listen for status changes via the filesystem.
    • This assumes the job platform shares a filesystem with the scheduler.
    • Rather than using the cylc client to write messages to the job.status file, have the job script write to the file directly.
    • Expose the path of the job.status file via an environment variable so the job can write custom messages to it (see the sketch below this list).
    • Poll the results back. Note, polling on a shared filesystem has lower overheads (no ssh), so it could be performed more regularly.
    • It would be possible to implement a different kind of poller using Cylc 8's main-loop plugins: a long-lived process would be more efficient than running cylc jobs-poll commands.
    • This approach could take advantage of filesystem events for push based task comms over the filesystem (even NFS supports some filesystem events).
    • This poller would queue newly detected task messages via the GraphQL interface (over ZMQ).
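
As a rough sketch of the job-side half of option 2: assuming the scheduler exported the status file path in a (hypothetical) CYLC_TASK_STATUS_FILE variable, a job could append messages with nothing more than the standard library. The TIMESTAMP|SEVERITY|MESSAGE line format here is illustrative, not the real job.status format:

```python
# Sketch only: CYLC_TASK_STATUS_FILE and the line format are assumptions.
import os
from datetime import datetime, timezone

def job_message(severity: str, message: str) -> None:
    """Append a message line directly to this job's job.status file."""
    status_file = os.environ["CYLC_TASK_STATUS_FILE"]  # hypothetical variable
    timestamp = datetime.now(timezone.utc).isoformat()
    # One write call per line, so a line-based poller never sees a torn line
    # (a single small append is effectively atomic on local filesystems;
    # NFS offers weaker guarantees).
    with open(status_file, "a") as handle:
        handle.write(f"{timestamp}|{severity}|{message}\n")

job_message("INFO", "started")
```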

Option 1 is more sophisticated as it would open up access to the full GraphQL API; however, it still requires Cylc to be installed in the container.

Option 2, especially with a long-lived scheduler-side poller process, is starting to look like an attractive solution to me. It is essentially just an extra process which maintains a list of the job.status file paths of active tasks and either registers filesystem events (for push notifications) or simply polls them (for pull notifications) on much shorter timeframes than conventional Cylc task polling. The poller process would queue messages via the GraphQL interface (over ZMQ) when new messages are detected, so it would be completely asynchronous to the scheduler's main loop (no subprocpool burden).
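
A minimal sketch of what that poller could look like (pull notifications only); queue_message is a hypothetical stand-in for the GraphQL-over-ZMQ call, and error handling plus partial-line buffering are omitted for brevity:

```python
# Minimal sketch of a long-lived job.status poller (pull notifications).
import time

def queue_message(path: str, line: str) -> None:
    # Hypothetical stand-in: the real poller would queue the message via
    # the GraphQL interface (over ZMQ) rather than printing it.
    print(f"{path}: {line}")

class JobStatusPoller:
    def __init__(self):
        self.handles = {}  # job.status path -> open file handle

    def track(self, path: str) -> None:
        """Start watching a job.status file (e.g. on job submission)."""
        self.handles[path] = open(path, "r")

    def untrack(self, path: str) -> None:
        """Stop watching (e.g. when the task completes)."""
        self.handles.pop(path).close()

    def poll(self) -> None:
        """Forward any newly appended lines from each tracked file."""
        for path, handle in self.handles.items():
            # readline() remembers the file offset, so each call only ever
            # reads data appended since the last poll.
            while line := handle.readline():
                queue_message(path, line.rstrip("\n"))

poller = JobStatusPoller()
while True:  # in reality this would hook into the scheduler's main loop
    poller.poll()
    time.sleep(1)
```

Because the handles stay open between polls, each iteration costs one short read per active task rather than an ssh round trip.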

oliver-sanders commented 9 months ago

Looked into named pipes as a training exercise and gave this a go. It worked fine, but named pipes don't work across hosts, so unfortunately that was a dead end.
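
For reference, the pattern that was tried looks roughly like this (single host only, path hypothetical). A FIFO is an in-kernel object, so a reader and a writer on different hosts each get their own host-local pipe even when the path lives on a shared filesystem:

```python
# Rough illustration of the named-pipe experiment.
import os

FIFO = "/tmp/job-messages.fifo"  # hypothetical path
if not os.path.exists(FIFO):
    os.mkfifo(FIFO)

if os.fork() == 0:
    # Child stands in for the job: write one message and exit.
    with open(FIFO, "w") as fifo:
        fifo.write("started\n")
    os._exit(0)

# Parent stands in for the scheduler: open() blocks until a writer connects,
# which only works here because both ends share one kernel.
with open(FIFO) as fifo:
    print("received:", fifo.readline().rstrip("\n"))
```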

Writing the code to support file system events is probably much easier than finding a file system + kernel combination that actually supports this pattern, so that's probably a dead end too.

However, I swapped out the named pipes for a simple file poller that calls "readline" to check for new lines. This seems to work pretty well and is still substantially more efficient than Cylc's existing poller implementation. The implementation was surprisingly easy:

https://github.com/oliver-sanders/cylc-flow/pull/new/local-job-poller

Note, this doesn't replace the existing task polling logic, which can continue to run alongside it. It replaces push messaging (i.e. zmq or ssh+zmq).

It's currently running readline on each status file every second (which is roughly every main loop iteration). Will need to test in anger, but I suspect that this will put fairly minimal load on the filesystem (it's only trying to read one line, not the whole file). The rate can be lowered a bit, and the pollers could be pushed into their own process if performance is a concern.
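
One readline-specific subtlety: if a poll fires while the job is mid-write, readline() can return a partial line with no trailing newline, so the poller should buffer it and only forward complete lines. A sketch of that handling, keeping one buffer per tracked file:

```python
# Only forward complete lines; keep any trailing partial line buffered.
def read_new_lines(handle, buffer: str):
    """Return (complete_lines, remaining_buffer) for newly appended data."""
    lines = []
    while chunk := handle.readline():
        buffer += chunk
        if buffer.endswith("\n"):
            lines.append(buffer.rstrip("\n"))
            buffer = ""
    return lines, buffer
```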

TODO: