cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Other (non-bash) kinds of job scripts? #3613

Open hjoliver opened 4 years ago

hjoliver commented 4 years ago

Long considered, perhaps for Cylc 9.

See recent case by @oliver-sanders here: https://github.com/cylc/cylc-flow/pull/3574#issuecomment-627820716

oliver-sanders commented 4 years ago

(requires job script re-write in Python)

TomekTrzeciak commented 4 years ago

From the linked comment:

> Jobs could be light-weight Cylc schedulers (aka frames).

Yep, I find this idea quite appealing too: https://github.com/cylc/cylc-flow/issues/2749#issuecomment-417486258.

> (requires job script re-write in Python)

Does it (if you would go with the "Cylc frame" solution)? You probably still need some simple shell script to submit the "Cylc frame" to the batch system, but all the other complexities (communication with the mother ship, signal trapping etc.) could be handled by the "frame" rather than the job script.

oliver-sanders commented 4 years ago

True, the job script doesn't have to be Python but there are other advantages. Probably the biggest is that we can have the "client" stay alive for the duration of the task which would reduce the cost and latency of messaging. Python is a nice choice for us because we have control over the installation (via setup.py/conda) which means version compatibility becomes a lot easier (we only need to support a narrow window).
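As a rough illustration of the "client stays alive" idea, here is a toy stand-in in Python (everything below is hypothetical; the real Cylc client would hold a ZMQ connection to the scheduler rather than a local queue):

```python
import queue
import threading

class PersistentClient:
    """Toy job-side messaging client that stays alive for the whole
    task, instead of spawning a new one-shot process per message."""

    def __init__(self):
        self._outbox = queue.Queue()
        self._delivered = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # One long-lived loop: connection setup cost is paid once,
        # not once per message as with a one-shot CLI client.
        while True:
            msg = self._outbox.get()
            if msg is None:
                break
            self._delivered.append(msg)  # a real client would send over ZMQ

    def message(self, text):
        self._outbox.put(text)

    def close(self):
        self._outbox.put(None)
        self._thread.join()
        return self._delivered

client = PersistentClient()
client.message("started")
client.message("custom progress message")
client.message("succeeded")
delivered = client.close()
```

The point of the sketch is only the shape: messages go to an already-connected worker, so per-message latency is a queue put rather than a process spawn plus connection handshake.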

(I had originally thought about a more command-line-ish API with a myriad of super light-weight job scripts for different languages and contexts, though have slowly come round to the idea that embedding languages is a little evil.)

TomekTrzeciak commented 4 years ago

> True, the job script doesn't have to be Python but there are other advantages. Probably the biggest is that we can have the "client" stay alive for the duration of the task which would reduce the cost and latency of messaging.

What I was thinking of is a job script like:

```bash
#SBATCH <job directives>
cylc sub-workflow <suite> [<task-id>|<sub-workflow-id>] ...
```

i.e., the "Cylc engine frame" would be the "client" that stays alive and monitors the execution of the user's task script. But perhaps I misunderstand what you mean by "frames" in this context.

> (I had originally thought about a more command-line-ish API with a myriad of super light-weight job scripts for different languages and contexts, though have slowly come round to the idea that embedding languages is a little evil.)

I think this would be OK:

```
script = """
#!/usr/bin/env perl
...
"""
```

You could also potentially leverage jupyter infrastructure (since it will be required by Cylc anyway) for kernel workers in other languages. This could open the way for fast execution without spawning subprocesses for every task. I know you don't need the kernel part of jupyter, but it could be made optional. But perhaps scheduling overheads would outweigh any potential benefits of going that route.

oliver-sanders commented 4 years ago

Skygazing time!

Might have to move this somewhere else to avoid polluting this issue too much. Would you like to join the Cylc Riot.im room?

> i.e., the "Cylc engine frame"

I think we should be able to achieve a more formal system for sub-suites, probably using a minimal specialist spawner in Cylc9 land:

```python
from sub_suite_a import sub_graph  # composition   (efficient, no spawning)
from sub_suite_b import sub_flow   # encapsulation (spawns a new workflow like a job)

Foo = cylc.Task(...)
Bar = cylc.Task(...)
Baz = cylc.Task(...)

with cylc.Graph('every hour') as Hourly:
    Foo >> Bar >> sub_graph >> Baz >> sub_flow

with cylc.Graph('once') as Startup:
    Install

with cylc.run as flow:
    Startup >> Hourly
```
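A minimal sketch of how the `>>` chaining in an API like that could be implemented with operator overloading (the `Task` class below is hypothetical, not a real Cylc API):

```python
class Task:
    """Hypothetical graph node; `a >> b` records a dependency edge."""

    edges = []  # (upstream, downstream) pairs, collected for the sketch

    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):
        # `foo >> bar` means "bar depends on foo"
        Task.edges.append((self.name, other.name))
        return other  # returning `other` lets foo >> bar >> baz chain

foo, bar, baz = Task("foo"), Task("bar"), Task("baz")
foo >> bar >> baz
```

Because `__rshift__` returns its right operand, a chain of `>>` operators records one edge per link, which is presumably how the graph strings above would be captured.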

I'm interested in what you think about composition; I would have thought that in most cases it would be preferable to compose a child workflow into the parent (with collapsing in the GUI).

For "encapsulation" (that's the wrong word but I can't think of the right one) we can leverage the ZMQ pub-sub update system we are currently developing for driving the UI to keep the parent workflow informed of state changes in the child.

In the encapsulation case I would expect the main use case would involve a stripped-down Cylc scheduler running on an HPC somewhere. BTW we have made the first baby-steps towards a "stripped down cylc kernel" with main-loop plugins, which, in the Cylc8 timeframe should allow us to turn on or off parts of the kernel (e.g. cycling, job submission, etc).

> I think this would be OK:

Perhaps controversial but I think this would be better:

```
script = my_perl_file
```

I really hate embedding code in places where it's not meant to be; it's good for lazy examples and test suites, but for real-world use I'd rather have separation of concerns, testability, linting, vulnerability scanning, etc. But do try to convince me otherwise.

To me a "Frame" would be defined in an appropriate language file, so a bash frame might look like this:

```bash
foo () {}

bar () {}

baz () {}

schedule __graph__
    foo >> bar >> baz
__graph__
```

Where the frame is run within a single submission.
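A toy Python equivalent of such a frame, running a small graph in dependency order inside one submission (the names, the dependency format, and `run_frame` itself are illustrative only; a real frame would also trap signals and report state back to the scheduler):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_frame(funcs, deps):
    """Run every task of a small graph inside one job submission,
    in dependency order, with no extra batch submissions."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        funcs[name]()          # in a bash frame this would call foo(), etc.
        order.append(name)
    return order

log = []
funcs = {n: (lambda n=n: log.append(n)) for n in ("foo", "bar", "baz")}
deps = {"bar": {"foo"}, "baz": {"bar"}}   # i.e. foo >> bar >> baz
order = run_frame(funcs, deps)
```

The appeal is that the whole sub-graph costs one queue wait instead of three, at the price of losing per-task scheduling.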

> You could also potentially leverage jupyter infrastructure (since it will be required by Cylc anyway)

In the current design JupyterHub (sans Jupyter itself) will only be installed on the cylc servers (to provide the UI), not on job hosts. The idea of utilising Jupyter like that had not crossed my mind; how would it work?

hjoliver commented 4 years ago

(Brilliant, can't wait to get Cylc 8 out the door so we can put some serious thought into this stuff :+1: )

TomekTrzeciak commented 4 years ago

(sorry I didn't find the time to reply earlier)

> Might have to move this somewhere else to avoid polluting this issue too much. Would you like to join the Cylc Riot.im room?

Happy to, if you let me know how to access it.

> I'm interested in what you think about composition; I would have thought that in most cases it would be preferable to compose a child workflow into the parent (with collapsing in the GUI).

The composition problem can be quite tricky, especially if you consider (sub)workflows with different cycling frequencies (e.g., what are the semantics of `Hourly >> Daily`?).

In IMPROVER we've taken a different approach: define a full (but parameterised) graph (using our own parametric framework in Python) and split it into subgraphs afterwards. Currently we have two levels of splitting: subgraphs that go into individual cylc tasks, and then the cylc graph is split again into scheduling sections (by taking intersections across cycling frequencies). That gives us a lot of flexibility, but also comes with some disadvantages; I can explain more if there's interest.
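As a loose illustration of that kind of splitting (the function, the node format, and the ISO 8601 frequency tags below are guesses for the sketch, not IMPROVER's actual code), grouping tasks of a flat graph into scheduling sections by cycling frequency might look like:

```python
from collections import defaultdict

def split_by_frequency(nodes):
    """Partition a flat list of (task, cycling-frequency) pairs into
    scheduling sections, one per frequency, preserving input order."""
    sections = defaultdict(list)
    for name, freq in nodes:
        sections[freq].append(name)
    return dict(sections)

# Hypothetical tasks tagged with their cycling frequency:
nodes = [("preprocess", "PT1H"), ("forecast", "PT1H"), ("verify", "P1D")]
sections = split_by_frequency(nodes)
```

The real splitting described above would additionally have to cut cross-frequency edges (the `Hourly >> Daily` problem), which is where the intersection step comes in.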

> For "encapsulation" (that's the wrong word but I can't think of the right one) we can leverage the ZMQ pub-sub update system we are currently developing for driving the UI to keep the parent workflow informed of state changes in the child.

👍 to more pub-sub style inter-suite communication

> BTW we have made the first baby-steps towards a "stripped down cylc kernel" with main-loop plugins, which, in the Cylc8 timeframe should allow us to turn on or off parts of the kernel (e.g. cycling, job submission, etc).

Less monolithic and more pluggable cylc internals are definitely desirable. I'm glad this is happening on the sidelines.

> Perhaps controversial but I think this would be better:
>
> ```
> script = my_perl_file
> ```
>
> I really hate embedding code in places where it's not meant to be; it's good for lazy examples and test suites, but for real-world use I'd rather have separation of concerns, testability, linting, vulnerability scanning, etc. But do try to convince me otherwise.

I won't try to convince you otherwise, because I'm with you on that one. I think it might be worth making that a recommendation. It's already tricky enough with nested ini syntax interwoven with Jinja2 without adding more languages into the mix.

> To me a "Frame" would be defined in an appropriate language file, so a bash frame might look like this:
>
> ```bash
> foo () {}
>
> bar () {}
>
> baz () {}
>
> schedule __graph__
>     foo >> bar >> baz
> __graph__
> ```
>
> Where the frame is run within a single submission.

That almost feels like you want to have bindings to the Cylc scheduler for different languages.

> You could also potentially leverage jupyter infrastructure (since it will be required by Cylc anyway)
>
> In the current design JupyterHub (sans Jupyter itself) will only be installed on the cylc servers (to provide the UI), not on job hosts. The idea of utilising Jupyter like that had not crossed my mind; how would it work?

Ha, I don't know to be honest. I just thought that Jupyter already has kernel implementations for many languages, so it might be worth a look to see if that could be leveraged somehow without too much effort. The main advantage would be sidestepping the overhead of shelling out to scripts. But perhaps Jupyter workers executing multiple script tasks in a shared global namespace is actually not such a good idea (possible interference due to side effects).
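The side-effect concern can be shown with a toy example: two "task scripts" executed in one persistent worker namespace (the way a shared kernel might run them without subprocess spawns) see each other's globals, while fresh namespaces keep them isolated:

```python
# Two "task scripts" run in one shared namespace: state leaks between them.
shared_ns = {}
exec("counter = 1", shared_ns)              # task A sets a global
exec("counter = counter + 10", shared_ns)   # task B silently sees A's state
leaked = shared_ns["counter"]               # 11: interference via side effects

# Giving each script a fresh namespace restores the isolation that
# per-task subprocesses would have provided:
ns_a, ns_b = {}, {}
exec("counter = 1", ns_a)
try:
    exec("counter = counter + 10", ns_b)    # fails: no leaked state
    isolated_ok = False
except NameError:
    isolated_ok = True
```

So a kernel-worker design would either need per-task namespaces like this, or accept that badly behaved scripts can interfere with each other.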