cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

allow some real tasks in sim mode? #5975

Open hjoliver opened 7 months ago

hjoliver commented 7 months ago

We need an easy way to make designated tasks do real things in simulation mode (and/or dummy mode, if we keep it - #5961).

One of the main use cases for simulation mode (for me, at least) is to test-run other users' workflows that are (e.g.) stalling or stopping for reasons they don't understand.

These are typically big, complicated workflows (otherwise users could understand them more easily themselves), and they often (or not infrequently, at least) contain tasks that use the CLI to affect the flow on the fly, e.g., by expiring or killing other tasks.

The content of these tasks affects the scheduling, so a wholesale sim mode isn't very useful. I have had to take the annoyingly labour-intensive route of manually dummying out all the other tasks whilst leaving those ones alone.

Similarly, we can't simulate optional branching, which is one of the things users are most likely to get in a mess with.
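To make "optional branching" concrete, here is a minimal sketch of the kind of graph meant. The task names, the custom outputs `ok` and `rerun`, and the `data_looks_good` command are all hypothetical. Which branch runs depends on what the real script of `check` decides at run time, which a purely simulated `check` has no way of knowing.

```cylc
[scheduling]
    [[graph]]
        R1 = """
            # `?` marks these outputs as optional: only one branch is expected to run
            check:ok? => process
            check:rerun? => recover
        """

[runtime]
    [[check]]
        # the real script decides which custom output to send
        # (data_looks_good is a HYPOTHETICAL site-specific command)
        script = """
            if data_looks_good; then
                cylc message -- "data ok"
            else
                cylc message -- "rerun needed"
            fi
        """
        [[[outputs]]]
            ok = "data ok"
            rerun = "rerun needed"
    [[process, recover]]
        script = true
```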

proposal

Add a script item to the existing task [simulation] config.

In simulation mode, simulate all other tasks as usual, but run any task that has simulation scripting as a real local job.

This would make it easy to test all aspects of scheduling in simulation (the sim scripting can use the Cylc CLI, send arbitrary output messages...)
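A sketch of what the proposal might look like in a workflow. Note that `script` under `[[[simulation]]]` is hypothetical: it is the item being proposed here, not an existing Cylc setting. The task names and the `run-model` / `real-housekeeping` commands are illustrative only.

```cylc
[runtime]
    [[model]]
        # no simulation scripting: simulated as usual in simulation mode
        script = run-model
    [[housekeep]]
        script = real-housekeeping
        [[[simulation]]]
            # HYPOTHETICAL item (the subject of this proposal): in simulation
            # mode, run this as a real local job instead of simulating the task
            script = """
                # intervene in the running workflow, as the real task would
                cylc kill "$CYLC_WORKFLOW_ID//$CYLC_TASK_CYCLE_POINT/model"
                # send a custom output message
                cylc message -- "cleanup done"
            """
        [[[outputs]]]
            cleaned = "cleanup done"
```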

oliver-sanders commented 7 months ago

So, two components here:

  1. Simulating automated interventions.
  2. Simulating task outputs

Point (2) is covered by the skip-mode proposal which allows the outputs the task will generate to be defined in the config. I expect we will extend this logic to simulation mode in due course.
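For comparison, a sketch of declaring a task's outputs in the config under the skip-mode approach. The setting names `run mode = skip` and `[[[skip]]] outputs` are assumed from the skip-mode proposal and should be checked against the Cylc documentation for the version in use; the task and output names are illustrative.

```cylc
[runtime]
    [[check]]
        # ASSUMED setting names from the skip-mode proposal (see above)
        run mode = skip
        [[[skip]]]
            # outputs to mark as complete without running the task
            outputs = rerun
        [[[outputs]]]
            rerun = "rerun needed"
```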

For point (1), automated interventions are a pattern we actively discourage, so this isn't something we would be interested in at our end. It's worth noting that, to some extent, this can be achieved in the same way that site portability is handled in a lot of workflows, e.g.:

#!jinja2

[scheduling]
    [[graph]]
        ...

{% if run_mode == 'simulation' %}
{% include 'runtime-simulation.cylc' %}
{% else %}
{% include 'runtime.cylc' %}
{% endif %}

# runtime-simulation.cylc

[scheduler]
    allow implicit tasks = True

[runtime]
    [[task1]]
        script = ...

hjoliver commented 7 months ago
> 2. Simulating task outputs
>
> Point (2) is covered by the skip-mode proposal which allows the outputs the task will generate to be defined in the config. I expect we will extend this logic to simulation mode in due course.

That's fine, but "due course" might be a while off. My proposal is very simple, and would allow this out of the box.

> 1. Simulating automated interventions.

Just to be clear, I want to allow use of real automated interventions in simulation mode, so that simulation mode can easily simulate real workflows that use them.

I have already encountered this exact problem in several large workflows this year.

> For point (1), automated interventions are a pattern we actively discourage, so this isn't something we would be interested in at our end.

That's all very well (I don't encourage it either), but that ain't gonna stop some users from using it and asking for help when their workflows run into trouble. Also, I suspect there are at least some legit use cases for this sort of run-time graph surgery. The best we can do is advise against it where possible.

oliver-sanders commented 7 months ago

> That's fine, but "due course" might be a while off.

We are already developing this mechanism; it will be faster to add it to simulation mode than to implement a new run mode, especially as a new run mode would conflict with ongoing work (https://github.com/cylc/cylc-flow/pull/5721).

hjoliver commented 7 months ago

I'm not suggesting a new run mode (misleading title changed!). Just that in simulation mode if there is defined sim-mode scripting, run it as a real job - so that it is easy to simulate workflows that (rightly or wrongly!) do use "automatic interventions".

hjoliver commented 7 months ago

Legit use cases for automated interventions?

A task job determines that (for whatever reason) we no longer need to continue with a sub-graph that was triggered earlier, so it uses the Cylc CLI to terminate and clean up that sub-graph.
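A sketch of such a task, with hypothetical names throughout: `decide` runs a hypothetical `still_needed` check and, if the sub-graph (tasks `sub_a` and `sub_b`) is no longer wanted, kills any of its active jobs via the Cylc CLI. The clean-up step is left as a comment, since the appropriate command depends on the Cylc version in use.

```cylc
[runtime]
    [[decide]]
        script = """
            # still_needed is a HYPOTHETICAL site-specific check
            if ! still_needed; then
                for task in sub_a sub_b; do
                    # terminate any active job in the unwanted sub-graph;
                    # don't fail this job if the task isn't active
                    cylc kill "$CYLC_WORKFLOW_ID//$CYLC_TASK_CYCLE_POINT/$task" || true
                done
                # clean-up of the sub-graph would follow here
            fi
        """
```

A task like this is exactly what a wholesale simulation skips, so the kill never happens and the simulated run diverges from the real one.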

oliver-sanders commented 7 months ago

> I'm not suggesting a new run mode (misleading title changed!). Just that in simulation mode if there is defined sim-mode scripting, run it as a real job

Because simulation mode uses a different code pathway for the submission process, I don't think that this can be achieved. What you're suggesting is more closely aligned with dummy mode, which hacks the task's script etc. at the config level, then submits a real job.

hjoliver commented 7 months ago

> What you're suggesting is more aligned to dummy mode, which hacks the task's script, etc at the config level, then submits a real job.

Yes, I agree with that, but unfortunately I thought of this just after we semi-agreed to remove dummy mode 🤣