proposal: set/reset/remove/skip

oliver-sanders commented 1 year ago

Intro

A few Cylc 7 use cases are currently either not possible at Cylc 8, technically possible but difficult to action, or give little to no feedback via the user interfaces when actioned.

The remaining issues largely result from the removal of cylc reset, which was used for a much wider range of cases than intended, and from losing the ease with which tasks could be selected in the needlessly bloated SoS task pool.

For this PR we have attempted to flush out the remaining use cases not adequately catered for at present and answer them with four proposals. Because these proposals interact it is necessary to consider them in one go. Note that some use cases have shuffled around between different functionalities so please go through all four proposals before trying to thrash out the details of any one proposal in particular.

The best place to start is "docs/proposal-interventions.md", which lists the use cases considered together with the Cylc 7 interventions and the proposed interventions for each.

Preference

This proposal modifies the cylc set proposal and supersedes the task expire proposal, the main changes being:

- Setting a task output should change the task status.
- Task expiry should remain implemented as a task state.

It's the author's opinion that changing these implementation details leads to clearer behaviours for users. For completeness the rationale for this is included below but is not a part of the proposal so doesn't need to be reviewed as a part of it.

The Case For `set` Changing The Task State

Logical Consistency With Task/Job Model:
- The job status represents the progress/result of a single submission, i.e. what actually happened.
- The task status represents the abstract state as seen by the scheduler, which can be used in triggers.
- Tasks represent the sum of all outputs yielded by jobs (e.g. if job #01 yields output x and job #02 yields output y, then the task has both outputs x and y). The task does not reflect only the most recent job, so it makes perfect sense for manual interventions to contribute to the abstract task status.
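
The task/job relationship described above can be illustrated with a toy model. This is a minimal sketch, not Cylc's implementation: the `Job` and `Task` classes and the `set_output` method are hypothetical names introduced only for illustration. It shows a task accumulating the outputs of every job plus any manual intervention, while the job records stay untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One submission of a task: records what actually happened."""
    submit_num: int
    status: str                                  # e.g. "succeeded" or "failed"
    outputs: set = field(default_factory=set)


@dataclass
class Task:
    """Abstract task as seen by the scheduler (hypothetical model)."""
    name: str
    jobs: list = field(default_factory=list)
    manual_outputs: set = field(default_factory=set)

    def set_output(self, output: str) -> None:
        """Record a manual intervention; job records are left untouched."""
        self.manual_outputs.add(output)

    @property
    def outputs(self) -> set:
        """Union of the outputs of all jobs plus manual interventions."""
        result = set(self.manual_outputs)
        for job in self.jobs:
            result |= job.outputs
        return result


task = Task("a")
task.jobs.append(Job(1, "failed", {"x"}))    # job #01 yields output x
task.jobs.append(Job(2, "failed", {"y"}))    # job #02 yields output y
task.set_output("succeeded")                 # manual intervention

print(sorted(task.outputs))    # ['succeeded', 'x', 'y']
print(task.jobs[-1].status)    # failed
```

In this model a succeeded task with a failed latest job is directly visible, which is the signal that an intervention has been performed.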
Visibility:

Since task-job separation has now been implemented in Review, GUI and Tui, the Cylc 7 problem where "resetting" the task status manually obliterated the larger context is no longer a problem. If you see a succeeded task with a failed job, it is clear that an intervention has been performed.

If a user sets a task as succeeded, they are going to expect it to show in the GUI as succeeded. This succeeded output is what the scheduler considers for triggering purposes, so it is the most important piece of information to feed back via the user interfaces.

By contrast, changing only the prerequisites of downstream tasks DOES obliterate the context, because this action has no visible impact in Review/GUI/Tui, leaving users unsure whether their intervention had any effect and what its impact might be.

The Case For Expire To Be A Task State

The "expired" task state records the expiry event/intervention in a way which is intuitive to users and visible with existing tools.
Review, GUI and Tui already work with task states (incl expired) and provide tools for filtering tasks by state.
Expired tasks are not instantly removed, so they remain in the n-window and can be manually triggered if necessary. It would be possible, but illogical, to keep expired tasks in the n-window if expiry were orthogonal to the task state.
The "expired" state can be used in inter-workflow triggers which may be necessary where downstream workflows depend on a real-time workflow with catch-up logic.
:expire triggers are used in the graph, making them work the same as task outputs. Implementing them differently to task outputs but representing them the same is illogical. Xtriggers might be considered a workaround. However, they cannot be used with optional outputs or conditional triggers, so they could not replace :expire triggers in all cases, meaning that task expiry will continue to be represented as a task state in the graph.
From a behaviour perspective, there doesn't seem to be any practical advantage to users, other than implementation "correctness", in changing the implementation of expiry from task state to attribute.
At the moment expiry fits into the task state model in a way which is fairly intuitive. Changing this means that expiry will be a bespoke edge case which users will have to learn. The ideal solution would be to unify expiry with the existing model rather than creating a new one.
Expiry, success and failure are fundamentally mutually exclusive outcomes. Implementing expiry via a different model causes this relationship to break down. A consequence of this is that the graph branching which results from expiry becomes implicit rather than explicit. This has safety implications.

Background: The "xyz" Problem

The "xyz" problem is a use case which is awkward under the optional outputs model and which reveals the larger issue with task expiry under the current model.

The Problem

Under the present optional outputs logic, the problem with this graph:

a:x? => x
a:y? => y
a:z? => z
x | y | z => b

is that currently "zero or more optional branches will be run", so it is possible that none of x, y or z will run, whereas the intention is that one branch will run.

The problem is the implicit dependency:

a:failed? => {none}

This is chain-breaking behaviour which the user never defined. The documented workaround for these problems is to use a pseudo dependency like so:

a => b  # pseudo

This graph is now safe as the :failed output will trigger a stall, albeit on the wrong task.
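
The behaviour described above can be sketched with a toy model. This is an illustration of the text only, not Cylc's scheduling logic: the function `run_workflow` and its `pseudo_dependency` flag are hypothetical, and the branch tasks x, y and z are assumed to succeed whenever they run. It shows that without the pseudo dependency a failure of a silently runs nothing, while with it the same failure is reported as a stall.

```python
def run_workflow(a_outputs, pseudo_dependency=False):
    """Return (downstream tasks that run, whether the workflow stalls).

    Toy model of the graph:
        a:x? => x
        a:y? => y
        a:z? => z
        x | y | z => b
    optionally extended with the pseudo dependency "a => b".
    """
    a_outputs = set(a_outputs)
    # Each optional output of "a" triggers the branch task of the same name.
    ran = {out for out in ("x", "y", "z") if out in a_outputs}
    # "a => b" makes a:succeeded a required output; missing it is a stall.
    stalled = pseudo_dependency and "succeeded" not in a_outputs
    # b needs at least one branch, plus a:succeeded if the pseudo edge exists.
    b_ready = bool(ran) and not stalled
    if b_ready:
        ran.add("b")
    return ran, stalled


# The intended pathway: one branch runs, then b.
print(sorted(run_workflow({"succeeded", "y"})[0]))            # ['b', 'y']

# "a" fails without producing x, y or z: nothing runs and nothing is flagged.
print(run_workflow({"failed"}))                               # (set(), False)

# With the pseudo dependency the same failure is flagged as a stall.
print(run_workflow({"failed"}, pseudo_dependency=True))       # (set(), True)
```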

Relevance Of "xyz" To Task Expiry

Both the "xyz" problem and the task expiry problem suffer from the same issue at heart, which is that Cylc does not have enough information to determine what the intended pathway is.

E.g. with this graph:

a:expired? => x
a:succeeded? => y

The problem is again the implicit dependency:

a:failed? => {none}

This implicit chain breaking is dangerous. The user hasn't defined what their intention was for this case, so Cylc doesn't really have enough information to safely proceed here.
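
One way to make the implicit branch visible is to list which outcomes of a task have no pathway in the graph. The following is a minimal sketch under stated assumptions: the helper `unhandled_outcomes` is hypothetical, it is not part of Cylc, and it only recognises lines of the simple `task:output? => target` form used in this example. The set of outcomes checked is taken from the three discussed in the text.

```python
import re

GRAPH = """
a:expired? => x
a:succeeded? => y
"""

# Assumed for illustration: the final outcomes considered in the text.
OUTCOMES = ("expired", "succeeded", "failed")


def unhandled_outcomes(graph, task):
    """Return the outcomes of `task` that no graph line branches on."""
    # Match "task:output => ..." or "task:output? => ..." at a line start.
    pattern = re.compile(
        rf"^\s*{re.escape(task)}:(\w+)\??\s*=>", re.MULTILINE
    )
    handled = set(pattern.findall(graph))
    return [outcome for outcome in OUTCOMES if outcome not in handled]


print(unhandled_outcomes(GRAPH, "a"))    # ['failed']
```

Here the failed outcome is the one the graph leaves undefined, matching the implicit `a:failed? => {none}` dependency.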

A solution to the "xyz" problem would also solve the expiry problem and make optional outputs more flexible and explicit. This PR assumes that making chains is more common than breaking them, and aims to solve the general problem, addressing a wider range of uses, rather than push expiry into a bespoke model. This would enhance the power of optional output driven graph branching, which would be much more complete with the expired state than without it.

oliver-sanders commented 1 year ago

@hjoliver @dpmatthews, 4c5bf7f50125472a1c022ad9108ac7fd2a7ec8bf amends the faulty example, considers expiry for the hidden pool and defines the interaction between expiry and trigger.

hjoliver commented 1 year ago

I will merge this PR now. This is urgent work, and given the length of these proposals and the involvement so far in these topics, I don't think we need to wait on other reviewers.

cylc / cylc-admin