Handling future final-incomplete tasks

hjoliver commented 2 months ago

This workflow creates three final-incomplete tasks (i.e., tasks with a final status and incomplete required outputs) ahead of the flow:

```
[scheduler]
    allow implicit tasks = True
[scheduling]
    [[graph]]
        R1 = "foo => bar => f1 & f2 & f3"
[runtime]
    [[foo]]
        script = """
            cylc trigger //1/f1  # with --wait if you like
            cylc set -o failed //1/f2
            cylc set -o expired //1/f3
        """
    [[f1]]
        script = false
```

On running this,

f1 will be final-incomplete and retained in the task pool
f2 and f3 will be recorded as final-incomplete in the DB
- and will be spawned as such into the task pool later on, when the flow arrives
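A minimal sketch of the classification above, assuming a simplified model in which a task's state is just a status plus a set of completed outputs, and a bare graph node such as `f1` requires the `succeeded` output. `TaskState`, `is_final_incomplete` and `FINAL_STATUSES` are hypothetical names for illustration, not cylc-flow internals, and the set of final statuses is an assumption of this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model for illustration only; not cylc-flow's API.
# Assumed set of final statuses for this sketch.
FINAL_STATUSES = {"succeeded", "failed", "expired", "submit-failed"}


@dataclass
class TaskState:
    name: str
    status: str
    outputs: set = field(default_factory=set)
    # A bare graph node such as "f1" requires the succeeded output.
    required: set = field(default_factory=lambda: {"succeeded"})
    in_pool: bool = False  # True: held in n=0; False: recorded in the DB only


def is_final_incomplete(task: TaskState) -> bool:
    """True if the task reached a final status without completing its required outputs."""
    return task.status in FINAL_STATUSES and not task.required <= task.outputs


tasks = [
    # f1 was triggered early by foo, ran "script = false", and failed.
    TaskState("f1", "failed", {"submitted", "started", "failed"}, in_pool=True),
    # f2 and f3 never ran; foo set their outputs manually (simplified here).
    TaskState("f2", "failed", {"failed"}),
    TaskState("f3", "expired", {"expired"}),
]

for t in tasks:
    print(t.name, is_final_incomplete(t), "n=0" if t.in_pool else "DB only")
```

All three tasks come out final-incomplete, but only `f1` is visible in n=0; `f2` and `f3` exist only in the DB until the flow arrives.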

Final-incomplete tasks are (supposed to be) retained in n=0 as a safety net: for visibility, and to eventually stall the workflow unless or until they are manually completed or removed. However, the stall itself just means the user has not dealt with the problem by the time the scheduler has run out of other things to do. Once the problem is apparent, the user should be able to respond appropriately in the moment, to prevent the future stall.

To "fix" a final-incomplete task we can (+):

(a) manually retrigger it, to run to completion
- (the user fixed the bug and ran it again)
(b) or manually set it to completion
- (the user fixed the problem on disk and/or told the scheduler to proceed as if the task completed)
(c) or manually remove it
- (the user confirms it is OK to continue without completing the task, accepting the consequences)
- (+) this doesn't currently work unless the flow has already blocked at the final-incomplete task
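The three fixes can be sketched against a hypothetical toy `Pool` that holds final-incomplete tasks in n=0 and, as a simplifying assumption, treats any incomplete task left there as a stall (i.e. it assumes the scheduler has nothing else to do). It only models tasks already in n=0, which is the case where (c) currently works; the method names are illustrative, not cylc-flow code.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the three "fixes"; not cylc-flow code.
REQUIRED = {"succeeded"}  # outputs a bare graph node like "f1" must complete


@dataclass
class Pool:
    # Tasks retained in n=0 as final-incomplete: name -> completed outputs.
    n0: dict = field(default_factory=dict)
    runs: dict = field(default_factory=dict)  # name -> number of reruns

    def stalled(self) -> bool:
        # Simplification: nothing else to do, so any incomplete task in n=0 stalls.
        return any(not REQUIRED <= outs for outs in self.n0.values())

    def _complete(self, name: str, outputs: set) -> None:
        self.n0[name] |= outputs
        if REQUIRED <= self.n0[name]:
            del self.n0[name]  # complete tasks are not retained

    def retrigger(self, name: str) -> None:
        # (a) the user fixed the bug; assume the new job succeeds.
        self.runs[name] = self.runs.get(name, 0) + 1
        self._complete(name, {"submitted", "started", "succeeded"})

    def set_completed(self, name: str) -> None:
        # (b) no new job; proceed as if the task had succeeded.
        self._complete(name, {"succeeded"})

    def remove(self, name: str) -> None:
        # (c) continue without the task, accepting the consequences.
        del self.n0[name]


for fix in (Pool.retrigger, Pool.set_completed, Pool.remove):
    pool = Pool(n0={"f1": {"submitted", "started", "failed"}})
    assert pool.stalled()
    fix(pool, "f1")
    print(fix.__name__, "stalled:", pool.stalled())
```

Each fix takes `f1` out of n=0 and so prevents the stall; only (a) involves running the task again.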

problems

final-incomplete tasks created by manual output setting are "hidden" in the DB until the flow arrives at some future time
- a serious problem already exists but is not very visible, which makes it hard to fix in the moment
removal does not have the desired result on these tasks, because they have not yet blocked the flow
- when the flow arrives it will be as if the manual set had never been done, and the task will run again
- so "fix" option (c) does not work as intended for final-incomplete tasks ahead of the flow
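A toy model of the current behaviour for tasks ahead of the flow, to make the problem concrete. The functions are hypothetical and the DB and n=0 are plain dicts; this is a sketch of the behaviour described above, not cylc-flow code.

```python
# Hypothetical toy model of the current behaviour described above; not cylc-flow code.
REQUIRED = {"succeeded"}


def manual_set(db: dict, name: str, outputs: set) -> None:
    # The task is ahead of the flow, so its outputs are written to the DB only:
    # nothing appears in n=0 yet.
    db.setdefault(name, set()).update(outputs)


def remove(db: dict, n0: dict, name: str) -> None:
    # Current removal erases the task's history entirely.
    n0.pop(name, None)
    db.pop(name, None)


def flow_arrives(db: dict, n0: dict, name: str) -> str:
    # Spawn the task into n=0 when the flow reaches it, reloading any DB history.
    outputs = db.get(name)
    n0[name] = set(outputs or ())
    if outputs is None:
        return "spawned fresh, will run"
    return "complete" if REQUIRED <= outputs else "final-incomplete"


db, n0 = {}, {}
manual_set(db, "f2", {"failed"})
manual_set(db, "f3", {"expired"})
print(sorted(n0))                  # [] (nothing visible in n=0 yet)
remove(db, n0, "f3")               # try to "fix" f3 ahead of the flow
print(flow_arrives(db, n0, "f2"))  # final-incomplete
print(flow_arrives(db, n0, "f3"))  # spawned fresh, will run
```

Removing `f3` early simply discards the manual set, so it runs again when the flow arrives, while `f2` stays hidden until then.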

solutions

final-incomplete tasks created by manual output setting should be spawned into n=0, for visibility
- (I think it was our intention that all final-incomplete tasks would be held in n=0)
- (I haven't thought of any downside to doing this)
cylc remove needs an option to "remember" the removal rather than erasing the history
- so that removing an n=0 task ahead of time has the same effect as doing it after the flow arrives
- (aside: we also need the same thing internally, e.g. for removal by suicide trigger, to avoid respawning a removed task)
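A sketch of the two proposed changes on a hypothetical toy scheduler: `manual_set` spawns a final-incomplete result into n=0 immediately, and `remove(..., remember=True)` stands in for the proposed (not yet existing) `cylc remove` option by recording the removal instead of erasing the history. `Scheduler`, `removed` and `remember` are illustrative names only, not cylc-flow's API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the proposed behaviour; names such as `remember`
# and `removed` are illustrative, not cylc-flow's actual API.
REQUIRED = {"succeeded"}


@dataclass
class Scheduler:
    db: dict = field(default_factory=dict)     # name -> recorded outputs
    n0: dict = field(default_factory=dict)     # name -> outputs of tasks held in n=0
    removed: set = field(default_factory=set)  # remembered removals

    def manual_set(self, name: str, outputs: set) -> None:
        outs = self.db.setdefault(name, set())
        outs |= outputs
        if REQUIRED <= outs:
            self.n0.pop(name, None)    # complete: nothing to retain
        else:
            self.n0[name] = set(outs)  # proposal 1: spawn into n=0 for visibility

    def remove(self, name: str, remember: bool = False) -> None:
        self.n0.pop(name, None)
        if remember:
            self.removed.add(name)     # proposal 2: record the removal, keep history
        else:
            self.db.pop(name, None)    # current behaviour: erase history

    def flow_arrives(self, name: str) -> str:
        if name in self.removed:
            return "not respawned (removed)"
        outputs = self.db.get(name)
        if outputs is None:
            self.n0[name] = set()
            return "spawned fresh, will run"
        if REQUIRED <= outputs:
            return "complete"
        self.n0[name] = set(outputs)
        return "final-incomplete"


s = Scheduler()
s.manual_set("f3", {"expired"})
print(sorted(s.n0))          # ['f3'] (visible immediately)
s.remove("f3", remember=True)
print(s.flow_arrives("f3"))  # not respawned (removed)
```

With `remember=True`, removing the task ahead of the flow has the same end result as removing it after the flow has blocked there, which is the behaviour the proposal asks for.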

hjoliver commented 2 months ago

I'm reasonably sure this is just a small code change, and it is a NIWA priority, in particular to alleviate the pain of our manual expire use cases, which currently entail an unnecessary wait for the future stall to happen. However, as shown above, the problem is more general than that use case. Summary:

it is possible to create final-incomplete tasks ahead of the flow
we need to be able to handle them in the moment, to achieve the same result without having to wait for the future stall to happen
(and doing so does not compromise our output completion safety net!)

oliver-sanders commented 2 months ago

> final-incomplete tasks created by manual output setting should be spawned into n=0, for visibility
> - (I think it was our intention that all final-incomplete tasks would be held in n=0)
> - (I haven't thought of any downside to doing this)

This is correct.

cylc / cylc-flow: Handling future final-incomplete tasks #6383