cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Timeout handlers do not execute when the corresponding `abort on X timeout = True` is set #5997

Open · MetRonnie opened 9 months ago

MetRonnie commented 9 months ago

```
[scheduler]
    [[events]]
        abort on workflow timeout = True
        workflow timeout = PT1S
        abort handlers = echo "ALPHA"
        workflow timeout handlers = echo "TANGO"
```

The workflow timeout handler does not run when `abort on workflow timeout` is set.

Originally posted by @MetRonnie in https://github.com/cylc/cylc-flow/issues/5959#issuecomment-1957288745

oliver-sanders commented 9 months ago

This happens because, when the workflow aborts, it terminates the processes in the subprocpool, INCLUDING the event handler.

Changing this behaviour will require careful thought, as it could trigger events we don't want, e.g. preparing tasks may erroneously go into the submit-failed state.
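A toy model of the failure mode described above: the timeout handler is handed to the same pool that the abort then terminates, so it never gets to run. `ToyProcPool` and `on_workflow_timeout` are hypothetical names, not Cylc's `SubProcPool` API, and the model collapses "queued" and "already running" handlers into one case.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProcPool:
    """Hypothetical stand-in for a subprocess pool (not Cylc's SubProcPool):
    commands are queued and only launched when the main loop services it."""

    queued: list[str] = field(default_factory=list)
    ran: list[str] = field(default_factory=list)

    def put_command(self, cmd: str) -> None:
        self.queued.append(cmd)

    def process(self) -> None:
        # Normal path: launch everything queued so far.
        self.ran.extend(self.queued)
        self.queued.clear()

    def terminate(self) -> list[str]:
        # Abort path: discard everything that has not completed and report it.
        killed, self.queued = self.queued, []
        return killed


def on_workflow_timeout(pool: ToyProcPool, abort_on_timeout: bool) -> list[str]:
    """Hand the workflow timeout handler to the pool, then abort or carry on.

    Returns the commands that were terminated before they could complete.
    """
    pool.put_command('echo "TANGO"')  # the workflow timeout handler
    if abort_on_timeout:
        return pool.terminate()  # the handler is lost along with everything else
    pool.process()
    return []
```

With `abort_on_timeout=True` the handler comes back in the "terminated" list and `pool.ran` stays empty; with `False` the handler runs normally.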

hjoliver commented 9 months ago

Can we just leverage the `cylc stop --now` (but not `--now --now`) code for the abort shutdown?

`cylc stop --help`:

```
-n, --now    Shut down without waiting for active tasks to complete. If
             this option is specified once, wait for task event handler,
             job poll/kill to complete. If this option is specified more
             than once, tell the workflow to terminate immediately.
```

oliver-sanders commented 4 months ago

Abort events take down the scheduler by raising a `SchedulerError` rather than requesting a shutdown. It's a much more abrupt stop, which also results in a non-zero exit code:

https://github.com/cylc/cylc-flow/blob/9d985f2306c7475073d3960ff3b998d23c1885df/cylc/flow/scheduler.py#L1662-L1663
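A simplified illustration of the two control-flow paths: a requested stop leaves the main loop normally and runs the orderly shutdown, while an abort raises straight out of the loop and exits non-zero. `SchedulerError` here is a local stand-in, and `main_loop`/`run` are hypothetical; Cylc's real shutdown handling is more involved than this.

```python
import sys


class SchedulerError(Exception):
    """Local stand-in for Cylc's SchedulerError; not imported from cylc."""


def main_loop(events: list[str], cleanup: list[str]) -> None:
    for event in events:
        if event == "abort":
            # Abort: raise out of the loop at once; the code below never runs.
            raise SchedulerError("workflow timed out")
        if event == "stop":
            # Requested shutdown: leave the loop and shut down in order.
            break
    cleanup.append("orderly shutdown")


def run(events: list[str]) -> tuple[int, list[str]]:
    """Return (exit code, cleanup steps performed) for the toy main loop."""
    cleanup: list[str] = []
    try:
        main_loop(events, cleanup)
    except SchedulerError as exc:
        print(f"ERROR - {exc}", file=sys.stderr)
        return 1, cleanup
    return 0, cleanup
```

`run(["tick", "stop"])` returns `(0, ["orderly shutdown"])`, whereas `run(["tick", "abort"])` returns `(1, [])`: the abort skips whatever would have drained the pending handlers.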

In the abort case, we want to wait for the abort/timeout handlers, but I guess we might not want to wait for log file retrieval etc. (it could be a really critical shutdown).
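One possible shape for that compromise, as a sketch under stated assumptions: during an abort, run only the abort/timeout handlers (concurrently, each bounded by a hypothetical `max_wait` so a hung handler cannot block a critical shutdown) and drop other work such as log retrieval. The kind labels, `run_bounded` and `abort_shutdown` are illustrative names, not Cylc code; only standard `asyncio` subprocess calls are used.

```python
import asyncio
import sys

# Hypothetical labels for kinds of pending work; only these are honoured
# during an abort. Everything else (e.g. job log retrieval) is dropped.
WAIT_ON_ABORT = frozenset({"abort-handler", "timeout-handler"})


async def run_bounded(argv: list[str], max_wait: float) -> str:
    """Run one command and return its stdout, killing it after max_wait s."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), max_wait)
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # it exited just as we gave up on it
        await proc.wait()
        return "<killed: timed out>"
    return out.decode().strip()


async def abort_shutdown(
    pending: list[tuple[str, list[str]]], max_wait: float = 10.0
) -> list[str]:
    """Run abort/timeout handlers concurrently, skip all other work, and
    return the handlers' outputs in their original order."""
    argvs = [argv for kind, argv in pending if kind in WAIT_ON_ABORT]
    return await asyncio.gather(*(run_bounded(a, max_wait) for a in argvs))


if __name__ == "__main__":
    pending = [
        ("timeout-handler", [sys.executable, "-c", "print('TANGO')"]),
        ("abort-handler", [sys.executable, "-c", "print('ALPHA')"]),
        ("log-retrieval", [sys.executable, "-c", "print('SKIPPED')"]),
    ]
    print(asyncio.run(abort_shutdown(pending)))  # ['TANGO', 'ALPHA']
```

Bounding each handler trades completeness for a guaranteed exit; whether a per-handler cap or one overall deadline fits better is part of the "careful thought" mentioned earlier in the thread.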