hjoliver closed this 7 months ago
On the other hand, if we made it more friendly, it might just be equivalent to the stall timeout.
Exactly! Either inactivity means inactivity or it becomes something else.
I see this as a sys-admin feature rather than a test thing. A workflow could hit its inactivity timeout if it has pending xtriggers or even submitted/running tasks. This is useful because if a workflow is sitting there for long periods of time with pending xtriggers or active tasks, something external to Cylc is going wrong. We set an extremely long P30D timeout at our site to automatically mop up anything that's got itself into a strange state.
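For reference, a long mop-up timeout like the one described above might be configured roughly as follows. This is a sketch assuming the Cylc 8 `[scheduler][events]` settings. The thread only gives the `P30D` value, so the abort line is an assumption about how the mop-up is done.

```ini
[scheduler]
    [[events]]
        # Fires after 30 days without activity, even if xtriggers are
        # pending or tasks are still submitted/running.
        inactivity timeout = P30D
        # Assumption: shut the scheduler down when the timeout fires.
        abort on inactivity timeout = True
```

The separate `stall timeout` setting covers the case where the workflow is actually stalled, which is why the two are kept distinct.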
So, I'm happy with the status quo. We already have events to cover other situations.
Note, we are currently missing one event: https://github.com/cylc/cylc-flow/issues/4957
OK, I agree with that.
In that case, we just need to document exactly what inactivity means, and note the use case you've just described (a long timeout) as the intended one.
Pretty sure it is not currently defined in the docs. And it's not obvious, for instance, that an unsatisfied xtrigger is "inactive" (it is being actively checked periodically), or that a running task should be considered "inactive" just because it hasn't returned a job status message in a while (it is actively executing).
The existing docs are here:
The inactivity timeout is not affected by the presence of unsatisfied xtriggers (including clock triggers), which might be surprising if you abort on inactivity timeout in a clock-triggered workflow.
It seems a bit aggressive to abort on "inactivity" if there are active tasks and/or active xtriggers present.
On the other hand, if we made it more friendly, it might just be equivalent to the stall timeout.
I think this feature was originally added primarily for use in functional tests, where we might assume something has gone wrong if nothing is happening, even if the workflow is not technically stalled.
At the very least, we should document for users exactly what "inactivity" means.