cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
335 stars 94 forks source link

Held tasks need multiple triggers to run? #6398

Open ColemanTom opened 1 month ago

ColemanTom commented 1 month ago

I am finding that if I have a held workflow (not paused), if I trigger a task it appears to do nothing. But, if I trigger it twice, it will then run the task. See the attached gif.

multiple_trigger_to_run

The scheduler log shows this

2024-10-08T00:42:22Z INFO - Command "force_trigger_tasks" received. ID=8b832ccb-2fba-46c7-a389-602fb984d60c
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20241006T1800Z/glu_ops_process_background_iasi'])
2024-10-08T00:42:22Z INFO - Command "force_trigger_tasks" actioned. ID=8b832ccb-2fba-46c7-a389-602fb984d60c
2024-10-08T00:42:22Z INFO - [20241006T1800Z/glu_ops_process_background_iasi:waiting(held)] => waiting(held)(queued)
2024-10-08T00:42:29Z INFO - Command "force_trigger_tasks" received. ID=883ccef9-4b47-4cac-9258-26056328462b
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20241006T1800Z/glu_ops_process_background_iasi'])
2024-10-08T00:42:29Z INFO - Command "force_trigger_tasks" actioned. ID=883ccef9-4b47-4cac-9258-26056328462b

If I read that correctly, the trigger pushes it to the queued state, and then a second trigger pushes to submit the job. I had expected that triggering would shift it twice.

I'm unsure if this is a Cylc-flow question or a UI question, so I've put it here as that is how I'm looking at it.

hjoliver commented 1 month ago

This is intended.

The held attribute should not make any difference here, because we can run held tasks (they should stay held to prevent any automatic retries unless released).

Are you happy with this explanation, @ColemanTom ?

ColemanTom commented 1 month ago

Ah, ok. So because this task is in a queue, it takes two triggers. I just tested on a task that does not belong to a queue and it does only take one. I think that there is no visible change in the WUI when I do the first trigger is part of the confusion. In Cylc7, when triggering a task which was in a full queue, it changed to (I think) a yellow colour so show the state had changed. Here, there was no update to the display.

hjoliver commented 1 month ago

In Cylc 8 "queued" isn't a task state, it's a particular attribute of the waiting state.

Task attributes are shown as badges on the task icon. I think we only display one badge at a time, and we prioritize held over queued, so that's why you didn't see a change in the GUI.

@cylc/core - maybe this suggests we need to stack multiple badges on task icons?

ColemanTom commented 1 month ago

It would be helpful to see some indication that an action has been registered and done something. I wasn't sure if it was severe lag or what was happening.

hjoliver commented 1 month ago

(We can close this after adding or tweaking a cylc-ui issue to cover the last comment)

oliver-sanders commented 1 month ago

[edit]: Dammit, this is wrong, see later comment

Aah right.

Under normal circumstances, triggering a held task DOES cause it to be run immediately, however, in this case the queue that the task was added to was full so it sat there waiting.

(We can close this after adding or tweaking a cylc-ui issue to cover the last comment)

If the queued state always beats the held state (which I think it does), then we can resolve this by changing the "task modifier" icon order from:

to:

Note, this applies both to the GUI and Tui.

Additionally, when you select a task in either, you should get a task state summary (e.g. waiting (held)). We should ensure that queued beats held here too.

And (although I agree that held isn't really a special case here) we should still give it a passing mention in the docs:

$ cylc trigger --help | grep held

TODO:

oliver-sanders commented 1 month ago

As an aside, see this snippet from the Cylc UI design document where we were exploring context information for task state modifiers:

Screenshot from 2024-10-11 09-36-47

And the [now?] erroneous statement underneath about held taking priority over queue (I guess that made sense at the time?).

Where the "view" button would have taken you to this https://github.com/cylc/cylc-ui/issues/462

oliver-sanders commented 1 month ago

@hjoliver

Sorry, on reflection, I have realised that this was not the intended behaviour and is actually inconsistent. The current behaviour is an unintended result of a bugfix (mia-culpa) for a completely different issue.

To explain, here's the stated intention:

  • Triggering an unqueued waiting task queues it, regardless of prerequisites.
  • Triggering a queued task submits it, regardless of queue limiting.
  • Triggering an active task has no effect (it already triggered).

cylc trigger --help

Under this logic, if you trigger a held task it will be queued. If there is space in the queue, it will run instantly.

However, at a later date, the queues because "is_held" aware in order to allow users to hold queued tasks (without us having to yank the tasks out of the queue in order to do so):

https://github.com/cylc/cylc-flow/blob/59ce660c2789eb72ded15aafde665ca5876cd2a6/cylc/flow/task_queues/independent.py#L57-L58

So a held task won't queue (naturally) and a queued held task won't run (naturally).

This changed the behaviour. Now when we trigger a held task, it gets added to the queue, but the queue will never release it (irrespective of whether there is space in the queue or not).

So the behaviour changed from "it will run when there is space in the queue" to "it will never run under any circumstances" which was not intended. Intentions aside, the "it will never run" bit is also totally illogical.

In my above comment, I made this suggestion:

If the queued state always beats the held state (which I think it does), then we can resolve this by changing the "task modifier" icon order from:

  • is_held
  • is_queued
  • is_runahead

to:

  • is_queued
  • is_held
  • is_runahead

However, that doesn't make sense. Under the current implementation:

So there is no order of precedence we can give the task modifier icon that will work in all cases, in other words, the logic is inconsistent.

To resolve this we need to either:

  1. Differentiate between a held task that became queued (i.e. was manually triggered) and a queued task that became held (i.e. was manually held).
  2. Remove tasks from the queue when they are manually held rather than leaving them in the queue and filtering out all held tasks.

Option (2) is nicer, and technically correct (if you hold a queued task, you are suppressing it from being submitted, so it doesn't make sense for the task to remain queued). I think it's relatively easy to do these days as we now have other functionality that needs to remove tasks from the queue (e.g. cylc remove).

hjoliver commented 1 month ago

I ran into a hold bug when testing this on Friday, which I'll now try to reproduce (from memory ... it was on my office box, and I'm WFH today). However, I thought it was a different issue, because @ColemanTom's description above says:

If I read that correctly, the trigger pushes it to the queued state, and then a second trigger pushes to submit the job. I had expected that triggering would shift it twice.

That's just the intended double-trigger behaviour when queue-limiting is active, which doesn't seem to gel with what you're saying @oliver-sanders ... maybe it depends on how Tom's tasks got held in the first place (e.g. were they already queued or not?).

hjoliver commented 1 month ago

(Side-issue spawned after taking hold and queue out of the mix: #6406 )

ColemanTom commented 1 month ago

Tom's tasks got held in the first place (e.g. were they already queued or not?).

I don't really remember the setup I had. If I pay attention and see it again I'll try to comment back here.

hjoliver commented 1 month ago

Talking a step back from attribute priority, to confirm triggering and queues work as intended without hold involved:

[scheduling]
    [[queues]]
        [[[default]]]
            limit = 1
    [[graph]]
        R1 = "foo:start => a => b"
[runtime]
    [[foo]]
        pre-script = "sleep 360"
    [[a, b]]
        script = "sleep 10; false"
        execution retry delays = PT5S
hjoliver commented 1 month ago

Now see how hold affects the active (n=0) waiting task a:

This kinda seems right from a manual triggering perspective, IF we think that manually triggering a held task should make it and remain held - which seems reasonable because we can hold a running task, which just means it won't automatically retry.

But maybe not from an automatic triggering perspective, because a held task cannot submit a job automatically, so arguably it should not be queued? 🤔

hjoliver commented 1 month ago

Now see how hold affects the future task b:

Alternatively:

(Then same as for the a case above)