Open oliver-sanders opened 3 weeks ago
It might be useful to see the whole log, not just lines associated with the affected task. Maybe there was some kind of manual intervention that affected runahead tasks, e.g. triggering a runahead task? I'm not sure that we're properly removing task attribute labels like "queued" and "runahead" every time we (e.g.) force a runahead-limited task to run.
[edit] crossed wires with another bug report
It might be useful to see the whole log, not just lines associated with the affected task.
The logs are rather long!
There were a couple of indiscriminate triggers, targeting all tasks in the affected cycle, in the run-up to the issue. I don't think this included any runahead tasks, as the cycle was inside the runahead limit at this point.
There were also a couple of kill commands, but these were more targeted and did not affect the task in question.
I haven't managed to make head or tail of this one yet. One user's workflows have encountered this a few times. They are on leave at the moment; hopefully, when they return, we can get them to run these workflows in debug mode, which might give us a better chance of debugging.
There seems to be some circumstance where succeeded tasks can go back to running (apparently in the same main-loop iteration).
This log is typical of the issue:
Every time we see the confusing
running(runahead) => running
transition. There are no obvious exacerbating circumstances. This seems to be intermittent, but we appear to have an example which yields this bug relatively regularly, though I've not been able to reproduce it yet.
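Since the logs are long, a rough sketch like the one below might help pick out every occurrence of this kind of transition. It assumes only that the transition appears in the log as text of the form `state(runahead) => state`, as quoted above; the rest of the line layout, the function name `find_runahead_transitions` and the sample lines are made up for illustration and are not taken from a real log.

```python
import re

# Matches transitions written like "running(runahead) => running".
# Only the transition text comes from the report; the surrounding
# log layout is an assumption, so the pattern ignores it.
TRANSITION = re.compile(r"(\w+)\(runahead\)\s*=>\s*(\w+)")


def find_runahead_transitions(lines):
    """Return (line_number, from_state, to_state) for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TRANSITION.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, invented for this sketch.
    sample = [
        "task-a waiting(runahead) => waiting",
        "task-b running => succeeded",
        "task-b running(runahead) => running",
    ]
    for number, old, new in find_runahead_transitions(sample):
        # A task that is already running should not carry the runahead label.
        note = "suspicious" if old == "running" else "expected"
        print(f"line {number}: {old}(runahead) => {new} [{note}]")
```

To run it over a real log, pass an open file object in place of `sample`; the line numbers reported can then be used to look at what happened just before each hit.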