cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

pool: update DB after removing a task #6409

Closed oliver-sanders closed 1 month ago

oliver-sanders commented 1 month ago

This fixes the "running(runahead) => running" bug that could cause tasks to get stuck in the running state indefinitely.

Performing a DB write every time a task completes is going to be a performance hit since this is performed per-task not per-main-loop-cycle (i.e. there is no batching efficiency gain). I'm not sure what we can do about this without changes to the task_pool data model. Any ideas?
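
To make the cost concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The table name `task_pool_demo` and the helper functions are hypothetical and are not cylc-flow's schema or API. It only illustrates the difference between committing once per task and committing once per batch; both paths store the same rows, but the per-task path pays the transaction overhead once per row.

```python
import sqlite3


def make_db():
    """Create an in-memory DB with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_pool_demo (cycle TEXT, name TEXT, status TEXT)"
    )
    return conn


def write_per_task(conn, rows):
    """Commit after every row: one transaction per task."""
    for row in rows:
        conn.execute("INSERT INTO task_pool_demo VALUES (?, ?, ?)", row)
        conn.commit()


def write_batched(conn, rows):
    """Commit once for all rows: one transaction per batch."""
    conn.executemany("INSERT INTO task_pool_demo VALUES (?, ?, ?)", rows)
    conn.commit()


def count(conn):
    """Return the number of rows stored in the demo table."""
    return conn.execute("SELECT COUNT(*) FROM task_pool_demo").fetchone()[0]


rows = [("1", f"task_{i}", "succeeded") for i in range(100)]

per_task_db = make_db()
write_per_task(per_task_db, rows)

batched_db = make_db()
write_batched(batched_db, rows)
```

How much the extra commits matter in practice depends on the storage backend and on how staggered task completion is, so this would need measuring on a real workflow.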


hjoliver commented 1 month ago

Performing a DB write every time a task completes is going to be a performance hit since this is performed per-task not per-main-loop-cycle

Hopefully not too bad, because task completion, even of family members, tends to be staggered rather than all-at-once.

I'm not sure what we can do about this without changes to the task_pool data model. Any ideas?

Yeah, batching DB ops for efficiency can be problematic for "live" data (i.e., not just for the historical record), if there's any chance of certain events occurring between DB updates.

To avoid this kind of bug I guess we have to either:

hjoliver commented 1 month ago

Merging this as the fix is simple and necessary ... the questions are of a wider scope.

oliver-sanders commented 1 month ago

To avoid this kind of bug I guess we have to either:

A fancy solution would be to allow the unwritten data (i.e. the delta) to be queried, so that we can avoid hitting the DB when it is not necessary. However, this would not be a small job.
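
As a rough illustration of the "queryable delta" idea, the sketch below keeps unwritten changes in an in-memory overlay that reads consult before falling back to the DB. Everything here is an assumption made for illustration: the `OverlayStore` class, its methods and the `task_pool_demo` table are hypothetical and do not correspond to cylc-flow's task pool or database manager.

```python
import sqlite3

# Sentinel marking a task that has been removed but not yet deleted in the DB.
_REMOVED = object()


class OverlayStore:
    """Hypothetical store: pending (unflushed) changes shadow the DB."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE task_pool_demo (task_id TEXT PRIMARY KEY, status TEXT)"
        )
        self.pending = {}  # task_id -> status or _REMOVED

    def set_status(self, task_id, status):
        """Record a status change without touching the DB."""
        self.pending[task_id] = status

    def remove(self, task_id):
        """Record a removal without touching the DB."""
        self.pending[task_id] = _REMOVED

    def get_status(self, task_id):
        """Read the delta first, then fall back to the DB."""
        if task_id in self.pending:
            value = self.pending[task_id]
            return None if value is _REMOVED else value
        row = self.conn.execute(
            "SELECT status FROM task_pool_demo WHERE task_id = ?", (task_id,)
        ).fetchone()
        return row[0] if row else None

    def flush(self):
        """Write all pending changes in one transaction (the batched write)."""
        for task_id, value in self.pending.items():
            if value is _REMOVED:
                self.conn.execute(
                    "DELETE FROM task_pool_demo WHERE task_id = ?", (task_id,)
                )
            else:
                self.conn.execute(
                    "INSERT OR REPLACE INTO task_pool_demo VALUES (?, ?)",
                    (task_id, value),
                )
        self.conn.commit()
        self.pending.clear()


store = OverlayStore()
store.set_status("1/foo", "running")
store.flush()

# The task is removed, but the batched write has not happened yet.
store.remove("1/foo")
seen_via_overlay = store.get_status("1/foo")
stale_db_row = store.conn.execute(
    "SELECT status FROM task_pool_demo WHERE task_id = ?", ("1/foo",)
).fetchone()

store.flush()
db_row_after_flush = store.conn.execute(
    "SELECT status FROM task_pool_demo WHERE task_id = ?", ("1/foo",)
).fetchone()
```

Between the removal and the flush, a direct DB query still returns the stale "running" row, whereas a read through the overlay already sees the task as gone. Making every DB reader go through such an overlay is the part that would not be a small job.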