cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
335 stars 94 forks source link

Double-reload if command in job script #5865

Open hjoliver opened 12 months ago

hjoliver commented 12 months ago

Bug introduced by this change: https://github.com/cylc/cylc-flow/pull/5592

[scheduling]
    [[graph]]
        R1 = foo
[runtime]
    [[foo]]
        script = cylc reload $CYLC_WORKFLOW_ID

This results in two calls to the reload mutation, instead of one, and repeats the task status messages as well.

$ cylc log bug | grep received
2023-12-06T14:26:19+13:00 DEBUG - [1/foo submitted job:01 flows:1] (received)started
2023-12-06T14:26:20+13:00 DEBUG - [1/foo running job:01 flows:1] (received)succeeded
2023-12-06T14:26:20+13:00 DEBUG - [1/foo succeeded job:01 flows:1] (received)started
2023-12-06T14:26:20+13:00 DEBUG - [1/foo succeeded job:01 flows:1] (received)succeeded

$ cylc log bug | grep reload
2023-12-06T14:26:19+13:00 INFO - [command] reload_workflow
2023-12-06T14:26:20+13:00 INFO - [command] reload_workflow
2023-12-06T14:26:20+13:00 INFO - [1/foo running job:01 flows:1] reloaded task definition
2023-12-06T14:26:20+13:00 WARNING - [1/foo running job:01 flows:1] active with pre-reload settings
2023-12-06T14:26:20+13:00 INFO - Command actioned: reload_workflow()
2023-12-06T14:26:20+13:00 INFO - [1/foo running job:01 flows:1] reloaded task definition
2023-12-06T14:26:20+13:00 WARNING - [1/foo running job:01 flows:1] active with pre-reload settings
2023-12-06T14:26:20+13:00 INFO - Command actioned: reload_workflow()
hjoliver commented 12 months ago

This new block is the cause, but I haven't figured out exactly what's going on:

https://github.com/cylc/cylc-flow/blob/a965b918a57e89e9e79119a59cfce2686c84674e/cylc/flow/scheduler.py#L1089-L1112

hjoliver commented 12 months ago

Ah, some progress. It's because the same job gets submitted twice! (Not double-handling of queued commands during reload as I initially thought).

It only happens if the task state hasn't updated yet, after job submission. So not a very serious bug after all, but still a bug.

Workaround:

script = """
    cylc__job__wait_cylc_message_started
    cylc reload $CYLC_WORKFLOW_ID
"""
oliver-sanders commented 12 months ago

It only happens if the task state hasn't updated yet

hmm, fishy. What state was the task in at this point? preparing or submitted?

MetRonnie commented 10 months ago

I can't reproduce with your original example

INFO - [1/foo preparing job:01 flows:1] => submitted
INFO - [command] reload_workflow
INFO - PAUSING the workflow now: Reloading workflow
INFO - Reloading the workflow definition.
INFO - LOADING workflow parameters
INFO - + workflow UUID = d5194cff-9f4d-4400-975c-3b36de1023e3
INFO - + UTC mode = False
INFO - + cycle point time zone = Z
INFO - + paused = True
INFO - Reloading task definitions.
INFO - [1/foo submitted job:01 flows:1] reloaded task definition
WARNING - [1/foo submitted job:01 flows:1] active with pre-reload settings
INFO - LOADING job data
INFO - Reload completed.
INFO - RESUMING the workflow now
INFO - Command actioned: reload_workflow()
INFO - [1/foo submitted job:01 flows:1] => running
INFO - [1/foo running job:01 flows:1] => succeeded
INFO - Workflow shutting down - AUTOMATIC