dpmatthews closed this 2 years ago
Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?
These tasks progress as normal, but then a weird log message appears some time later:

```
cylc cat-log dm | grep model_varbc_reset
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] spawned
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] => waiting
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => waiting(queued)
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(queued) job:00 flows:1] => waiting
2022-05-13T12:54:13+12:00 INFO - Queue released: 20220101T0000Z/model_varbc_reset
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => preparing
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] host=niwa-1007823l
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] (internal)submitted at 2022-05-13T12:54:14+12:00
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] submitted to localhost:background[26146]
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] => submitted
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] health: submission timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] (received)started at 2022-05-13T12:54:15+12:00
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] => running
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] health: execution timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] (received)succeeded at 2022-05-13T12:54:25+12:00
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] => succeeded
...
2022-05-13T12:55:08+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:01 flows:1] => succeeded(runahead)
```
(Note I put `script = sleep 10` in root.)
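For clarity, "in root" means the workflow-wide default in the runtime hierarchy, i.e. in flow.cylc terms:

```
[runtime]
    [[root]]
        # every task inherits from root, so all tasks just sleep briefly
        script = sleep 10
```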
> Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?
Confirmed. The datastore is calling `TaskProxy.state_reset()`, which actions and logs state changes, on loading task history from the DB ... which isn't a "real" state change.
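To see why replaying DB history through the same method that performs live state changes produces these messages, here is a minimal, self-contained sketch; the class, `load_db_task_history`, and the `silent` flag are illustrative stand-ins, not the actual cylc-flow internals:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
LOG = logging.getLogger("scheduler")


class TaskProxy:
    """Illustrative stand-in for the real TaskProxy."""

    def __init__(self, tokens, status):
        self.tokens = tokens
        self.status = status

    def state_reset(self, status, is_runahead=False, silent=False):
        # The bug in miniature: this method both applies AND logs the
        # "transition", so replaying historical states through it fakes
        # state-change messages. `silent` is a hypothetical fix.
        qualifier = "(runahead)" if is_runahead else ""
        if not silent:
            LOG.info("[%s] => %s%s", self.tokens, status, qualifier)
        self.status = status


def load_db_task_history(itask, db_rows):
    # Data-store side: a task drifts back into the n=0/1 window and its
    # history is re-loaded from the DB via state_reset(), which logs as
    # if the state had just changed.
    for status, is_runahead in db_rows:
        itask.state_reset(status, is_runahead=is_runahead)


if __name__ == "__main__":
    itask = TaskProxy("20220101T0000Z/model_varbc_reset", "waiting")
    load_db_task_history(itask, [("succeeded", True)])
    # -> INFO - [20220101T0000Z/model_varbc_reset] => succeeded(runahead)
```

A fix along these lines would restore the recorded state without logging it as a fresh transition.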
(Note I took a look at this with some urgency, in case it indicated a problem with the scheduling algorithm; fortunately it didn't.)
If I run the following workflow:
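A minimal sketch of the kind of workflow meant here (the recurrence, cycle point, and use of implicit tasks are assumptions; only the `sleep 10` root script is taken from the note above):

```
[scheduler]
    allow implicit tasks = True

[scheduling]
    initial cycle point = 20220101T00Z
    [[graph]]
        PT6H = "model_varbc_reset[-PT6H] => model_varbc_reset"

[runtime]
    [[root]]
        script = sleep 10
```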
then I get entries like those shown above in the workflow log.
These are very confusing. Given that succeeded(runahead) isn't an expected state (the runahead qualifier normally appears on waiting tasks, as in waiting(runahead)), I think this can be regarded as a bug.
See also #3647.