Open oliver-sanders opened 9 months ago
Just tried with the "backward" recurrence alone:
INFO - Cylc version: 8.3.0.dev
INFO - Run mode: live
INFO - Initial point: 2000
INFO - Final point: 2010
INFO - Cold start from 2000
INFO - New flow: 1 (original flow from 2000) 2024-01-31 00:41:23
DEBUG - Runahead: base point 2008
DEBUG - Runahead limit: 2010
(Followed by immediate shutdown)
So this is runahead-related in the sense that at start-up, when there are no tasks in the pool, we compute the runahead limit using sequence points alone. Something seems to be going wrong there.
Looks like it is a problem with the pre-initial dependency (if that's the right term). Adding in R1//2007 = b or R1/2007 = b allows the sequence to start.
Yeah, I think I understand it ...
Actually, I'm kinda surprised this worked in Cylc 7. (Did it?)
The pre-initial dependency handling only ever (as I recall) applied to the initial cycle point of the workflow, not individual sequence start points.
That rings a bell ...
Looks like I flagged this ages ago:
https://github.com/cylc/cylc-flow/issues/1936
And see the final comment from 2 years ago (SoD):
https://github.com/cylc/cylc-flow/issues/1936#issuecomment-1031825906
Ah yes, if you change the start of the format 3 recurrence (the "forwards" example) to a point beyond the ICP, e.g. R3/2003/P1Y = f[-P1Y] => f, then you get the same immediate shutdown.
Likewise, if you change the end of the format 4 recurrence (the "backwards" example) to 2002 then the b jobs run.
Right, so as I understand it that's the expected result - hence my issue above suggesting it would be nice to apply "pre-initial" logic to individual sequences not just the workflow initial cycle point.
The "workaround" is simply to not expect magical bootstrapping into an intercycle dependency, but handle it explicitly.
Explicit works fine:
[[graph]]
# count backwards
R3/P1Y/2010 = b[-P1Y] => b
# bootstrap into the sequence
R1/2007/P0Y = b
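To make the explicit bootstrap concrete, here is a minimal Python sketch of the cycle points involved. It is not Cylc code: `backward_points` is a hypothetical helper, and years are treated as plain integers purely for illustration.

```python
from __future__ import annotations


def backward_points(count: int, step: int, end: int) -> list[int]:
    """Points of a 'count backwards from end' recurrence, e.g. R3/P1Y/2010.

    Hypothetical helper: years are modelled as plain integers.
    """
    return [end - step * i for i in reversed(range(count))]


# R3/P1Y/2010 = b[-P1Y] => b
chain = backward_points(3, 1, 2010)

# R1/2007/P0Y = b  (a single point with no dependency of its own)
bootstrap = [2007]

# Each chain point needs b one year earlier (the b[-P1Y] trigger).
needed = [point - 1 for point in chain]

# Prerequisite targets that no graph line provides, without and with
# the explicit bootstrap line.
missing_without = [p for p in needed if p not in set(chain)]
missing_with = [p for p in needed if p not in set(chain) | set(bootstrap)]

print(chain, missing_without, missing_with)
```

Under these assumptions the chain alone leaves 2007 with nothing to provide it, and adding the single bootstrap point closes that gap.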
Actually, I'm kinda surprised this worked in Cylc 7. (Did it?)
Yes (as above)
Well the lone "backward" one doesn't work:
._.
| | The Cylc Suite Engine [7.9.9]
._____._. ._| |_____. Copyright (C) 2008-2019 NIWA
| .___| | | | | .___| & British Crown (Met Office) & Contributors.
| !___| !_! | | !___. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____! This program comes with ABSOLUTELY NO WARRANTY;
.___! | see `cylc warranty`. It is free software, you
!_____! are welcome to redistribute it under certain
2024-01-31T01:21:08+13:00 INFO - Suite server: url=http://NIWA-1022450.niwa.local:43099/ pid=19403
2024-01-31T01:21:08+13:00 INFO - Run: (re)start=0 log=1
2024-01-31T01:21:08+13:00 INFO - Cylc version: 7.9.9
2024-01-31T01:21:08+13:00 INFO - Run mode: live
2024-01-31T01:21:08+13:00 INFO - Initial point: 2000
2024-01-31T01:21:08+13:00 INFO - Final point: 2010
2024-01-31T01:21:08+13:00 INFO - Cold Start 2000
2024-01-31T01:21:09+13:00 WARNING - suite stalled
2024-01-31T01:21:09+13:00 WARNING - Unmet prerequisites for b.2008:
2024-01-31T01:21:09+13:00 WARNING - * b.2007 succeeded
Neither does your full example - so I'm confused!
oliverh@NIWA-1022450:~/cylc-src/dog$ cylc run --no-detach dog
._.
| | The Cylc Suite Engine [7.9.9]
._____._. ._| |_____. Copyright (C) 2008-2019 NIWA
| .___| | | | | .___| & British Crown (Met Office) & Contributors.
| !___| !_! | | !___. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____! This program comes with ABSOLUTELY NO WARRANTY;
.___! | see `cylc warranty`. It is free software, you
!_____! are welcome to redistribute it under certain
2024-01-31T01:23:30+13:00 INFO - Suite server: url=http://NIWA-1022450.niwa.local:43028/ pid=19486
2024-01-31T01:23:30+13:00 INFO - Run: (re)start=0 log=1
2024-01-31T01:23:30+13:00 INFO - Cylc version: 7.9.9
2024-01-31T01:23:30+13:00 INFO - Run mode: live
2024-01-31T01:23:30+13:00 INFO - Initial point: 2000
2024-01-31T01:23:30+13:00 INFO - Final point: 2010
2024-01-31T01:23:30+13:00 INFO - Cold Start 2000
2024-01-31T01:23:30+13:00 INFO - [f.2000] -submit-num=01, owner@host=NIWA-1022450.niwa.local
2024-01-31T01:23:31+13:00 INFO - [f.2000] status=ready: (internal)submitted at 2024-01-31T01:23:31+13:00 for job(01)
2024-01-31T01:23:31+13:00 INFO - [f.2000] -health check settings: submission timeout=None
2024-01-31T01:23:31+13:00 INFO - [f.2000] status=submitted: (received)started at 2024-01-31T01:23:31+13:00 for job(01)
2024-01-31T01:23:31+13:00 INFO - [f.2000] -health check settings: execution timeout=None
2024-01-31T01:23:33+13:00 INFO - [f.2000] status=running: (received)succeeded at 2024-01-31T01:23:33+13:00 for job(01)
2024-01-31T01:23:34+13:00 INFO - [f.2001] -submit-num=01, owner@host=NIWA-1022450.niwa.local
2024-01-31T01:23:35+13:00 INFO - [f.2001] status=ready: (internal)submitted at 2024-01-31T01:23:35+13:00 for job(01)
2024-01-31T01:23:35+13:00 INFO - [f.2001] -health check settings: submission timeout=None
2024-01-31T01:23:35+13:00 INFO - [f.2001] status=submitted: (received)started at 2024-01-31T01:23:35+13:00 for job(01)
2024-01-31T01:23:35+13:00 INFO - [f.2001] -health check settings: execution timeout=None
2024-01-31T01:23:37+13:00 INFO - [f.2001] status=running: (received)succeeded at 2024-01-31T01:23:37+13:00 for job(01)
2024-01-31T01:23:38+13:00 INFO - [f.2002] -submit-num=01, owner@host=NIWA-1022450.niwa.local
2024-01-31T01:23:39+13:00 INFO - [f.2002] status=ready: (internal)submitted at 2024-01-31T01:23:39+13:00 for job(01)
2024-01-31T01:23:39+13:00 INFO - [f.2002] -health check settings: submission timeout=None
2024-01-31T01:23:39+13:00 INFO - [f.2002] status=submitted: (received)started at 2024-01-31T01:23:39+13:00 for job(01)
2024-01-31T01:23:39+13:00 INFO - [f.2002] -health check settings: execution timeout=None
2024-01-31T01:23:41+13:00 INFO - [f.2002] status=running: (received)succeeded at 2024-01-31T01:23:41+13:00 for job(01)
2024-01-31T01:23:43+13:00 WARNING - suite stalled
2024-01-31T01:23:43+13:00 WARNING - Unmet prerequisites for b.2008:
2024-01-31T01:23:43+13:00 WARNING - * b.2007 succeeded
(Anyhow, I gotta bail, it's late here ... I'll check follow-up comments in the morning).
A Cylc 7 stall with unsatisfied pre-spawned tasks, versus a Cylc 8 shutdown with nothing to do, is expected under the circumstances. But I presume that by "it worked at Cylc 7" you mean it actually ran, not that it immediately stalled.
Well the lone "backward" one doesn't work:
That's because of the pre-initial dependency which I've written up as a separate issue, see https://github.com/cylc/cylc-flow/issues/5946
This can be worked around as you've observed.
So to uncross wires (as #5946 is the same issue as this one), are the real issues here #1936 and #4638?
I haven't had the time to investigate this yet to know. I suspect yes and maybe.
OK, sorry if I didn't read the fine print:
Cylc 7: Spawns both the f and b chains.
I guess I over-interpreted this, and that it "worked in Cylc 7", to mean both chains actually run in Cylc 7, which they don't.
We should certainly consider making this more obvious or flexible for users (hence the old issues #1936 and #4638) but technically the current Cylc 8 behaviour is correct and not a bug.
R3/P1Y/2010 = b[-P1Y] => b # with ICP = 2000 say
This literally says:
2008/b IF 2007/b succeeds
b does not exist for points < 2008
Plus: automatic bootstrapping into an inter-cycle dependency is a convenience (not a requirement) that only applies to the ICP, not to individual recurrences.
Therefore, under current well-defined rules of engagement the user has probably just made a configuration error such that there is literally nothing to run in that part of the graph.
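The rule described above (pre-initial dependencies are ignored only relative to the workflow ICP) can be sketched roughly as follows. This is an illustrative model, not the scheduler's implementation: `classify_prereq` is a hypothetical name, years are plain integers, and the f points are assumed to be 2000 to 2002 as seen in the Cylc 7 log earlier in this thread.

```python
def classify_prereq(point: int, offset: int, icp: int, task_points: set) -> str:
    """Classify the dependency of task@point on task@(point - offset).

    Hypothetical model of the rule discussed in this thread.
    """
    target = point - offset
    if target < icp:
        # Before the workflow initial cycle point: ignored automatically.
        return "pre-initial (ignored)"
    if target in task_points:
        return "satisfiable"
    # On or after the ICP, but the task does not exist at that point.
    return "unreachable"


ICP = 2000
f_points = {2000, 2001, 2002}  # assumed "forwards" chain starting at the ICP
f_late = {2003, 2004, 2005}    # R3/2003/P1Y = f[-P1Y] => f
b_points = {2008, 2009, 2010}  # R3/P1Y/2010 = b[-P1Y] => b

f_first = classify_prereq(2000, 1, ICP, f_points)
f_late_first = classify_prereq(2003, 1, ICP, f_late)
b_first = classify_prereq(2008, 1, ICP, b_points)
b_second = classify_prereq(2009, 1, ICP, b_points)

print(f_first, f_late_first, b_first, b_second)
```

In this model only the chain that starts at the ICP gets bootstrapped. The first point of any sequence starting later depends on a task instance that is never defined.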
So on that basis this is a "could be better" rather than a bug, and once we've decided on the approach we should consolidate this issue with one of the older ones:
Follow up question:
Should we even allow a workflow to be started if it contains unreachable sections of graph?
All functional Cylc 7 workflows will pass this test.
Should we even allow a workflow to be started if it contains unreachable sections of graph?
Probably not, but (obviously) achieving that requires being able to detect that at validation.
All functional Cylc 7 workflows will pass this test.
Which test? You can certainly start a Cylc 7 workflow that contains unreachable graph: any inter-cycle dependence that isn't automatically bootstrapped by pre-initial-ignore:
[scheduling]
cycling mode = integer
initial cycle point = 1
[[dependencies]]
[[[R1]]]
graph = "foo[-P1] => foo" # OK
[[[R1/2/P1]]]
graph = "bar[-P1] => bar" # Uh-oh, unreachable.
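As a rough idea of what validation-time detection might involve, the sketch below runs a fixed-point pass over task instances and reports those that can never run. It is heavily simplified and entirely hypothetical (`unreachable_instances` and the `graph` layout are made up for illustration): each task depends only on itself at an earlier point, and real graphs with cross-task triggers, conditionals, and unbounded sequences would need much more than this.

```python
def unreachable_instances(graph: dict, icp: int) -> list:
    """Return (task, point) instances that can never run.

    graph maps task name -> (points, offset), meaning each instance at
    `point` depends on the same task at `point - offset`.
    Hypothetical, simplified model; not Cylc's validation logic.
    """
    runnable = set()
    changed = True
    while changed:
        changed = False
        for task, (points, offset) in graph.items():
            for point in points:
                if (task, point) in runnable:
                    continue
                target = point - offset
                # Runnable if the dependency is pre-initial (ignored) or
                # points at an instance already known to be runnable.
                if target < icp or (task, target) in runnable:
                    runnable.add((task, point))
                    changed = True
    everything = {(t, p) for t, (pts, _) in graph.items() for p in pts}
    return sorted(everything - runnable)


# The integer-cycling example above: foo at point 1, bar at point 2.
integer_result = unreachable_instances({"foo": ([1], 1), "bar": ([2], 1)}, icp=1)

# A dated example in the same spirit, with years as integers.
dated_result = unreachable_instances(
    {"f": ([2000, 2001, 2002], 1), "b": ([2008, 2009, 2010], 1)}, icp=2000
)

print(integer_result)
print(dated_result)
```

Note that unreachability propagates: once the first instance of a chain cannot run, every later instance that depends on it is unreachable too, so the whole section of graph is reported.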
All functional Cylc 7 workflows will pass this test.
Which test?
The test of not having any unreachable tasks.
By functional I was ruling out workflows with broken graphs that will stall when run.
Cylc 8 may silently ignore some format 4 recurrences.
E.g.:
Cylc 8: Spawns the f chain of tasks and shuts down on cycle 2003.
Cylc 7: Spawns both the f and b chains.
See also https://github.com/cylc/cylc-flow/issues/5946