cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

mixed parentless/non-parentless task causes premature shutdown #5730

Open dwsutherland opened 1 year ago

dwsutherland commented 1 year ago

Description

I ran into this problem while thinking of ways to break my inbound parentless sequential wall-clock task spawning (into which I may work the solution). A workflow in which a task is parentless on some cycles and has parents on others can shut down prematurely.

Reproducible Example

For example, this workflow shuts down after 2015 has run:

[scheduler]
    cycle point format = CCYY
[scheduling]
    initial cycle point = 2010
    [[xtriggers]]
        clock_1 = wall_clock()
    [[graph]]
        P2Y = """
@clock_1 => a
a => b
"""
        +P1Y/P2Y = """
a => b
b[-P1Y] => a
"""

[runtime]
    [[root]]
        script = sleep 5
    [[a,b]]

(clock trigger not really needed, stalls without)

Expected Behaviour

Workflow should never stop.

Discussion

I think the trouble occurs when a non-parentless task ends up the final or next at the runahead limit.

A possible solution is to check for a non-spawned parentless successor (the next occurrence) of any task that enters the active pool (from the initial spawn or from the runahead pool). Perhaps we can narrow down the checking somehow (e.g. to tasks that could possibly be parentless, determined via the config or otherwise).
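To make the proposed check concrete, here is a minimal sketch against a toy model of the reproducible example above. It is not Cylc's scheduler code or API: `has_parents`, `parentless_successor_to_spawn`, and the `spawned` set are hypothetical names, and "next occurrence" is assumed to be simply the following year, since both tasks run every year in this graph.

```python
INITIAL_YEAR = 2010  # matches "initial cycle point = 2010" in the example


def has_parents(task: str, year: int) -> bool:
    """Toy model of the example graph (hypothetical helper, not Cylc API)."""
    if task == "b":
        # "a => b" appears in both graph sections, so b always has a parent.
        return True
    # a is parentless on P2Y points (2010, 2012, ...) and has the parent
    # b[-P1Y] on +P1Y/P2Y points (2011, 2013, ...).
    return (year - INITIAL_YEAR) % 2 == 1


def parentless_successor_to_spawn(task: str, year: int, spawned: set):
    """Hypothetical hook, called when (task, year) enters the active pool.

    Returns the next occurrence of the task if it is parentless and has not
    been spawned yet (recording it in `spawned`); otherwise returns None.
    """
    nxt = year + 1  # assumption: the task recurs every year
    if has_parents(task, nxt) or (task, nxt) in spawned:
        return None
    spawned.add((task, nxt))
    return (task, nxt)


spawned = set()
print(parentless_successor_to_spawn("a", 2011, spawned))  # a at 2012 is parentless
print(parentless_successor_to_spawn("a", 2012, spawned))  # a at 2013 has a parent
```

In this model only `a` on odd years triggers a spawn, which is the kind of narrowing mentioned above: tasks that are never parentless (like `b`) could be skipped entirely.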

hjoliver commented 1 year ago

I think the clock trigger is unnecessary, to reproduce this @dwsutherland ?

hjoliver commented 1 year ago

And for me, it shuts down prematurely after the 2015 point, but I haven't seen it stall. (Mind you, premature shut down is even worse!)

hjoliver commented 1 year ago

[image: tmpnxyyh8ts]

For reference. Fortunately this sort of alternating parented/parentless structure is probably unlikely in real workflows.

dwsutherland commented 1 year ago

I think the clock trigger is unnecessary, to reproduce this @dwsutherland ?

Yes, mentioned that

hjoliver commented 1 year ago

Sorry, so you did! 🤦

dwsutherland commented 1 year ago

And for me, it shuts down prematurely after the 2015 point, but I haven't seen it stall. (Mind you, premature shut down is even worse!)

Yes, sorry, fixed description

hjoliver commented 1 year ago

My minimal example:

[scheduling]
    cycling mode = integer
    [[graph]]
        P2 = "a => b"
        2/P2 = """
           a => b
           b[-P1] => a
         """
[runtime]
    [[a,b]]

Actually, even this:

[scheduling]
    cycling mode = integer
    [[graph]]
        P2 = "a"
        2/P2 = "a[-P1] => a"
[runtime]
    [[a]]

These both shut down after point 6.
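As a rough illustration of the structure in the second example, the sketch below lists which instances of `a` are parentless. It assumes the initial cycle point is 1 (none is set in the config above), and `a_is_parentless` is a hypothetical helper, not part of Cylc.

```python
def a_is_parentless(point: int, initial: int = 1) -> bool:
    """Toy model of the second minimal example (assumes initial point 1)."""
    # 'P2 = "a"': a runs with no parents at initial, initial + 2, ...
    on_p2 = point >= initial and (point - initial) % 2 == 0
    # '2/P2 = "a[-P1] => a"': a depends on the previous point at 2, 4, 6, ...
    on_2_p2 = point >= 2 and (point - 2) % 2 == 0
    return on_p2 and not on_2_p2


for point in range(1, 9):
    kind = "parentless" if a_is_parentless(point) else "parented"
    print(point, kind)
```

Under that assumption, `a` at point 6 is parented and `a` at point 7 is parentless, so no dependency in the graph leads from point 6 to point 7; point 7 can only appear if it is spawned as a parentless task.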

[image: tmp9saa81il]

hjoliver commented 1 year ago

I think the trouble occurs when a non-parentless task ends up the final or next at the runahead limit.

I think you might be right there...