cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Spin-up of tasks with inter-cycle dependencies #3903

Open TomekTrzeciak opened 4 years ago

TomekTrzeciak commented 4 years ago

At start-up any task dependencies that would resolve to a cycle point before the initial one are effectively treated as succeeded. This has the effect of pushing the logic to deal with the initial task spin-up into the application layer.

It would be convenient to have an option to automatically delay a task's start-up until the first cycle in which all its dependencies can be met.

This can currently be achieved manually to a certain extent (edit: fixed recurrence definitions):

[[[min(T00,T12)/PT12H]]]
graph = foo

[[[min(T00+PT12H,T12+PT12H)/PT12H]]]
graph = foo[-PT12H] => bar

[[[min(T00+PT24H,T12+PT24H)/PT12H]]]
graph = bar[-PT12H] => baz

With more complex graphs, however, this kind of method gets pretty unwieldy pretty quickly.
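
The staggered start points above follow mechanically from the graph, which is what makes automating them plausible. A minimal Python sketch, assuming an acyclic graph and a hypothetical `DEPS` encoding in which `foo[-PT12H] => bar` becomes `("foo", 12)`; none of these names are part of Cylc:

```python
from functools import lru_cache

# Hypothetical encoding of the example graph: task -> [(upstream, lag in hours)].
# "foo[-PT12H] => bar" means bar needs foo from 12 hours earlier.
DEPS = {
    "foo": [],
    "bar": [("foo", 12)],
    "baz": [("bar", 12)],
}


def spinup_offsets(deps):
    """Return, per task, the hours after the initial cycle point at which it
    can first run with every upstream cycle point at or after the initial one.

    Assumes the dependency graph is acyclic.
    """

    @lru_cache(maxsize=None)
    def first_run(task):
        # A task with no upstream tasks can start at the initial cycle point.
        return max(
            (first_run(up) + lag for up, lag in deps[task]),
            default=0,
        )

    return {task: first_run(task) for task in deps}


offsets = spinup_offsets(DEPS)
for task, hours in offsets.items():
    print(f"{task}: first cycle at ^+PT{hours}H")
```

For the three-task chain this reproduces the offsets written by hand above (0, 12 and 24 hours).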

oliver-sanders commented 4 years ago

The only simple solution I can think of would be to have a switch for changing the global behaviour.

TomekTrzeciak commented 4 years ago

The only simple solution I can think of would be to have a switch for changing the global behaviour.

This might run into problems with tasks that depend on themselves like housekeep[-PT1H] => housekeep. There likely isn't a simple solution here [*], but it would be useful to solve this nonetheless.

[*] If you allowed this to be set on a per-task basis, it would raise questions about the cumulative build-up of spin-up delays along the dependency chain and about transitive dependencies via non-delayed tasks.
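
To make these concerns concrete, here is a rough Python sketch of a per-task opt-in under an assumed rule: a task flagged as delayed waits for each upstream task's first run plus the lag, while an unflagged task starts at the initial cycle point as it does today. The `delayed` set, the `deps` encoding and the rule itself are illustrative assumptions, not proposed Cylc behaviour.

```python
def first_runs(deps, delayed):
    """deps: task -> [(upstream, lag_hours)]; delayed: tasks that opted in.

    Returns hours after the initial cycle point of each task's first run.
    Raises ValueError if a delayed task (transitively) waits on itself.
    """
    result = {}
    in_progress = set()

    def visit(task):
        if task in result:
            return result[task]
        if task not in delayed:
            # Non-delayed tasks keep today's behaviour: pre-initial
            # dependencies count as satisfied, so they start immediately.
            result[task] = 0
            return 0
        if task in in_progress:
            raise ValueError(f"{task} can never start: it waits on itself")
        in_progress.add(task)
        start = max((visit(up) + lag for up, lag in deps[task]), default=0)
        in_progress.discard(task)
        result[task] = start
        return start

    for task in deps:
        visit(task)
    return result


deps = {
    "foo": [],
    "bar": [("foo", 12)],
    "baz": [("bar", 12)],
    "housekeep": [("housekeep", 1)],
}

# Delays accumulate along a chain of delayed tasks.
print(first_runs(deps, delayed={"foo", "bar", "baz"}))

# bar is not delayed, so baz waits only 12 hours and consumes the output of
# a bar cycle whose own foo dependency was pre-initial.
print(first_runs(deps, delayed={"foo", "baz"}))

# A self-dependent task has no finite start under this rule.
try:
    first_runs(deps, delayed={"housekeep"})
except ValueError as exc:
    print(exc)
```

Under this rule a self-dependent task such as housekeep would need an explicit exemption, for example a first cycle that carries no self-dependency.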

oliver-sanders commented 4 years ago

Not sure this is a proper solution to your problem as I suspect your example is somewhat simplified 😀, however, for the record a combination of the previous/next syntax and recurrence exclusions can be used to abstract out the min(T00, T12):

initial cycle point = previous(T00,T12)

[[[PT12H ! ^]]]
graph = foo

[[[PT12H ! ^ +PT12H]]]
graph = foo[-PT12H] => bar

[[[PT12H ! ^ +PT24H]]]
graph = bar[-PT12H] => baz

You still have to follow the inter-cycle dependencies through yourself which is obviously a problem for generated graphs.

oliver-sanders commented 4 years ago

Beyond that we would need to have some special syntax e.g:

[[[PT12H/min(T00,T12)]]]
graph = """
   foo
   foo[!-PT12H] => bar
   bar[!-PT12H] => baz
"""

Where the ! tells Cylc not to spawn the task across the initial cycle point boundary.

I think if we patched the pre-initial logic under SoD, downstream tasks would not be run unless they have another upstream task, i.e. in this example b and c would not run for the first cycle and c would not run for the second:

[[[PT12H/min(T00,T12)]]]
graph = """
   foo => a
   bar => b
   baz => c
"""

Which seems promising, too late in the day for me to wrap my head around the consequences...

TomekTrzeciak commented 4 years ago

Not sure this is a proper solution to your problem as I suspect your example is somewhat simplified 😀, however, for the record a combination of the previous/next syntax and recurrence exclusions can be used to abstract out the min(T00, T12):

initial cycle point = previous(T00,T12)

[[[PT12H ! ^]]]
graph = foo

[[[PT12H ! ^ +PT12H]]]
graph = foo[-PT12H] => bar

[[[PT12H ! ^ +PT24H]]]
graph = bar[-PT12H] => baz

Clever, but this doesn't have the same effect as in the example (bar still runs at the initial cycle and baz runs at ^ and ^+PT12H).
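
One way to check this is to enumerate the cycle points by hand. The Python sketch below uses a deliberately simplified model (a fixed-interval sequence from the initial cycle point, minus a set of excluded points) rather than Cylc's own recurrence parser, and it assumes that `! ^+PT12H` excludes only that single point. The `cycle_points` helper and the example date are made up for illustration.

```python
from datetime import datetime, timedelta

ICP = datetime(2020, 1, 1, 0)  # arbitrary initial cycle point for the demo
H12 = timedelta(hours=12)


def cycle_points(start, interval, exclude=(), count=4):
    """First `count` points of start, start+interval, ... that are not excluded."""
    points = []
    point = start
    while len(points) < count:
        if point not in exclude:
            points.append(point)
        point += interval
    return points


# [[[PT12H ! ^+PT12H]]] drops only the single point ^+PT12H ...
bar_points = cycle_points(ICP, H12, exclude={ICP + H12})
# ... and [[[PT12H ! ^+PT24H]]] drops only ^+PT24H.
baz_points = cycle_points(ICP, H12, exclude={ICP + 2 * H12})

print("bar:", [p.isoformat() for p in bar_points])
print("baz:", [p.isoformat() for p in baz_points])
```

In this model bar still has a point at the initial cycle, and baz has points at both ^ and ^+PT12H, because a single exclusion removes one point instead of shifting the start of the sequence.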

oliver-sanders commented 4 years ago

Ah, so you would need to add one exclusion per recurrence:

initial cycle point = previous(T00,T12)

[[[PT12H ! ^]]]
graph = foo

[[[PT12H ! (^, ^+PT12H)]]]
graph = foo[-PT12H] => bar

[[[PT12H ! (^, ^+PT12H, ^+PT24H)]]]
graph = bar[-PT12H] => baz

Or shield them with an ICP offset, something like this:

initial cycle point = previous(T00,T12)

[[[PT12H ! ^]]]
graph = foo

[[[^+PT12H/PT12H]]]
graph = foo[-PT12H] => bar

[[[^+PT24H/PT12H]]]
graph = bar[-PT12H] => baz

hjoliver commented 4 years ago

At start-up any task dependencies that would resolve to a cycle point before the initial one are effectively treated as succeeded. This has the effect of pushing the logic to deal with the initial task spin-up into the application layer.

@TomekTrzeciak and @oliver-sanders - I haven't read this discussion in detail, but are we in danger of introducing yet more complex logic and settings to get around just doing it properly in the graph? I.e. you don't have to rely on ignoring pre-initial dependence for initial task spin-up. This will be stating the obvious to you guys, but for the record - instead of this:

    initial cycle point = 1
    [[dependencies]]
         [[[P1]]]
             graph = "foo[-P1] => foo"

, this:

    [[[R1/^]]]
          graph = foo
    [[[R/2/P1]]]
          graph = "foo[-P1] => foo"
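
The difference between the two variants shows up when the graph is expanded into concrete edges. A small Python sketch, assuming integer cycling from point 1 to an arbitrary final point 4; the `foo_edges` and `pre_initial` helpers are hypothetical and only model the two snippets above:

```python
INITIAL, FINAL = 1, 4  # integer cycle points used for the demo


def foo_edges(sections):
    """sections: list of (points, offset) pairs for task foo.

    offset None models "graph = foo" (no upstream); offset -1 models
    "foo[-P1] => foo". Returns (upstream_point, point) pairs, with
    upstream_point None when foo has no upstream at that point.
    """
    edges = []
    for points, offset in sections:
        for point in points:
            upstream = None if offset is None else point + offset
            edges.append((upstream, point))
    return edges


def pre_initial(edges):
    """Edges whose upstream point lies before the initial cycle point."""
    return [(up, pt) for up, pt in edges if up is not None and up < INITIAL]


# Variant 1: a single P1 section with foo[-P1] => foo at every point.
implicit = foo_edges([(range(INITIAL, FINAL + 1), -1)])

# Variant 2: foo alone at the initial point, then foo[-P1] => foo from point 2.
explicit = foo_edges([([INITIAL], None), (range(INITIAL + 1, FINAL + 1), -1)])

print(pre_initial(implicit))  # the one edge that reaches back before point 1
print(pre_initial(explicit))  # nothing to ignore
```

Only the first variant contains an edge from a pre-initial point, so only it relies on pre-initial dependencies being ignored.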