cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

external event triggers #1188

Closed hjoliver closed 9 years ago

hjoliver commented 10 years ago

We could allow tasks to trigger off external events completely generically - without having to support OS-specific things like inotify - by having task prerequisites that are not associated with another task, so they'd have to be satisfied by a cylc CLI call (or potentially by a call to an internal cylc API). This could be used to initiate some action in the suite without requiring a suite task that repeatedly polls for the external event (or using automatic task retries to do the polling). Polling is not always convenient, e.g. when waiting on an external dataset that arrives randomly and very infrequently, or when waiting on a large number of external files individually.

This is essentially how message triggers work already, except that they have to come from another task. So we could already bodge this in with a dummy task that depends on message triggers (with messages emitted by an external system that, say, watches for a file). [credit: Joan Fernon at BoM].

So I propose formally supporting external event triggers, something like this:

graph =  <external_x>:event_1 & <external_x>:event_2 => foo
[runtime]
   [[external_x]]
      # (graph syntax identifies "external_x" as an event rather than a task)
      [[[outputs]]]
         event_1 = "dataset 1 arrived"
         event_2 = "dataset 2 arrived"

This would make foo depend on two messages as defined, coming from an external event that self-identifies as "external_x". But no task called "external_x" would be created in the suite.

@matthewrmshin et al. - thoughts?

hjoliver commented 10 years ago

But no task called "external_x" would be created in the suite.

A proxy object would still be created, to receive the event messages.
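
To make the idea concrete, here is a minimal Python sketch of how such a proxy might behave. It is not cylc code: the class names `ExternalEventProxy` and `Task`, and their methods, are hypothetical and only illustrate the proposal, with the proxy recording the messages it receives and `foo` becoming ready once both events have arrived.

```python
class ExternalEventProxy:
    """Hypothetical stand-in for an external event source such as "external_x"."""

    def __init__(self, name, outputs):
        self.name = name
        # Map message text -> output label,
        # e.g. "dataset 1 arrived" -> "event_1".
        self._label_by_message = {msg: label for label, msg in outputs.items()}
        self.received = set()

    def receive(self, message):
        """Record an incoming message; return its label, or None if unknown."""
        label = self._label_by_message.get(message)
        if label is not None:
            self.received.add(label)
        return label


class Task:
    """Hypothetical task whose prerequisites are external event outputs."""

    def __init__(self, name, prerequisites):
        self.name = name
        # Each prerequisite is a (source_name, output_label) pair.
        self.prerequisites = set(prerequisites)

    def is_ready(self, proxies):
        """True once every prerequisite output has been received."""
        return all(
            label in proxies[source].received
            for source, label in self.prerequisites
        )


external_x = ExternalEventProxy(
    "external_x",
    {"event_1": "dataset 1 arrived", "event_2": "dataset 2 arrived"},
)
proxies = {"external_x": external_x}
foo = Task("foo", [("external_x", "event_1"), ("external_x", "event_2")])

external_x.receive("dataset 1 arrived")
ready_after_first = foo.is_ready(proxies)   # only one of two events seen

external_x.receive("dataset 2 arrived")
ready_after_second = foo.is_ready(proxies)  # both events seen
```

In this sketch nothing polls: `foo` is only re-evaluated when a message is delivered, which is the point of the proposal.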

Some thought required on how to handle multiple instances of an event proxy in a cycling suite, if the event doesn't have an associated cycle point itself (thinking of the satellite data processing example).

matthewrmshin commented 10 years ago

Idea is good. (As usual, I'd expect some debate on the actual user interface + implementation. :wink:)

hjoliver commented 9 years ago

NIWA's satellite data processing will convert to cylc shortly. External event triggers seem to be a better way of handling this sort of thing because we can have data retrieval tasks that trigger when needed, instead of continuously polling at run time, or continually retrying until new data arrives. As a result, I thought through this issue today and banged up an initial implementation. I'll put up a pull request for consideration shortly.

matthewrmshin commented 9 years ago

The use case you have just described is quite similar to something our users would like to do. I'll look forward to the pull request.

hjoliver commented 9 years ago

I've revised the initial proposal above. Here's the reasoning that led to my prototype implementation:

We need a repeating workflow to handle a succession of arbitrarily timed datasets in real time:

Finally, in contrast to my original proposal above, external triggers should not appear as nodes in the graph because (a) they are not task proxies; and (b) they are essentially very similar to clock-triggers (we might eventually like to annotate graph nodes to indicate the presence of clock- and external triggers). [Update: opinion changed on this: https://github.com/cylc/cylc/issues/1364]