cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
335 stars 94 forks source link

Support imported suite modules. #1829

Closed hjoliver closed 5 years ago

hjoliver commented 8 years ago

This supersedes #173.

[UPDATE] Description edited to clarify the basic concept - which is sufficient for all use cases I know of - and to remove (my) unnecessary speculation beyond that (sorry if this removes context for some comments below!)

We should support imported suite modules to allow sharing - instead of duplicating - of common (sub-)workflows.

Concept

A suite module

%import MODULE as NAME
[scheduling]
    [[dependencies]]
         graph = get-data => proc-data => NAME
[runtime]
    [[NAME]]
         # (an externally defined workflow containing many tasks)
         [[[module interface]]]
               INPUT_DIR = $CYLC_SUITE_SHARE_DIR
               OUTPUT_DIR = $CYLC_SUITE_SHARE_DIR
               # etc.

When this suite is parsed, the module

arjclark commented 8 years ago

Things that are occurring to me as to be thought about:

Can (or should) the importer supply or override module runtime config such as task hosting details, or could this be made part of the module interface, or should we just accept that modules aren't portable in that way? (perhaps the latter, at least initially).

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

How do we go about validating the imported suite - as "In terms of suite design, the module workflow looks to the importing suite like a single task (albeit without some of the usual runtime settings)." - so how would we capture validating graphing in the imported suite to avoid cases where it might be invalid and would break things at runtime?

Do you envisage us being able to work with graphing brought in from the module in a standard way? i.e. can we do things like PRODUCTS.FAM_FOO => fail-any => some_task and/or PRODUCTS.foo:fail => recovery_task_specific_to_my_suite => !PRODUCTS.foo

Do we need to do something special to avoid:

[runtime]
[[PRODUCTS]]

being mixed up with the result of:

%import <location>:<module-name>@<revision-tag> as PRODUCTS
matthewrmshin commented 8 years ago

Can a module run as a standalone suite, e.g. during development and testing?

  • what to do with the module interface in a standalone run?
  • support an internal cycling section that is ignored/replaced by the importing suite?

We can do something like Python's if __name__ == "__main__" kind of logic.

matthewrmshin commented 8 years ago

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

  • file in some central location has been changed (non-revision based solution), I want to reload my suite, how does that work?
  • file in some central location has changed (again, non-revision based solution), I want to restart my suite but not pick up changes to that file, just continue as was.

If you want ultimate control of suite configuration, you should only import from version controlled location, or locations that you have full control of.

However, it is sometimes advantageous to import from a moving HEAD or latest stable. In which case:

arjclark commented 8 years ago

Restart is problematic under the current logic. In fact, I think we should modify the current logic so that a restart does not automatically loads the suite.rc from the registered location, but should loads the relevant suite.rc from the log/suiterc/ location, unless a reload is also requested for the restart.

Which I guess means that we'll need to keep a copy of the processed out contents of the imported content in there rather than a reference to some external (possibly processed) suite files.

matthewrmshin commented 8 years ago

%import <location>:<module-name>@<revision-tag> as PRODUCTS

We should probably support a search path + import from relative path mechanism - so that the import statement can be detached from site-specific locations.

hjoliver commented 8 years ago

@arjclark (above https://github.com/cylc/cylc/issues/1829#issuecomment-217096124):

if something is changed in the imported module, how do we handle that in terms of reloads and restarts? ... (non-revision based solution)

I suppose we could design a solution such that restart and reload do not re-import by default, but in my view we shouldn't do that. If using a "non-revision based solution" (or similarly, extracting from HEAD of a branch in other contexts) you get what you paid for - so be aware, and be careful!

How do we go about validating the imported suite ...

By "in terms of suite design" I mean all the suite writer has to worry about assuming that imported modules are not broken or incompatible with the importer. I suppose we could have a validation that doesn't actually do the import, but I was envisaging that validation would do a full suite parse as for a suite start-up, in which case a broken module would cause the validation to fail (c.f. importing a broken Python module).

Do you envisage us being able to work with graphing brought in from the module in a standard way? ...

Yes, although I haven't thought about this in much depth. As described above, the imported module does end up as a regular task family with internal dependencies, so MODULE:succeed-all => foo would actually be equivalent to triggering off the final internal task(s), without having to know (at suite design time) the internal task names.

Do we need to do something special to avoid...

Yes - abort if a name collision is detected.

hjoliver commented 8 years ago

@matthewrmshin (above https://github.com/cylc/cylc/issues/1829#issuecomment-217156162):

Restart is problematic under the current logic. ...

Good point. We also need to take into account our plans to move the "rose suite-run" installed-suite paradigm into cylc, which separates the installed/registered location from the source location.

hjoliver commented 8 years ago

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves? Maybe we should only support import MODULE as NAME and require that the location (and potentially revision) of MODULE is somehow provided by external means ?

arjclark commented 8 years ago

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves?

Yes, to me at least, and kind of prompted my questions about things in relation to non-revision controlled files i.e. ones where you can't use a @MODULE-REV extension.

matthewrmshin commented 8 years ago

(I am thinking about a system similar to Python's import statement and sys.path, or FCM's configuration file's include and include-path.)

hjoliver commented 8 years ago

Module source locations:

This should wait on migration of rose suite-run #1885. Then, the suite.rc syntax should only support import MODULE as NAME, with MODULE expected to be located in a local sub-directory of the suite directory, modules/MODULE say, or perhaps externally via $CYLC_MODULE_PATH or similar. For revision controlled modules, some cylc file analogous to rose-suite.conf can specify URL+revision of the module, to be extracted and installed into the suite installation.

hjoliver commented 8 years ago

[meeting] - we agreed on the proposal as described in the updated issue description, plus previous comment on source locations.

An additional thought was:

hjoliver commented 5 years ago

Closing this issue: the proposed mechanism worked reasonably well within the confines of the config file format, but we decided quite some time ago to wait on the future Python API to get proper modularity via Python itself.