Support imported suite modules.

hjoliver commented 8 years ago

This supersedes #173.

[UPDATE] Description edited to clarify the basic concept - which is sufficient for all use cases I know of - and to remove (my) unnecessary speculation beyond that (sorry if this removes context for some comments below!)

We should support imported suite modules to allow sharing - instead of duplicating - of common (sub-)workflows.

Concept

A suite module

defines one or more workflows that can be imported into other suites, e.g. a build workflow to be imported into the R1 graph of another suite, and a processing workflow to be imported into its cycling graph.
can be stored like a normal suite (and may contain bin scripts and other files for use by its tasks)
(it may be a valid non-cycling suite by itself, for standalone testing purposes)
defines a module interface that allows it to be treated as a black box by the importing suite, e.g.:
- inputs for its initial tasks,
- output locations for its final products,
- (some internal configurability could also be exposed to the importing suite)
  Proposed Interface

%import MODULE as NAME
[scheduling]
    [[dependencies]]
         graph = get-data => proc-data => NAME
[runtime]
    [[NAME]]
         # (an externally defined workflow containing many tasks)
         [[[module interface]]]
               INPUT_DIR = $CYLC_SUITE_SHARE_DIR
               OUTPUT_DIR = $CYLC_SUITE_SHARE_DIR
               # etc.

When this suite is parsed, the module

will be installed into the suite from its source location
its [runtime] will be added to that of the importing suite,
- with names qualified by "NAME" to avoid clashes,
- and inside a family called "NAME"
its graph will be inserted into the suite graph, also with qualified task names
(at run time the full module workflow is present, inside a collapsible family).

arjclark commented 8 years ago

Things that are occurring to me as to be thought about:

Can (or should) the importer supply or override module runtime config such as task hosting details, or could this be made part of the module interface, or should we just accept that modules aren't portable in that way? (perhaps the latter, at least initially).

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

file in some central location has been changed (non-revision based solution), I want to reload my suite, how does that work?
file in some central location has changed (again, non-revision based solution), I want to restart my suite but not pick up changes to that file, just continue as was.

How do we go about validating the imported suite - as "In terms of suite design, the module workflow looks to the importing suite like a single task (albeit without some of the usual runtime settings)." - so how would we capture validating graphing in the imported suite to avoid cases where it might be invalid and would break things at runtime?

Do you envisage us being able to work with graphing brought in from the module in a standard way? i.e. can we do things like PRODUCTS.FAM_FOO => fail-any => some_task and/or PRODUCTS.foo:fail => recovery_task_specific_to_my_suite => !PRODUCTS.foo

Do we need to do something special to avoid:

[runtime]
[[PRODUCTS]]

being mixed up with the result of:

%import <location>:<module-name>@<revision-tag> as PRODUCTS

matthewrmshin commented 8 years ago

Can a module run as a standalone suite, e.g. during development and testing?

what to do with the module interface in a standalone run?

support an internal cycling section that is ignored/replaced by the importing suite?

We can do something like Python's if __name__ == "__main__" kind of logic.

matthewrmshin commented 8 years ago

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

file in some central location has been changed (non-revision based solution), I want to reload my suite, how does that work?

file in some central location has changed (again, non-revision based solution), I want to restart my suite but not pick up changes to that file, just continue as was.

If you want ultimate control of suite configuration, you should only import from version controlled location, or locations that you have full control of.

However, it is sometimes advantageous to import from a moving HEAD or latest stable. In which case:

The suite reloads only if the user has to tell the suite to reload, so it should not be problematic. I would imagine that the full running suite configuration will be stored under log/suiterc/ like we do now, so it should be clear what we are running with.
Restart is problematic under the current logic. In fact, I think we should modify the current logic so that a restart does not automatically loads the suite.rc from the registered location, but should loads the relevant suite.rc from the log/suiterc/ location, unless a reload is also requested for the restart.

arjclark commented 8 years ago

Restart is problematic under the current logic. In fact, I think we should modify the current logic so that a restart does not automatically loads the suite.rc from the registered location, but should loads the relevant suite.rc from the log/suiterc/ location, unless a reload is also requested for the restart.

Which I guess means that we'll need to keep a copy of the processed out contents of the imported content in there rather than a reference to some external (possibly processed) suite files.

matthewrmshin commented 8 years ago

%import <location>:<module-name>@<revision-tag> as PRODUCTS

We should probably support a search path + import from relative path mechanism - so that the import statement can be detached from site-specific locations.

hjoliver commented 8 years ago

@arjclark (above https://github.com/cylc/cylc/issues/1829#issuecomment-217096124):

if something is changed in the imported module, how do we handle that in terms of reloads and restarts? ... (non-revision based solution)

I suppose we could design a solution such that restart and reload do not re-import by default, but in my view we shouldn't do that. If using a "non-revision based solution" (or similarly, extracting from HEAD of a branch in other contexts) you get what you paid for - so be aware, and be careful!

How do we go about validating the imported suite ...

By "in terms of suite design" I mean all the suite writer has to worry about assuming that imported modules are not broken or incompatible with the importer. I suppose we could have a validation that doesn't actually do the import, but I was envisaging that validation would do a full suite parse as for a suite start-up, in which case a broken module would cause the validation to fail (c.f. importing a broken Python module).

Do you envisage us being able to work with graphing brought in from the module in a standard way? ...

Yes, although I haven't thought about this in much depth. As described above, the imported module does end up as a regular task family with internal dependencies, so MODULE:succeed-all => foo would actually be equivalent to triggering off the final internal task(s), without having to know (at suite design time) the internal task names.

Do we need to do something special to avoid...

Yes - abort if a name collision is detected.

hjoliver commented 8 years ago

@matthewrmshin (above https://github.com/cylc/cylc/issues/1829#issuecomment-217156162):

Restart is problematic under the current logic. ...

Good point. We also need to take into account our plans to move the "rose suite-run" installed-suite paradigm into cylc, which separates the installed/registered location from the source location.

hjoliver commented 8 years ago

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves? Maybe we should only support import MODULE as NAME and require that the location (and potentially revision) of MODULE is somehow provided by external means ?

arjclark commented 8 years ago

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves?

Yes, to me at least, and kind of prompted my questions about things in relation to non-revision controlled files i.e. ones where you can't use a @MODULE-REV extension.

matthewrmshin commented 8 years ago

(I am thinking about a system similar to Python's import statement and sys.path, or FCM's configuration file's include and include-path.)

hjoliver commented 8 years ago

Module source locations:

This should wait on migration of rose suite-run #1885. Then, the suite.rc syntax should only support import MODULE as NAME, with MODULE expected to be located in a local sub-directory of the suite directory, modules/MODULE say, or perhaps externally via $CYLC_MODULE_PATH or similar. For revision controlled modules, some cylc file analogous to rose-suite.conf can specify URL+revision of the module, to be extracted and installed into the suite installation.

hjoliver commented 8 years ago

[meeting] - we agreed on the proposal as described in the updated issue description, plus previous comment on source locations.

An additional thought was:

don't need to wait on #1885 for the initial implementation, using modules in a local sub-directory
in fact a module could be imported from a suite (where it exists in the modules sub-directory)
- so you could modularize an existing suite without actually taking the module config out of it, but still make it available to others
- this automatically solves the problem of standalone testing discussed above, without requiring additional cycling config etc. that is ignored on import: just store a module within a simple suite designed for standalone testing.

hjoliver commented 5 years ago

Closing this issue: the proposed mechanism worked reasonably well within the confines of the config file format, but we decided quite some time ago to wait on the future Python API to get proper modularity via Python itself.

cylc / cylc-flow

Support imported suite modules. #1829

Concept

Proposed Interface