cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
335 stars 94 forks source link

Bare bones support for introspection of suite definitions #3843

Open TomekTrzeciak opened 4 years ago

TomekTrzeciak commented 4 years ago

The ultimate solution to this is Python API to Cylc #1962, but in the meantime it would really help to have at least some basic form of introspection and manipulation available to work with suite definitions in code rather than by splicing text fragments.

Here's a non-committal proposal that can be ditched as soon as jinja2 becomes a thing of the past - expose parsec module as two jinja2 filters for parsing and (re)dumping:

{% filter cylc_parse | some_user_filter | cylc_dump %}
[schedulling]
...
{% endfilter %}

{% set rt | cylc_parse %}
[runtime]
...
{% endset %}

{{ rt | some_other_filter | cylc_dump }}

I think this would be already enough to satisfy the basic needs, cf. https://github.com/cylc/cylc-flow/issues/1962#issuecomment-310492269.

The benefit of tying it into jinja and the current ini format is that it's easy to ditch when moving to Python API or another format (like YAML) in the future.

oliver-sanders commented 4 years ago

IMO these issues are best solved in a layer above the configuration in pure Python using template processing only to dump the results. I'm reluctant to pile up the Jinja2 hacks trying to turn a templating engine into a programming language because it's just never going to cut the mustard.

The Cylc configuration is just a nested dictonary, you can construct it in Python using the suite.rc as a serialisation, in-fact amusingly Cylc8 contains code to pretty print nested dicts as cylc configurations:

suite = {
    'scheduling': {
        'dependencies': {
            'R1' {
                'graph': '''
                    foo => bar => baz
                '''
            }
        }
    }
}

It's the secret Python API that Cylc has had for years!

It's not the approach for everyone, but when the Jinja2 starts to outweight suite.rc in a suite...

hjoliver commented 4 years ago

(@oliver-sanders - are you suggesting that @TomekTrzeciak actually use the above in lieu of the Cylc 9 API?)

TomekTrzeciak commented 4 years ago

IMO these issues are best solved in a layer above the configuration in pure Python using template processing only to dump the results. I'm reluctant to pile up the Jinja2 hacks trying to turn a templating engine into a programming language because it's just never going to cut the mustard.

I do not dispute that full-fledged Python API is the way to go as I point out in the description, but that's something earmarked for Cylc9, so a few years down the road at best. In the meantime we have to suffer from Jinja2 and anything that alleviates this pain is worth considering IMO, especially if it can be done with little implementation and maintenance effort as is the case here.

What I'm proposing here can be implemented with very little code and, while it won't be a game changer for suite design, it can prove quite a lot more powerful. In fact, I've written such filters in the past, but:

  1. I have to rely on non-public APIs for that (parsec).
  2. The current parsec code needs a bit of refactor (it currently requires a named file on disk in order to work - it should be able to take just a string or StringIO object instead).

The Cylc configuration is just a nested dictionary, you can construct it in Python using the suite.rc as a serialisation

Still, it would be nice to use serialisation routine from Cylc rather than roll my own, as simple as it may be.

And this doesn't solve the problem entirely. There might be parts of the suite that need to stay in suite.rc format because they are maintained or required by other team(s) (e.g. operations). I cannot expect all my suite users to switch over to Python, in fact I might not be even able to make that call (again, think operations). So I need to integrate the Jinja parts that must be there with the Python generated parts and currently this is quite awkward.

hjoliver commented 4 years ago

What I'm proposing here can be implemented with very little code and, while it won't be a game changer for suite design, it can prove quite a lot more powerful.

Seems quite a compelling argument to me, but I need to think about it a bit more (and will chat with @oliver-sanders ).

oliver-sanders commented 4 years ago

I really don't think that we should try to hack any more programatic logic into the Cylc configuration file, there's only so much we can do with this templating environment and it's bad enough as it is. It would be another way to write increasingly convoluted suites that I'm not desperately happy to support, especially if you want to take this operational! It may be easy to implement but we will have to maintain and support this for years to come.

The main cases for introspection we've discussed in the past have been in the scheduling section (e.g. identifying dangling graph leaves, tying together subgraphs, input-output matching, etc). This section is already more program than configuration, operators don't have the option to edit it and it's too complex to understand in written form anyway.

So this is my logic for suggesting that Cylc files are not the place to solve this problem. The more we do with the suite.rc file the less like a suite.rc file it becomes.

What's the driving the demand for introspection in this case? Would be good for us to capture these problems in issues to help advise Cylc development.

TomekTrzeciak commented 4 years ago

I really don't think that we should try to hack any more programmatic logic into the Cylc configuration file, there's only so much we can do with this templating environment and it's bad enough as it is. It would be another way to write increasingly convoluted suites that I'm not desperately happy to support, especially if you want to take this operational! It may be easy to implement but we will have to maintain and support this for years to come.

So the argument goes: no power toys, because users can't be trusted to play nice with them 😉 . Fair enough, although I don't really understand what you're against so much here. Is it because of the Jinja aspect of it? Let's forget about the Jinja then. What I'm really after is a documented way to read and write suite.rc format. Would you be willing to provide that? Not sure what your plans for Cylc 9 API are, but I would imagine they should cover reading and writing config files as the very basics. I'm just asking if this part could happen ahead of Cylc 9?

Now, I can work around these things for myself in one way or another. I've already written a config dumping routine (~50 lines of code, so no big deal). I could use configobj for suite.rc parsing and then type-coerce items of interest to me. But then I thought that having this supported by Cylc makes so much more sense than me reinventing the wheel.

The main cases for introspection we've discussed in the past have been in the scheduling section (e.g. identifying dangling graph leaves, tying together subgraphs, input-output matching, etc). This section is already more program than configuration, operators don't have the option to edit it and it's too complex to understand in written form anyway.

This is very true, but for practical reasons I would consider graphs out of scope for this issue. There is simply no suitable graph representation in Cylc code base at present (parsec keeps the graph as text and graph_parser is not easily usable, because it entangles graph parsing with parameter expansion).

So this is my logic for suggesting that Cylc files are not the place to solve this problem. The more we do with the suite.rc file the less like a suite.rc file it becomes.

As someone who's been working on programmatically generated suites for the last few years (in Jinja2 and outside of it) I can assure you that, contrary to your fears, working with data structures instead of text affords cleaner and less hacky solutions. It is blocking off these kind of avenues that leads to the worst hacks, because you're forced to solve complex problems with limited tooling.

What's the driving the demand for introspection in this case? Would be good for us to capture these problems in issues to help advise Cylc development.

Achieving better patterns and separation of concerns in suite design. Currently, to break the suite definition into multiple pieces you often need a bunch of bespoke Jinja2 variables to keep track of common parts (like clock triggers). These variables then form an informal interface to abide by. This gets worse when pieces of the suite are maintained in different teams or even organisations and possibly in different repositories. I don't want to say that the introspection is a replacement for the Jinja2 stuff (it has its place and uses too), but it gives suite developers a choice to target some common ground (Cylc config) directly rather than via arbitrary set of bespoke Jinja2 variables. This gets even more important if part of the suite is code generated.

Here's a quick laundry list of things we currently do in our suite generation framework:

Planned:

Some of these things like gathering of clock-triggers, enforcing inheritance patterns and naming conventions, custom validation (e.g. detect naming conflicts between different parts of suite definition), could all be extended to other (non-generated) parts of the suite. At present, often the best you can do is to agree on some common conventions and keep your fingers crossed that everyone follows. It is so much better to enforce in code as much of it as possible then to rely on people to do the right thing.

hjoliver commented 4 years ago

I could use configobj for suite.rc parsing and then type-coerce items of interest to me.

Are you aware that Cylc used configobj once but ditched it way back in 2013? Reasons here https://github.com/cylc/cylc-flow/issues/562#issue-16455139 suggest it probably isn't feasible to parse current suite.rc with configobj?