catalyst / moodle-tool_dataflows

A generic workflow and processing engine which can be configured to do a large variety of tasks.
GNU General Public License v3.0
12 stars 8 forks source link

Allow a step to be both a connector and a flow step and dynamically change depending on config #422

Open brendanheywood opened 2 years ago

brendanheywood commented 2 years ago

eg a curl step could be a connector and fire once if its upstream is a connector output. But if wired to a flow then it will fire once per row.

brendanheywood commented 2 years ago

See comments in https://github.com/catalyst/moodle-tool_dataflows/issues/377

brendanheywood commented 2 years ago

Was mulling on this and thinking if things could be radically simplified in order to achieve this.

Consider the situation if all connector steps accepted and / or produced iterators just like flow steps. But their iterator would only ever have on record in it. Functional that would be identical to how they are now.

A trigger step could produce a stream of events or just one.

But then consider how this works if you compose for example two reader steps, both which previously would start with a connector input and product a flow output. In the new scenario they would both start a flow of 1 and then when given a singleton record they produce N & M records from that single one. Which means if we stack then end to end then it's effectively like a cartesian product where now the resulting iterator has N x M records in it. That example may not be practical but it has a powerful elegance which appeals. Now almost all actions are both with essentially no code effort.

If you have a flow and you want to convert that to a connector then in this scenario I think you would need to have explicit flow caps and all they do is listen to the iterator and wait til they are drained and then they kick of a new iterator which only has the one record in it. All of the complexity around flowlet subgraphs disappears. The two state diagrams for flows vs connector converge into one. I even think this simplifies the way triggers could work

If you have a trigger that starts with a flow you can easily have steps which split off a side iterator which only fires once and it could do it with the first record or do it the last just as easily as it could by doing odds or evens.

If we do migrate to this conceptually I think we can do it all under the hood with very little public api changes. The step classes should not change for most of the steps. I'd everyone to really critique this concept please.

keevan commented 2 years ago

But then consider how this works if you compose for example two reader steps

I've always had a mental model that flow groups worked like for-loops, so this would be like a nested for-loop and potentially you could use the result from the outer loop to determine what to query for the inner loop (in the case of 2 readers). It might not make sense for 2 SQL readers, but maybe if the outer one came from some external JSON and the inner one was an SQL query which might produce 0 or more rows.

I think we can do it all under the hood with very little public api changes.

We probably can do it without changing much of how the flow is authored. I do think there are a number of things that should change in the public api (or certain aspects of it) but these can be done separately.

One in particular I'd like to have happen is this recently logged issue https://github.com/catalyst/moodle-tool_dataflows/issues/468

I think all current connector steps should have the potential to produce an iterator just like flow steps. Engine wise, I am not sure if they need to all be converted to flow like iterators, at least initially. I think the mechanics for configuring whether something flows or not (produces an iterable) once set in stone would still allow it to work with the existing - albeit complex - architecture. Then we can refine the internals without having to worry about that aspect once we hit it, but can still enjoy being able to use both current connectors inside flow steps and flow steps as connectors interchangeably sooner.

So if I were to put this down into some order of tasks (high level), it could be:

This is more or less of a thought dump and it might be easier to see where to go once the steps are taken, since they need to be taken at one point or another regardless.

keevan commented 2 years ago

Probably somewhere in this mix, it would be good to clean up the step names (lang string) so they no longer reference connector / flow / logic etc as per the comment here: https://github.com/catalyst/moodle-tool_dataflows/pull/503#discussion_r939835415