dthevenin / DataflowAPI

This project aims to define a DataFlow API that can be implemented into a Toolkit such as VS Toolkit. A first implementation is visible at the following site:
http://dthevenin.github.io/DataflowAPI/

Simplify visualization of data flow as a graph #15

Open eric-brechemier opened 10 years ago

eric-brechemier commented 10 years ago

After watching the video presenting NoFlo on Kickstarter, I thought about the compelling and the not-so-compelling arguments it makes in favor of flow-based programming.

What convinced me the most was the simple visualization of the data flow, with a graphic design that refers to the map of the London Underground.

On the other hand, I found the repeated mention of spaghetti code very weak, because imperative programming has changed since the age of gotos in the 70s, and the graph for the data flow is actually more likely to look like a plate of spaghetti.

Compared with my own experience in day-to-day programming, data flow programming is no revolution: it is a restriction of what is possible using event publishing/subscription.

But this restriction might be the real value added by data flow programming: with conventions for the propagation of events between components, now limited to a link from an output property of a component to an input property of another, a graph can be drawn to visualize the relationships between components.

When I introduced the events API of lb_js_scalableApp at a ParisJS conference, a concern expressed by one of the developers from Mozilla was how to determine the source of an event: when observing an unexpected event, how do you find where to put the blame, since the event can be published by any component?

The data flow visualization provides a solution to this concern by making the connections between components explicit.

From these thoughts, I propose to:

  1. simplify the data model to keep only 1-to-1 relationships
  2. implement the data flow on top of event pub/sub
  3. remove the requirement of producing only non-cyclic graphs

Simplify the data model to keep only 1-to-1 relationships

The 1-to-1 relationship can be represented as a simple line from an output to an input.

The multiplexer, demultiplexer and combined mux/demux relationships do not have such simple representations. These relationships can be implemented with extra nodes connected only with 1-to-1 relationships.
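As an illustration of such an extra node, here is a minimal sketch of a fan-out built only from 1-to-1 links. All names (`connect`, `set`, `links`, `reactions`, the `split` node and the "component.property" keys) are hypothetical and not taken from the DataflowAPI project.

```javascript
const values = {};     // "component.property" -> current value
const links = {};      // output key -> the single input key it feeds
const reactions = {};  // input key -> function run when that input is set

// A link is strictly 1-to-1: an output can feed only one input.
function connect( output, input ) {
  if ( links[ output ] ) {
    throw new Error( output + " is already connected" );
  }
  links[ output ] = input;
}

function set( key, value ) {
  values[ key ] = value;
  if ( reactions[ key ] ) { reactions[ key ]( value ); } // node reacts to its input
  if ( links[ key ] ) { set( links[ key ], value ); }    // follow the 1-to-1 link
}

// Extra node "split": copies its single input to two outputs.
reactions[ "split.in" ] = function( value ) {
  set( "split.out1", value );
  set( "split.out2", value );
};

connect( "A.out", "split.in" );
connect( "split.out1", "B.in" );
connect( "split.out2", "C.in" );

set( "A.out", "hello" );
console.log( values[ "B.in" ], values[ "C.in" ] ); // hello hello
```

In the graph, the fan-out then appears as a visible node with one line in and two lines out, instead of one output with two lines attached.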

Implement the data flow on top of event pub/sub

This provides a simple mental model for the data flow: a connection from the property "one" of component A to the property "two" of component B is equivalent to a subscription to an event "property-one-of-A-updated" which publishes an event "update-property-two-of-B".
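A minimal sketch of this equivalence, assuming a tiny in-memory pub/sub bus. The `subscribe`, `publish` and `connect` functions below are hypothetical stand-ins written for this example, not the lb_js_scalableApp or DataflowAPI functions.

```javascript
// Minimal pub/sub bus (hypothetical; stands in for any events API).
const listeners = {};

function subscribe( name, listener ) {
  ( listeners[ name ] = listeners[ name ] || [] ).push( listener );
}

function publish( name, value ) {
  ( listeners[ name ] || [] ).forEach( function( listener ) {
    listener( value );
  });
}

// Hypothetical helper: a 1-to-1 connection from an output property to an
// input property is a single subscription that republishes the value.
function connect( source, output, target, input ) {
  subscribe( "property-" + output + "-of-" + source + "-updated", function( value ) {
    publish( "update-property-" + input + "-of-" + target, value );
  });
}

// Component B reacts to requests to update its property "two".
const B = { two: undefined };
subscribe( "update-property-two-of-B", function( value ) {
  B.two = value;
});

connect( "A", "one", "B", "two" );
publish( "property-one-of-A-updated", 42 );
console.log( B.two ); // 42
```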

As a consequence, the pub/sub mechanism can then be used directly when extra flexibility is required, for example to implement a connection with transformation, and there is no need to provide explicit support in the higher API of the data flow for these edge cases.
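For example, a connection with transformation could be written directly against the pub/sub layer, as in the sketch below. The bus is the same kind of hypothetical in-memory stand-in, and the Celsius to Fahrenheit conversion is only an illustrative choice of transformation.

```javascript
// Minimal pub/sub bus (hypothetical; stands in for any events API).
const listeners = {};

function subscribe( name, listener ) {
  ( listeners[ name ] = listeners[ name ] || [] ).push( listener );
}

function publish( name, value ) {
  ( listeners[ name ] || [] ).forEach( function( listener ) {
    listener( value );
  });
}

// Component B stores whatever is sent to its property "two".
const B = { two: undefined };
subscribe( "update-property-two-of-B", function( value ) {
  B.two = value;
});

// Connection with transformation: A publishes degrees Celsius,
// B expects degrees Fahrenheit (illustrative assumption).
subscribe( "property-one-of-A-updated", function( celsius ) {
  publish( "update-property-two-of-B", celsius * 9 / 5 + 32 );
});

publish( "property-one-of-A-updated", 100 );
console.log( B.two ); // 212
```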

Remove the requirement of producing only non-cyclic graphs

As described in the Limitations section, there is a need for bidirectional connections and other indirect cycles.

The implementation of the data flow should take responsibility for avoiding the infinite loops and stack overflows introduced by the recursion, either by introducing a delay when a loop is detected or by yielding systematically before publishing an event inside an event listener:

subscribe( "property-one-of-A-updated", function( value ) {
  // yield
  publish( "update-property-two-of-B", value );
});
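A runnable sketch of the "yield systematically" option, under several assumptions: `createBus`, `publishLater`, `createComponent` and `connect` are hypothetical names, the `defer` scheduler defaults to `setTimeout` but is replaced here by a manual queue so that the run is deterministic, and components announce an update only when the value really changes. That last check is an addition of this sketch: yielding keeps the call stack flat, but on its own it does not end a cycle.

```javascript
// Hypothetical bus factory; `defer` schedules a task for later and
// defaults to setTimeout, which yields to the event loop.
function createBus( defer ) {
  defer = defer || function( task ) { setTimeout( task, 0 ); };
  const listeners = {};
  return {
    subscribe: function( name, listener ) {
      ( listeners[ name ] = listeners[ name ] || [] ).push( listener );
    },
    publish: function( name, value ) {
      ( listeners[ name ] || [] ).forEach( function( listener ) {
        listener( value );
      });
    },
    // Publish from inside a listener without growing the call stack.
    publishLater: function( name, value ) {
      const self = this;
      defer( function() { self.publish( name, value ); } );
    }
  };
}

// Hypothetical component: stores one property, announces real changes only.
function createComponent( bus, id, property ) {
  const component = {};
  bus.subscribe( "update-property-" + property + "-of-" + id, function( value ) {
    if ( component[ property ] === value ) { return; } // unchanged: stop here
    component[ property ] = value;
    bus.publish( "property-" + property + "-of-" + id + "-updated", value );
  });
  return component;
}

function connect( bus, source, output, target, input ) {
  bus.subscribe( "property-" + output + "-of-" + source + "-updated", function( value ) {
    // yield before publishing inside the listener
    bus.publishLater( "update-property-" + input + "-of-" + target, value );
  });
}

// Manual scheduler, used instead of setTimeout for a deterministic run.
const pending = [];
const bus = createBus( function( task ) { pending.push( task ); } );
const A = createComponent( bus, "A", "one" );
const B = createComponent( bus, "B", "two" );
connect( bus, "A", "one", "B", "two" );
connect( bus, "B", "two", "A", "one" ); // bidirectional: a cycle

bus.publish( "update-property-one-of-A", 7 );
let turns = 0;
while ( pending.length > 0 ) { pending.shift()(); turns++; }
console.log( A.one, B.two, turns ); // 7 7 2
```

Each deferred publication runs in its own turn, so the depth of the call stack does not depend on the length of the cycle.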

@dthevenin What do you think?

dthevenin commented 10 years ago

1) Why not. Actually, 1-to-1 is the most useful API; n-to-m is just a simpler way to express multiple connections between 2 components. But we have to keep the transformation arguments: they are a really powerful way to connect two "API incompatible" components.

2) Actually, I do not agree with this kind of implementation, which I think is too naive:

  1- For performance reasons. Dataflow can be used to express animations and, more generally, data that changes "continuously". If the implementation is based on an event pub/sub mechanism, it will generate a lot of overhead.
  2- The dataflow description needs to be optimized to avoid too many data assignments and calculations. With an event mechanism, that is impossible to manage (the graph cannot be sorted, branches cannot be removed, ...).
  3- The propagation algorithm has to know "what it is doing". For instance, sometimes a piece of data does not really change, or a branch of the graph has to be "cut". For that reason, we have to keep some metadata associated with each node. This can be done "properly" only with a graph-based algorithm.
  4- Using a graph description will allow compiling the dataflow into real JS code (a future feature ;) )
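To make the sorting and "cut" ideas concrete, here is a minimal sketch of graph-based propagation. It is not the project's actual implementation: the node shape (`inputs`, `compute`, `value`) and the functions `topologicalOrder` and `propagate` are assumptions made for this example, and the graph is assumed to be acyclic.

```javascript
// nodes: id -> { inputs: [ upstream ids ], compute: function, value }
// Depth-first topological sort: a node comes after all of its inputs.
function topologicalOrder( nodes ) {
  const order = [];
  const visited = {};
  function visit( id ) {
    if ( visited[ id ] ) { return; }
    visited[ id ] = true;
    nodes[ id ].inputs.forEach( function( input ) { visit( input ); } );
    order.push( id );
  }
  Object.keys( nodes ).forEach( function( id ) { visit( id ); } );
  return order;
}

// Sets a source value, then recomputes only nodes with a changed input.
// Returns the ids that were recomputed, in order.
function propagate( nodes, order, sourceId, value ) {
  const changed = {}; // per-run metadata: which nodes really changed
  const recomputed = [];
  if ( nodes[ sourceId ].value === value ) { return recomputed; }
  nodes[ sourceId ].value = value;
  changed[ sourceId ] = true;
  order.forEach( function( id ) {
    const node = nodes[ id ];
    const dirty = node.inputs.some( function( input ) { return changed[ input ]; } );
    if ( !dirty ) { return; } // branch "cut": nothing upstream changed
    const args = node.inputs.map( function( input ) { return nodes[ input ].value; } );
    const next = node.compute.apply( null, args );
    recomputed.push( id );
    if ( next !== node.value ) {
      node.value = next;
      changed[ id ] = true;
    }
  });
  return recomputed;
}

const nodes = {
  a: { inputs: [], value: 1 },
  sign: { inputs: [ "a" ], compute: Math.sign, value: 1 },
  label: {
    inputs: [ "sign" ],
    compute: function( sign ) { return sign > 0 ? "positive" : "other"; },
    value: "positive"
  }
};
const order = topologicalOrder( nodes );

console.log( propagate( nodes, order, "a", 5 ) );  // [ 'sign' ]
console.log( propagate( nodes, order, "a", -3 ) ); // [ 'sign', 'label' ]
```

In the first call the sign of `a` stays the same, so the `label` branch is never recomputed.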

3) I agree: we do not have to manage this problem, and can let the programmer handle it. With the current implementation, the dataflow is able to give a warning (and does not loop) if a cycle exists. That is enough.
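One possible way to produce such a warning is a depth-first search over the connections, sketched below. The `findCycle` function and the edge format are hypothetical, and this is not necessarily how the current implementation detects cycles.

```javascript
// edges: array of [ from, to ] pairs describing the connections.
// Returns the ids along one cycle (first id repeated at the end), or null.
function findCycle( edges ) {
  const next = {};
  edges.forEach( function( edge ) {
    ( next[ edge[ 0 ] ] = next[ edge[ 0 ] ] || [] ).push( edge[ 1 ] );
  });
  const state = {}; // undefined: unseen, 1: on the current path, 2: done
  const path = [];
  function visit( id ) {
    if ( state[ id ] === 2 ) { return null; }
    if ( state[ id ] === 1 ) {
      // Back on a node of the current path: this closes a cycle.
      return path.slice( path.indexOf( id ) ).concat( id );
    }
    state[ id ] = 1;
    path.push( id );
    const successors = next[ id ] || [];
    for ( let i = 0; i < successors.length; i++ ) {
      const cycle = visit( successors[ i ] );
      if ( cycle ) { return cycle; }
    }
    path.pop();
    state[ id ] = 2;
    return null;
  }
  const starts = Object.keys( next );
  for ( let i = 0; i < starts.length; i++ ) {
    const cycle = visit( starts[ i ] );
    if ( cycle ) { return cycle; }
  }
  return null;
}

const cycle = findCycle( [ [ "A", "B" ], [ "B", "C" ], [ "C", "A" ] ] );
if ( cycle ) {
  console.warn( "Cycle detected: " + cycle.join( " -> " ) ); // A -> B -> C -> A
}
```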

dthevenin commented 10 years ago

My feeling is that I have a good (I hope ;)) implementation of the dataflow core (graph expression, optimization and propagation). The implementation can be optimized a little bit more, but it's OK.

Now we have to define a good API on top of this implementation. If you think we need to support an event mechanism with the dataflow, it will be very easy to add it to the current implementation.

d.