akka / akka

Build highly concurrent, distributed, and resilient message-driven applications on the JVM
https://akka.io
Other
13.03k stars 3.59k forks source link

Restructure, Simplify and Rename types in the new Stream DSL #15977

Closed rkuhn closed 9 years ago

rkuhn commented 9 years ago

This is a specialization of FlowWithSource/Sink in that the type parameter for the closed end is omitted. The purpose is to pass around a transformable description of a data source or sink.

bantonsson commented 9 years ago

Some background:

This change was discussed in #15978, and the naming was settled on here https://github.com/akka/akka/pull/15978#issuecomment-56792542

Source -> Faucet Sink -> Drain FlowWithSource -> Source (with one type param) FlowWithSink -> Sink (with one type param) ProcessorFlow -> Pipe

bantonsson commented 9 years ago

I still think that there is something missing from the bigger model.

In my mind the user should be working with something (for example when using the TCP pipeline) that has an input and an output and that can be materialized. This thing can also be transformed using the normal operations we have, like map et.c. If it under the hood is a Flow or a FlowGraph doesn't really matter to the user. I think that we might be stealing all the good names for the plumbing pieces here. Are the users really interested in the fact that Flow is linear?

jrudolph commented 9 years ago

If it under the hood is a Flow or a FlowGraph doesn't really matter to the user.

It does matter if the user really tries to establish a "real graph" (a graph that isn't a polytree). I haven't seen really intuitive examples of how to specify graphs in text form (is enumerating the edges intuitive?). Because of the cycles it isn't possible to write down a real graph in a closed-form expression. This means that we need some API to deal with real graphs if we want to support them. In these cases, the current API may actually be fine.

However, and this is why I nagged about concat, general graphs are not common. Usually, you have a mostly linear flow containing the occasional broadcast or merge. In many cases the resulting flows are still just trees, in some cases they will be dags with the constraint that there's still only one way from each input end to each output end (a polytree). In all these cases, a closed-form expression is possible and should be expressible and should be the default regardless of whether it is implemented by the more general FlowGraph behind the scenes.

bantonsson commented 9 years ago

When I say that it doesn't matter I mean from the view of the user using this piece of plumbing. If I have a DAG, DG or a single straight transformation (today a Flow) with only one open input and/or one open output, I should as a user be able to treat it as a single piece where I can append, prepend, transform without worrying. What happens under the surface I couldn't care less about.

When I want to build a DAG or DG, then I have to use the real graph API.

jrudolph commented 9 years ago

We may be saying the same thing :)

oseval commented 9 years ago

Between FlowGraph and Flow exists one important difference - Flow preserves ordering of elements from input to output, but FlowGraph can not guarantee this.

jrudolph commented 9 years ago

@smlin Is that defined anywhere? "Preserving order" doesn't seem appropriate as there's no simple relation between input and output elements in a Flow either.

rkuhn commented 9 years ago

After some discussion we propose the following names:

We will add more such pieces, e.g. for modeling BidirectionalStage (to be used for SSL, framing, codecs, etc.) to build a rich DSL on top of the primitive ones. The high-level abstractions will have all the methods to exploit their compositionality, e.g. plugging BidirectionalStages together or executing a Source–Sink combination or a Source–Drain one (which will then return the Drain’s result value if any).

oseval commented 9 years ago

@jrudolph Currently all of Flow operations (exclude merge) preserves order of elements and this is transitional property - hence any Flow preserves order of elements (and I think this is good property).

bantonsson commented 9 years ago

@smlin mapAsyncUnordered #15996 is just about to change that regardless of this discussion.

jrudolph commented 9 years ago

@smlin Try giving a proper (and useful) definition for "preserving order" and you will see that this property doesn't even hold right now. Even if Flow would only contain map-like-operations that have a simple 1-to-1 relationship between input and output elements it wouldn't hold because that would still contain 1-to-1 transformers that would allow arbitrary reordering of elements.

E.g. a very broad definition of "preserving order" could be "The set of output elements that is emitted by a flow triggered by one input element may only depend on this input element and no previous one.". Even this isn't true for most of the operations and doesn't even include an operation that is able to change its state without being triggered by an input element.

oseval commented 9 years ago

Ok, in such case user has two ways: 1) Always thinking about "ordering" of operation - because this property is not set explicitly. 2) Work only with CRDT types (I mean whole stream type).

Sounds not very good.

jrudolph commented 9 years ago

@smlin I think I start to understand what you are saying: the akka-stream implementation of most operations defined in Flow each one taken for itself shouldn't reorder elements. And yes, that makes sense. I still don't think that is somehow related to the set of operations that Flow offers but needs to properly defined and documented for each operation separately. Also, for many operations it is the user who controls (re)ordering, this is basically the same for mapAsyncUnordered as for transform.

drewhk commented 9 years ago

a very broad definition of "preserving order" could be "The set of output elements that is emitted by a flow triggered by one input element may only depend on this input element and no previous one.". Even this isn't true for most of the operations and doesn't even include an operation that is able to change its state without being triggered by an input element.

This is actually I think the proper definition, maybe modified that if elements e1, e2, e3, ... are triggered by element E and elements e1', e2', e3', ... are triggered by element E' then if E comes before E' then all {ex} comes before all {e'x} and the ordering between the emitted elements is also preserved. Of course mapAsyncUnordered can violate this, but otherwise without fan-ins and fan-outs the property holds for all the elements I am aware of (even Transform, since if E triggers nothing, and then E' triggers {E', E} the property still holds, even though this is a "reordering" element).

oseval commented 9 years ago

@drewhk Nice definition!

However for preserve ordering you do not need remove fan-out, only fan-in. And I think this is one of things from which the Reactive Stream Spec denies to subscribe to more then one publisher.