apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Support branching in Streamlet API #3016

Open: simingweng opened this issue 6 years ago

simingweng commented 6 years ago

Currently, the Streamlet API lacks an operator that lets developers define multiple output streams, each of which can carry data of a different user type.

The equivalent can be achieved with the Spout/Bolt API, specifically by using SpoutOutputCollector.emit(java.lang.String streamId, java.util.List<java.lang.Object> tuple, java.lang.Object messageId) or OutputCollector.emit(java.lang.String streamId, Tuple anchor, java.util.List<java.lang.Object> tuple), together with OutputFieldsDeclarer.declareStream(java.lang.String streamId, Fields fields), to emit different data on different streams.
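To make the stream-id pattern concrete, here is a minimal plain-Java sketch. `MultiStreamCollector` is a hypothetical in-memory stand-in, not Heron's `OutputCollector`: it only models declaring named streams up front and then emitting each tuple onto exactly one of them. The stream names `clicks` and `purchases` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the declareStream()/emit(streamId, ...) pattern.
// This is NOT Heron's OutputCollector; it only illustrates per-stream routing.
public class MultiStreamDemo {

    static class MultiStreamCollector {
        // Declared stream id -> tuples emitted on that stream, in emission order.
        private final Map<String, List<List<Object>>> streams = new LinkedHashMap<>();

        // Loosely models OutputFieldsDeclarer.declareStream(streamId, fields); fields are omitted.
        void declareStream(String streamId) {
            streams.putIfAbsent(streamId, new ArrayList<>());
        }

        // Loosely models emit(streamId, tuple): the tuple goes only to the named stream.
        void emit(String streamId, List<Object> tuple) {
            List<List<Object>> out = streams.get(streamId);
            if (out == null) {
                throw new IllegalArgumentException("Undeclared stream: " + streamId);
            }
            out.add(tuple);
        }

        // Returns the tuples emitted on a stream (empty if the stream was never declared).
        List<List<Object>> tuplesOn(String streamId) {
            return streams.getOrDefault(streamId, List.of());
        }
    }

    public static void main(String[] args) {
        MultiStreamCollector collector = new MultiStreamCollector();
        collector.declareStream("clicks");
        collector.declareStream("purchases");

        // A "distributor" component emits differently shaped tuples on different streams.
        collector.emit("clicks", Arrays.asList("user-1", "/home"));
        collector.emit("purchases", Arrays.asList("user-2", 42.5));
        collector.emit("clicks", Arrays.asList("user-3", "/cart"));

        System.out.println(collector.tuplesOn("clicks").size());    // 2
        System.out.println(collector.tuplesOn("purchases").size()); // 1
    }
}
```

In a real topology, each processing bolt would subscribe to a single stream id, so it only ever sees one tuple shape; this is the wiring the Streamlet API currently has no operator for.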

This would be particularly helpful when a developer wants to route different types of data to separate downstream operators, each written to understand and handle only one specific type of data. In short, the operator would enable the "one distributor spout/bolt" -> "multiple processing bolts" pattern.

The Stream API in Apache Storm 2.0.0-SNAPSHOT already has such a branch operator in the works.
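One plausible semantics for such an operator, similar to how Storm's Stream API describes its `branch`, is: apply the predicates in order, send each element to the branch of the first predicate that matches, and drop elements that match none. The sketch below models that over a plain `List` in Java; `branch` here is a hypothetical helper for illustration, not an existing Heron Streamlet method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of predicate-based branching: first matching predicate wins,
// unmatched elements are dropped. branch() is hypothetical, not a Heron API.
public class BranchDemo {

    @SafeVarargs
    static <T> List<List<T>> branch(List<T> input, Predicate<? super T>... predicates) {
        // One output list per predicate, index-aligned with the predicates.
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            branches.add(new ArrayList<>());
        }
        for (T element : input) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(element)) {
                    branches.get(i).add(element);
                    break; // route to the first matching branch only
                }
            }
            // Elements matching no predicate fall through and are dropped.
        }
        return branches;
    }

    public static void main(String[] args) {
        List<Object> events = List.of("login", 42, "logout", 3.14, 7);
        List<List<Object>> out = branch(
                events,
                e -> e instanceof String,   // branch 0: text events
                e -> e instanceof Integer); // branch 1: integer events; 3.14 is dropped
        System.out.println(out.get(0)); // [login, logout]
        System.out.println(out.get(1)); // [42, 7]
    }
}
```

Returning index-aligned branches keeps each downstream consumer typed to one kind of data, which matches the "one distributor" -> "multiple processing bolts" use case above.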

simingweng commented 6 years ago

@nwangtw @jerrypeng

nwangtw commented 6 years ago

Thanks!