Flowmix is a flexible event processing engine for Apache Storm. It supports complex correlations of events via sliding/tumbling windows. It allows parallel streams to be processed that can be grouped together in different ways.
Apache License 2.0
Light Aggregator (not keeping the whole field group) #45
Right now the aggregators keep the whole field group and do the math on the fly. I've seen the following approach to aggregation in TIBCO StreamBase:
1- The tuple (a group of attributes/columns, basically a map) enters the aggregator. Only the aggregator's input fields are considered; the rest of the tuple is discarded.
2- The value is folded in with simple math: add, subtract. This is done in a deconstructed manner, e.g. avg is just one Long/BigInteger count and one Long/BigInteger sum. The actual field value is discarded at this point.
3- When the aggregation window closes (emits, etc.), the 'heavy' math is done: multiplication, division, etc.
I've seen millions of tuples get processed by this type of aggregator with very low CPU and memory consumption, and attack ships on fire off the shoulder of Orion.
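To make the three steps concrete, here is a rough sketch of what a "light" average aggregator could look like. It is not Flowmix's actual aggregator interface: the class name `LightAvgAggregator` and the methods `added`, `evicted` and `emit` are hypothetical names picked for illustration, and the tuple is assumed to be a plain `Map<String, Object>`. A `long` is used for the sum, so a very large stream would need `BigInteger` instead.

```java
import java.util.Map;

// Hypothetical sketch, not Flowmix's real aggregator API.
// Keeps only a running count and sum; no tuples or field groups are retained.
class LightAvgAggregator {
    private final String inputField; // the only field this aggregator reads
    private long count = 0;
    private long sum = 0;            // assumption: values fit in a long

    LightAvgAggregator(String inputField) {
        this.inputField = inputField;
    }

    // Steps 1 and 2: read only the input field, fold it into the running
    // state with an addition, then let go of the tuple.
    void added(Map<String, Object> tuple) {
        Object value = tuple.get(inputField);
        if (value instanceof Number) {
            sum += ((Number) value).longValue(); // fractional parts are truncated
            count++;
        }
    }

    // For sliding windows: a tuple leaving the window is undone with a
    // subtraction, so the window contents never need to be stored here.
    void evicted(Map<String, Object> tuple) {
        Object value = tuple.get(inputField);
        if (value instanceof Number) {
            sum -= ((Number) value).longValue();
            count--;
        }
    }

    // Step 3: the 'heavy' math (one division) happens only when the window emits.
    double emit() {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

With this shape, memory per aggregator is two numbers regardless of window size. The trade-off is that `evicted` must be handed the evicted value by whatever owns the window, and aggregates that cannot be undone by subtraction (min, max) would need a different state.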