Netflix / Turbine

SSE Stream Aggregator
Apache License 2.0

describe the aggregation algorithm #22

Open pmuellr opened 11 years ago

pmuellr commented 11 years ago

The aggregation algorithm doesn't seem to be documented, except for here:

https://github.com/Netflix/Turbine/wiki/Design-and-Architecture

Under "Aggregation dimensions", it says the following:

The only binding Turbine has with the data is the aggregation dimension specified using name and type. JSON payloads received from individual instances matching the same aggregation key are combined together, e.g.:

{type:'weather-data-temp', name:'New York', temp:74}
{type:'weather-data-temp', name:'Los Angeles', temp:85}
{type:'weather-data-temp', name:'New York', temp:76}

are combined to give:

{type:'weather-data-temp', name:'Los Angeles', temp:85}
{type:'weather-data-temp', name:'New York', temp:75}

However, after running a few experiments, what I actually see is the Turbine aggregator summing the numeric properties of metrics records grouped by name, type, and time, not averaging them. It's unclear how it handles strings and booleans, or nested object values (I think the numbers inside those were also summed).
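To make the difference concrete, here is a minimal sketch of the summing behavior I think I'm seeing, applied to the wiki's own example. This is not Turbine's code: the function name `aggregate_sum` is made up for illustration, and the handling of non-numeric values (they are simply skipped) is an assumption, since that part is exactly what is unclear.

```python
from numbers import Number


def aggregate_sum(records):
    """Group records by (type, name) and sum their numeric properties.

    Hypothetical illustration only; non-numeric values are skipped
    because their real handling is not documented.
    """
    groups = {}
    for rec in records:
        key = (rec["type"], rec["name"])
        agg = groups.setdefault(key, {"type": rec["type"], "name": rec["name"]})
        for field, value in rec.items():
            if field in ("type", "name"):
                continue
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, Number) and not isinstance(value, bool):
                agg[field] = agg.get(field, 0) + value
    # Sort by name so the output order is deterministic.
    return sorted(groups.values(), key=lambda r: r["name"])


records = [
    {"type": "weather-data-temp", "name": "New York", "temp": 74},
    {"type": "weather-data-temp", "name": "Los Angeles", "temp": 85},
    {"type": "weather-data-temp", "name": "New York", "temp": 76},
]

for row in aggregate_sum(records):
    print(row)
```

With summing, the New York record comes out with `temp` 150 (74 + 76), whereas the wiki example shows 75.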

It appears the Hystrix dashboard then does some averaging based on some of the summed numbers and the number of servers reporting. So the view from the Hystrix dashboard would align with the example from the wiki, but the actual data flowing between Turbine and the dashboard does not.
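A rough sketch of what that dashboard-side step might look like, assuming the aggregated record carries a count of reporting servers. The field name `reporting`, the function name `average_per_instance`, and the choice of which fields to average are all assumptions made for illustration, not the dashboard's actual code.

```python
def average_per_instance(summed, count_field="reporting", fields=("temp",)):
    """Divide selected summed fields by the number of reporting servers.

    `count_field` and `fields` are hypothetical; the real dashboard may
    pick fields and the server count differently.
    """
    out = dict(summed)
    count = summed.get(count_field, 0)
    if count <= 0:
        # Nothing sensible to divide by; return the record unchanged.
        return out
    for field in fields:
        if field in summed:
            out[field] = summed[field] / count
    return out


# A summed record for the two New York readings from the wiki example (74 + 76).
summed = {"type": "weather-data-temp", "name": "New York", "temp": 150, "reporting": 2}
print(average_per_instance(summed))
```

Under these assumptions the averaged `temp` is 75.0, which matches what the wiki shows, even though the record on the wire holds the sum.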

It would be awesome if the aggregation algorithm could be described in the wiki.

mnuessler commented 9 years ago

I strongly agree that this information would be very useful.