agoragames / kairos

Python module for time series data in Redis and Mongo
BSD 3-Clause "New" or "Revised" License
Aggregating data pre-store #21

Open predakanga opened 11 years ago

predakanga commented 11 years ago

Hey there,

I've recently implemented kairos on one of my websites as a replacement for RRD, storing to four different series for up to 60,000 users every 15 minutes.

So far the performance has been admirable, but the storage required blows up very quickly when you're looking at long periods (mine are simple, but long-term - the longest period retains the data for a full year).
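To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes every raw sample is kept for the full year, which depends on how the intervals are configured; the variable names are purely illustrative.

```python
# Rough count of raw samples written per year, using the figures from this comment.
# Assumption: every sample is retained at 15-minute resolution for the whole year.
SERIES_PER_USER = 4
USERS = 60_000
SAMPLES_PER_DAY = 24 * 60 // 15   # one sample every 15 minutes -> 96 per day
DAYS = 365

samples_per_series = SAMPLES_PER_DAY * DAYS                   # per series, per user
total_samples = SERIES_PER_USER * USERS * samples_per_series  # across all users

print(samples_per_series)  # 35040
print(total_samples)       # 8409600000
```

With a pre-aggregated average, each period would need only two numbers (the average and the count) regardless of how many samples fall into it.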

In RRD, this is solved by aggregating the data before it's stored, instead of doing the aggregation at read time.

I'm planning to implement this in my own copy of kairos, but wanted to know whether you guys would be open to having that as a feature, as it may be contrary to your architecture or just plain confusing to users.

To give you an idea of the general approach, I've outlined how this would work for the Redis backend only:

This would be implemented by storing the aggregate average as a hash per period, with two values: the current average, and the number of items that have been stored for that period.

The actual storage would be implemented as a Lua script that sets avg = ((avg*count)+newvalue)/(count+1) and count = count+1. The script would be loaded with "SCRIPT LOAD" when the RedisBackend is constructed and executed using "EVALSHA" in an appropriate _type_set.
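A minimal sketch of that idea, not actual kairos code. It assumes a redis-py client is passed in; the names `RUNNING_AVG_LUA`, `update_average` and `RunningAverageStore` and the hash field names `avg` and `count` are hypothetical. The Lua script uses the incremental form avg + (value - avg) / count, which is algebraically the same as ((avg*count) + value) / (count+1) with the old count.

```python
# Lua source for an atomic running-average update on a Redis hash.
# Hypothetical sketch: the field names 'avg' and 'count' are assumptions.
RUNNING_AVG_LUA = """
local avg = tonumber(redis.call('HGET', KEYS[1], 'avg')) or 0
local count = tonumber(redis.call('HGET', KEYS[1], 'count')) or 0
count = count + 1
avg = avg + (tonumber(ARGV[1]) - avg) / count
redis.call('HSET', KEYS[1], 'avg', tostring(avg))
redis.call('HSET', KEYS[1], 'count', tostring(count))
return tostring(avg)
"""


def update_average(avg, count, value):
    """Pure-Python reference for the same update; returns (new_avg, new_count)."""
    new_count = count + 1
    return (avg * count + value) / new_count, new_count


class RunningAverageStore:
    """Loads the script once, then runs it by SHA on every insert."""

    def __init__(self, client):
        # client is assumed to be a redis-py client (e.g. redis.Redis()).
        self._client = client
        self._sha = client.script_load(RUNNING_AVG_LUA)  # SCRIPT LOAD

    def add(self, key, value):
        # EVALSHA with one key and one argument; the script returns the new average.
        return float(self._client.evalsha(self._sha, 1, key, value))
```

Doing the read-modify-write inside a script keeps the update atomic, so two concurrent inserts into the same period cannot overwrite each other's count.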

awestendorf commented 11 years ago

I do have plans for min, max and average series types, which I think would address your problem. Have you tried the histogram store? I use it in many cases where a list of values is too much.
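To illustrate why a histogram can stand in for a list of values, here is a small pure-Python sketch. It does not use the kairos API; `to_histogram` and `histogram_mean` are hypothetical helpers. A histogram stores one counter per distinct value, and the average can still be recovered from it at read time.

```python
from collections import Counter


def to_histogram(values):
    """Collapse a list of values into a {value: occurrences} mapping."""
    return Counter(values)


def histogram_mean(hist):
    """Recover the average from a histogram without the original list."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    return sum(value * count for value, count in hist.items()) / total


samples = [1, 1, 2, 2, 2, 5]
hist = to_histogram(samples)
print(len(samples), len(hist))  # 6 values collapse to 3 buckets
print(histogram_mean(hist) == sum(samples) / len(samples))  # True
```

The saving depends on how often values repeat: if nearly every sample is distinct, the histogram is no smaller than the list.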

You can also try mongo storage so that the data can be persisted to disk and not require as much RAM.

predakanga commented 11 years ago

Histograms could do the trick for some of my cases, now that I think about it, though there is still one particular case where I do need a cumulative moving average.

If you have your own plans, I'm happy to save a bit of effort on my part and just implement it as a hack for now, without worrying about the architecture so much.