Netflix / atlas

In-memory dimensional time series database.
Apache License 2.0

Feature request: KairosDB backend for Atlas #20

Closed lcoulet closed 6 years ago

lcoulet commented 9 years ago

It looks to me that Atlas with KairosDB as a datastore backend would be a winning team. KairosDB has most of OpenTSDB's features while being more flexible in many respects. Moreover, Netflix is already known as a large contributor to the Cassandra world, and Cassandra is by design the primary datastore for the KairosDB time series database, with a very efficient design for time series data.

burtonator commented 9 years ago

I was unable to comment on the original post. It looks like Atlas uses OpenTSDB as a backend.

Since Netflix uses Cassandra heavily, we thought they would benefit from integrating KairosDB.

brharrington commented 9 years ago

We don't currently use OpenTSDB, but might consider it in the future. Sorry if that was unclear in the post. KairosDB is also an option we can look at, but I can't give a timeline right now. The main data store we use keeps all of the data in memory to support fast aggregations.

For some context, when we were looking at options several years ago we did quite a bit of experimentation with Cassandra, but found it wasn't a good fit for our use-cases. Most queries involve aggregation of many time series, as dashboards and alerts are typically looking at summary views at a cluster level. With high dimensionality some queries require fetching millions of time series to compute the aggregate. With most of the options we looked at, precomputed aggregates would need to be created in order to get acceptable performance. Keeping all data in memory allowed us to get the performance while still allowing for arbitrary slicing and dicing based on the tags. To keep cost in check we do perform rollups on historical data to reduce the volume, and the retention for data in memory is a few weeks (see the cost section of the overview). The full data is kept in S3 so it can be accessed if needed, but it will take a lot longer.
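
A minimal sketch of what query-time aggregation over in-memory, tag-dimensional series can look like, i.e. filtering by tags and folding the matching series together instead of relying on precomputed aggregates. This is purely illustrative; the names (`Series`, `sum`, the tag keys) are made up and this is not Atlas internals:

```scala
// Minimal sketch of query-time aggregation over in-memory, tag-dimensional series.
// All names here are illustrative, not Atlas internals.
object InMemoryAggregation {
  type Tags = Map[String, String]

  // Each series: a tag map plus a fixed-length window of datapoints.
  final case class Series(tags: Tags, values: Array[Double])

  // Sum all series matching a tag predicate, element-wise, without any
  // precomputed aggregate: the flexibility comes from filtering at query time.
  def sum(store: Seq[Series], matches: Tags => Boolean): Array[Double] = {
    val selected = store.filter(s => matches(s.tags))
    if (selected.isEmpty) Array.empty[Double]
    else selected.map(_.values).reduce { (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    }
  }

  def main(args: Array[String]): Unit = {
    val store = Seq(
      Series(Map("app" -> "www", "node" -> "i-1"), Array(1.0, 2.0, 3.0)),
      Series(Map("app" -> "www", "node" -> "i-2"), Array(4.0, 5.0, 6.0)),
      Series(Map("app" -> "api", "node" -> "i-3"), Array(7.0, 8.0, 9.0))
    )
    // Cluster-level view for app=www, aggregated across nodes at query time.
    println(sum(store, _.get("app").contains("www")).mkString(", ")) // 5.0, 7.0, 9.0
  }
}
```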

The other aspect is querying the metadata associated with the time series. For backwards compatibility reasons, internally we needed something that could handle regular expressions well along with the other tag query operators we needed (and, or, not, eq, etc). This was also an area where we had considerable trouble finding a data store that could handle what we needed. The internal predecessor went through a number of data stores, from Postgres and MongoDB to some Lucene-based options. In addition to the query flexibility, we needed to support a large write volume to update the metadata, with a desire to keep the indexing time small so new data would show up as quickly as possible. The red/black deployment model Netflix uses is pretty hard on the monitoring system in this respect, because it can mean a complete copy of a big cluster gets spun up and within minutes we can have 100M new metrics coming in. Plus, a deployment is when users are most interested in their data, so it is a bad time to have indexing delays and other problems. Also, trying to keep the metadata store in sync with Cassandra was a bit of a pain.
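
A rough sketch of the kind of tag query operators mentioned (and, or, not, eq, plus regex). This is not the actual Atlas query model, just an illustration of the combinators such a metadata query layer needs to evaluate:

```scala
// Illustrative sketch of tag-query operators (and, or, not, eq, plus regex);
// not the actual Atlas query model.
object TagQuery {
  type Tags = Map[String, String]

  sealed trait Query { def matches(tags: Tags): Boolean }
  final case class Eq(k: String, v: String) extends Query {
    def matches(tags: Tags): Boolean = tags.get(k).contains(v)
  }
  final case class Re(k: String, pattern: String) extends Query {
    private val regex = pattern.r
    def matches(tags: Tags): Boolean =
      tags.get(k).exists(v => regex.findFirstIn(v).isDefined)
  }
  final case class And(q1: Query, q2: Query) extends Query {
    def matches(tags: Tags): Boolean = q1.matches(tags) && q2.matches(tags)
  }
  final case class Or(q1: Query, q2: Query) extends Query {
    def matches(tags: Tags): Boolean = q1.matches(tags) || q2.matches(tags)
  }
  final case class Not(q: Query) extends Query {
    def matches(tags: Tags): Boolean = !q.matches(tags)
  }

  def main(args: Array[String]): Unit = {
    val q = And(Eq("app", "www"), Not(Re("node", "^i-1")))
    println(q.matches(Map("app" -> "www", "node" -> "i-2")))  // true
    println(q.matches(Map("app" -> "www", "node" -> "i-1a"))) // false
  }
}
```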

Of course most of the systems that we tried have come a long way since then and many new options have cropped up. If I can find the time I would like to try several of them with our internal data set and see how it goes.

burtonator commented 9 years ago

With high dimensionality some queries require fetching millions of time series to compute the aggregate.

We have the same issue at Spinn3r... TONS of stats and we need some aggregates. We use KairosDB and improve performance by denormalizing data. So we usually have one metric with say 5 tags but if one of them is important, we create one main denormalized metric for that one. This way it only has to read one row.
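
The denormalization pattern described here can be sketched roughly as below: alongside the fully tagged metric, also write a second metric keyed only on the "important" tag so the common query reads a single row. `writeDataPoint` is a hypothetical stand-in for whatever client call the store provides, not a KairosDB API:

```scala
// Sketch of the denormalization pattern: write the full-fidelity metric plus a
// second, denormalized metric keyed only on the tag we know we will query on.
// `writeDataPoint` is a placeholder, not a KairosDB client API.
object Denormalize {
  type Tags = Map[String, String]

  def writeDataPoint(metric: String, tags: Tags, ts: Long, value: Double): Unit =
    println(s"$metric $tags $ts $value") // placeholder sink

  def record(ts: Long, value: Double): Unit = {
    val tags: Tags = Map(
      "host" -> "web01", "dc" -> "us-east", "app" -> "crawler",
      "version" -> "1.2", "status" -> "200")

    // Full-fidelity write: one series per distinct tag permutation.
    writeDataPoint("requests", tags, ts, value)

    // Denormalized write: only the important tag, so the aggregate for
    // app=crawler is a single pre-collapsed series (one row to read).
    writeDataPoint("requests.by_app", Map("app" -> tags("app")), ts, value)
  }

  def main(args: Array[String]): Unit =
    record(System.currentTimeMillis(), 1.0)
}
```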

Of course most of the systems that we tried have come a long way since then and many new options have cropped up. If I can find the time I would like to try several of them with our internal data set and see how it goes.

When you get some time, put KairosDB up there.. I think you will be happy with the results.

We like it and have our entire cluster of about 40 machines monitored by KairosDB ... I think we index 100GB or so of stats per week. Might be more.

brharrington commented 9 years ago

So we usually have one metric with say 5 tags but if one of them is important, we create one main denormalized metric for that one. This way it only has to read one row.

That is what I meant by creating precomputed aggregates. The problem we had with doing that was identifying what is important. We could only really get it to work by being very careful with the data being sent in and planning out the common query patterns. It just always seemed like we would be in the middle of a production issue and some other combination would be useful.

When you get some time, put KairosDB up there.. I think you will be happy with the results.

Will do.

I think we index 100GB or so of stats per week.

For one of our main regions the compressed sequence files in S3 add up to about 1TB per day.

That said, raw sizes can be ambiguous; the data characteristics really matter for performance. For example, for us 1M time series built from a relatively small set of distinct tag values in different permutations is typically much better than 1M series with mostly distinct string names. Unfortunately for us some of the internal legacy encourages poorly modeled data.
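
A worked bit of arithmetic on that point: a small tag vocabulary can still produce a huge series count through permutations. The numbers below are made up purely for illustration:

```scala
// Worked example: the same 1M series count can come from permutations of a
// small tag vocabulary or from 1M mostly-distinct names. Numbers are made up.
object Cardinality {
  def main(args: Array[String]): Unit = {
    val apps       = 100
    val nodes      = 1000
    val statistics = 10
    val series            = apps * nodes * statistics // 1,000,000 series
    val distinctTagValues = apps + nodes + statistics // only 1,110 values
    println(s"$series series from $distinctTagValues distinct tag values")
  }
}
```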

burtonator commented 9 years ago

It just always seemed like we would be in the middle of a production issue and some other combination would be useful.

Yes. That's definitely true. Keeping it in memory is kind of important.

I had thought that it might be ideal to automatically determine which denormalizations are needed by building a tree map of the metrics + tags that are used, but of course that's a big project.
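
A sketch of that idea: record which (metric, tag-key combination) pairs queries actually use, so the most frequent combinations become candidates for denormalization. All names here are hypothetical; this is just an illustration of the bookkeeping, not a full solution:

```scala
// Sketch: track which (metric, tag-key combination) pairs queries use, so the
// most frequent combinations become candidates for precomputed aggregates.
import scala.collection.mutable

object QueryUsageTracker {
  private val counts =
    mutable.Map.empty[(String, List[String]), Long].withDefaultValue(0L)

  // Call from the query path with the metric name and the tag keys it used.
  def recordQuery(metric: String, tagKeys: Set[String]): Unit = {
    val key = (metric, tagKeys.toList.sorted)
    counts(key) = counts(key) + 1
  }

  // Top combinations by query count: candidates for denormalization.
  def candidates(n: Int): Seq[((String, List[String]), Long)] =
    counts.toSeq.sortBy(-_._2).take(n)

  def main(args: Array[String]): Unit = {
    recordQuery("requests", Set("app"))
    recordQuery("requests", Set("app"))
    recordQuery("requests", Set("app", "node"))
    candidates(2).foreach(println)
  }
}
```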

It also doesn't always solve the issue of figuring out what's needed in an urgent situation.

For one of our main regions the compressed sequence files in S3 add up to about 1TB per day.

Oh. Just wanted to clarify that I wasn't implying that this was a limit of KairosDB ... just that we're doing 100GB per week without any problems. I'm sure it would scale up to that level if you give it the right amount of hardware.

matschaffer commented 9 years ago

As an anecdotal point: When working on prod issues using Atlas with the Netflix Crisis Response team, I often found the "key" metric for a problem was one we were recording but not actively watching.

If we had been, the issue would probably never have passed canary testing in the first place.

So keeping metrics very flexible has been a huge plus.

burtonator commented 9 years ago

As an anecdotal point: When working on prod issues using Atlas with the Netflix Crisis Response team, I often found the "key" metric for a problem was one we were recording but not actively watching.

Yes. Same here. We try to take measurements on anything that seems like it will be important at some future point in time. This way, if an issue comes up we can resolve it by looking at historical data.

lcoulet commented 9 years ago

Thanks for your answer.

What you did with Atlas is just awesome, and the query API is impressive (even if the stack language is not user-friendly for users accustomed to SQL or similar languages).

I see good complementarity between Atlas and the KairosDB/OpenTSDB approach (KairosDB being my preference). I don't know whether either of them would stand up to raw data at a rate of 1TB per day... In 2013 I used to stress KairosDB at 3B samples per day on a single co-hosted server running both KairosDB and Cassandra, and it held up well under that load for many weeks, but that was a volume below 40GB per day when serialized on disk by Cassandra... 1TB/day is more than an order of magnitude bigger. How long do you keep such a large volume of data?

I believe regexps and tag queries (and, or, not, eq...) could be implemented, but there's no point in changing Atlas's in-memory model to a time series database; having all the recent data for the past hours in memory is just fine.

For smaller users than Netflix, a powerful tool like Atlas would definitely benefit from a time series database like KairosDB, or even raw Cassandra, for backend storage (I'd expect faster and better queries compared to access to the raw data in S3).

Moreover, as you are doing rollups, I believe KairosDB would be a nice solution for storing the rollups or automatically computed statistical features of the series.

It's a great project that you've shared with the community. I will follow it carefully. Thanks!

burtonator commented 9 years ago

After listening to the talks this was my thought as well.

In the talk you mentioned that you don't need much older/legacy data as most developers don't think past 1-2 weeks.

The memory model is really flexible for 1-2 weeks of data, but at that rate I would think a year's worth of data would be very painful.

Maybe OpenTSDB/KairosDB could be used in that situation.

dmuino commented 9 years ago

At Netflix we use Atlas for operational insights: near-real-time graphing of time series and alerting. For this we keep the complete dataset generated by our nodes in memory for the first 6h. After that we start applying rollups based on configurable policies (dropping certain tags, like the node tag) to reduce the metrics volume, and as mentioned in the blog post we keep only about 2 weeks of data in memory.
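
An illustrative sketch of what such a rollup step looks like conceptually: drop a tag (such as the per-instance node tag) and aggregate the series that collapse together, shrinking the number of series kept in memory. This is not Atlas's rollup implementation, just the general shape of the operation:

```scala
// Illustrative rollup: drop a tag and aggregate the series that collapse
// together. Not Atlas code; names and the sum aggregation are assumptions.
object Rollup {
  type Tags = Map[String, String]
  final case class Series(tags: Tags, values: Array[Double])

  def dropTagAndSum(series: Seq[Series], tagToDrop: String): Seq[Series] =
    series
      .groupBy(s => s.tags - tagToDrop)                 // collapse on remaining tags
      .map { case (tags, group) =>
        val summed = group.map(_.values).reduce { (a, b) =>
          a.zip(b).map { case (x, y) => x + y }
        }
        Series(tags, summed)
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val raw = Seq(
      Series(Map("app" -> "www", "node" -> "i-1"), Array(1.0, 2.0)),
      Series(Map("app" -> "www", "node" -> "i-2"), Array(3.0, 4.0))
    )
    // After the rollup only one app-level series remains.
    dropTagAndSum(raw, "node").foreach(s => println(s"${s.tags} -> ${s.values.mkString(",")}"))
  }
}
```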

For things that require more history, where there's value in generating graphs that look at the past few months, we have a very specific whitelist. If you want to explore arbitrarily old data, we make the whole dataset available using Hive. Obviously queries against Hive take significantly longer (at least minutes) than hitting our memory tier, but it's a very helpful option to have available.

gnydick commented 8 years ago

Just wanted to add my $0.02 given this thread is a couple of years old: I've been using OpenTSDB for a couple of years and it's been fantastic, with very high performance under high volume. Happy to talk to anyone interested.

brharrington commented 6 years ago

We'll revisit in the future, but for now there are no current plans and we do not have the bandwidth to work on it. Closing the issue to help with some of our tracking.