OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

Implement time-series compression in Facebook Gorilla paper #921

Open · tesseract2048 opened 7 years ago

manolama commented 7 years ago

There may be a few use cases where we can do this, and it would be nice to have. We've also met with the Gorilla engineers and discussed how we could modify Beringei (the open-sourced version of Gorilla, https://github.com/facebookincubator/beringei) to support tags and use it as a write-through cache for TSDB.

tesseract2048 commented 7 years ago

Hi, thanks for the reply, @manolama. Is there any concrete plan to support tags in Beringei?

Also, I have implemented an experimental version that converts the encoding to Gorilla during compaction. Since the encoding itself does not convey the value type, I used 1 bit per datapoint to indicate whether the value is a float or an integer (hence all floats are stored as doubles), then stored the representation in the qualifier with RLE. I observed a 4x size reduction from this (counting only qualifier + value). However, if gzip/snappy compression is also enabled, the additional benefit is small (roughly a 40% size reduction), so this approach may not help much for persistent storage. But things could be quite different for a write-through cache.
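
A minimal sketch of the type-bit idea described above (ignoring the RLE of qualifiers), with hypothetical names and a simplified XOR step; real Gorilla encodes leading-zero counts and only the meaningful bits, whereas this illustration writes the full 64-bit XOR:

```java
import java.util.BitSet;

/** Sketch only, not the actual patch: one type bit per datapoint, then a
 *  Gorilla-style XOR of the value bits against the previous value. */
class TypedXorEncoder {
    private final BitSet bits = new BitSet();
    private int pos = 0;
    private long prevBits = 0;

    /** isFloat is the 1-bit type flag; all floats are stored as doubles. */
    void add(boolean isFloat, long rawBits) {
        writeBit(isFloat);                 // type bit
        long xor = rawBits ^ prevBits;     // Gorilla: XOR with previous value
        if (xor == 0) {
            writeBit(false);               // identical value -> single 0 bit
        } else {
            writeBit(true);
            // Real Gorilla stores a leading-zero count plus the meaningful
            // bits; for brevity this sketch writes the whole 64-bit XOR.
            for (int i = 63; i >= 0; i--) writeBit(((xor >>> i) & 1) == 1);
        }
        prevBits = rawBits;
    }

    void addDouble(double v) { add(true, Double.doubleToLongBits(v)); }
    void addLong(long v)     { add(false, v); }

    private void writeBit(boolean b) { bits.set(pos++, b); }

    byte[] toByteArray() { return bits.toByteArray(); }
}
```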

manolama commented 7 years ago

Regarding tags in Beringei, not yet. They said it shouldn't be too difficult to override and support tags, but I haven't dug into the code yet.

And yeah, their encoding trades accuracy for storage savings, so I'd like to maintain accuracy in TSD whenever possible.

mansu commented 7 years ago

@manolama I am a software engineer on the visibility team at Pinterest. We use OpenTSDB as the primary store for our metrics. Like Facebook, we plan to serve recent metrics data from memory, so we are implementing a cache in front of OpenTSDB for recent (1 day) data. So far, I have implemented an in-memory storage engine in Java with Gorilla encoding and tag support via an inverted index. The storage engine is complete and can serve data via a Java lib/Thrift API. It gives 10x compression and has sub-second query latency. Currently, I am looking into adding an OpenTSDB query API on top of our cache so our dashboards don't have to be migrated. If there is enough interest, I would like to collaborate with the OpenTSDB community on this feature; please let me know.
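
An inverted index for tags in the spirit described here can be sketched as follows; the names and the intersection-based lookup are assumptions for illustration, not Pinterest's actual engine:

```java
import java.util.*;

/** Hypothetical sketch: each tagKey=tagValue pair maps to the set of series
 *  IDs carrying it, and a query intersects the postings of all filters. */
class TagIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void index(int seriesId, Map<String, String> tags) {
        for (Map.Entry<String, String> t : tags.entrySet()) {
            postings.computeIfAbsent(t.getKey() + "=" + t.getValue(),
                                     k -> new HashSet<>()).add(seriesId);
        }
    }

    /** Returns the series IDs matching ALL given tag filters. */
    Set<Integer> query(Map<String, String> filters) {
        Set<Integer> result = null;
        for (Map.Entry<String, String> f : filters.entrySet()) {
            Set<Integer> ids = postings.getOrDefault(
                f.getKey() + "=" + f.getValue(), Collections.emptySet());
            if (result == null) result = new HashSet<>(ids);
            else result.retainAll(ids);   // set intersection
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```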

johann8384 commented 7 years ago

For queries, Splicer can be used to break a query into time blocks and submit them in parallel to OpenTSDB, which greatly improves performance; it also caches the blocks in Redis, which makes subsequent queries significantly faster. Queries that took minutes became milliseconds, even for later time ranges, since only the new time blocks had to be fetched.
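
A sketch of that block-splitting step, assuming fixed one-hour blocks aligned on block boundaries (Splicer's actual block size and cache keys may differ; cached blocks would be served from Redis and only missing blocks sent to OpenTSDB):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of cutting a query's time range into fixed,
 *  aligned blocks so each block can be fetched in parallel and cached. */
class QuerySplitter {
    static final long BLOCK_MS = 3_600_000L; // 1-hour blocks (an assumption)

    static List<long[]> split(long startMs, long endMs) {
        List<long[]> blocks = new ArrayList<>();
        long blockStart = (startMs / BLOCK_MS) * BLOCK_MS; // align to boundary
        for (; blockStart < endMs; blockStart += BLOCK_MS) {
            blocks.add(new long[] {
                Math.max(blockStart, startMs),  // clip first block to the query
                Math.min(blockStart + BLOCK_MS, endMs)  // clip last block
            });
        }
        return blocks;
    }
}
```

Aligning blocks to fixed boundaries is what makes caching effective: a repeated query hits the same block keys, so only the newest, still-open block needs a backend fetch.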

johann8384 commented 7 years ago

And of course, your solution sounds awesome and I would be very interested in helping to implement an OpenTSDB plugin which utilizes it!

mansu commented 7 years ago

@johann8384 Thanks for the response.

I was not aware of Splicer before, so I looked at it today. If I can expose an OpenTSDB interface on metron via the plugin and then use Splicer to query multiple instances of that metron, it would fit our model very well, assuming Splicer is a drop-in replacement for OpenTSDB. I looked at the Splicer source and the OpenTSDB documentation today and have a few questions.

1) Is Splicer a drop-in replacement for the OpenTSDB interface?
2) To implement metron as an OpenTSDB plugin, is there a sample plugin I can refer to? I looked at the source code and plugins in OpenTSDB; however, I couldn't find an interface that separates the query layer from the storage layer. Or is it TSDQuery?

johann8384 commented 7 years ago

No, there is no interface intended for this purpose today. That is what I'd like to collaborate on implementing. I can help get the interface implemented and added to OpenTSDB, and also work with you to let your tool function as a plugin for that interface.

My thinking is that the ability to modify or intercept the query and/or the results would let people implement things like multi-tenancy (combined with the authentication plugins), and would also let them cache retrieved data in whatever way makes sense for them. For example, Splicer could be implemented as a plugin so that it hits the Redis cache, goes to the backend only for the missing blocks, and then caches those blocks on the way back. It would be completely transparent to the end user.
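
The proposed interception point might be shaped roughly like this; OpenTSDB exposes no such interface today, so every name below is hypothetical (the type parameters stand in for OpenTSDB's query and result classes):

```java
/** Hypothetical shape of the plugin interface discussed above. */
public interface QueryInterceptorPlugin<Q, R> {
    /** Called before execution; may rewrite the query (e.g. inject tenant
     *  filters for multi-tenancy) or short-circuit it by returning a
     *  non-null cached result so the backend is never touched. */
    R beforeQuery(Q query);

    /** Called with the backend's response; a caching plugin would store
     *  the freshly fetched blocks here for future queries. */
    R afterQuery(Q query, R result);
}
```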

Yes, Splicer is a drop-in replacement, but there are a few corners of the API which are not handled properly; I believe there are open issues on those in that project.

Splicer does two main things: it shards a query into multiple sub-queries, sending the requests to the TSD instances co-located with the nodes that host the relevant regions, and it caches the responses for future queries. This makes things like Grafana dashboards with dozens of queries that would otherwise pull 2 GB of data from HBase much, much faster.

We wrote it because we had queries that aggregated away a tag with 1000 servers in it, and 1000 servers reporting a value every 15 seconds for 7 days adds up fast. We would see 2 GB of data being read from HBase, sent over the network, and then processed by OpenTSDB, only for the downsampled/aggregated result (20 KB) to be returned. It took 30 seconds to do the network part and another 30 seconds to downsample/aggregate. Some queries went from 10 minutes to 1 minute with the locality, and then from 1 minute to 40 ms with the caching, even when a chunk still had to come from HBase.
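
As a rough sanity check on those numbers: 7 days at one datapoint per 15 seconds is 604,800 / 15 = 40,320 points per series, so 1,000 servers contribute about 40 million datapoints per query; at roughly 50 bytes per point read from HBase (an assumed figure), that lands right around the 2 GB described, against a ~20 KB aggregated result.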

@wicknicks and I have agreed that if that capability can be rolled into an OpenTSDB plugin it makes sense to do that.

mansu commented 7 years ago

Thanks for the detailed explanation. The interface to intercept queries and responses sounds great. I agree that it would make it easier to implement functionality like Splicer or metron as a plugin. It would also make it easy to implement federation in the OpenTSDB layer. Please let me know how I can help.

I also agree that caching dashboard results is a great idea and reduces latency. We cache some long queries at Pinterest to improve latency.

burmanm commented 7 years ago

Commenting on the original request: in the Hawkular project we also added the ability to compress on-disk storage with the Gorilla compression method (with millisecond precision). The simplest way of doing this was to write datapoints as we did before and then, in a "compaction phase", combine a larger amount of data into a single bucket that is stored on disk. This follows a very common hybrid-storage pattern in columnar databases.

I don't have any benchmarks to paste here, but the reduction in storage requirements was significant, and it should improve read performance as well (we store in Cassandra, but I think the same applies to HBase): not only is there less I/O to load the datapoints, but the reduced number of rows loaded also cuts overhead on the HBase side. The latter made a large difference in query performance in Cassandra, though of course its storage design is slightly different.

Our Java implementation of the Gorilla compression is available at https://github.com/burmanm/gorilla-tsc under the Apache v2 license, so there's no need to recreate this part. Both doubles and longs can be stored using this library, and metadata stored elsewhere should say what is inside the compressed "blob". We used a one-byte header as the first byte of the compressed blob to record the compression type and similar information.
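
A minimal sketch of that one-byte-header framing, with assumed type constants (the library's real header values and layout may differ):

```java
import java.nio.ByteBuffer;

/** Illustrative framing: one header byte records the compression type (and,
 *  by extension, how to interpret the payload) ahead of the compressed bits. */
class CompressedBlob {
    static final byte TYPE_GORILLA_DOUBLE = 0x01; // assumed constant
    static final byte TYPE_GORILLA_LONG   = 0x02; // assumed constant

    static ByteBuffer frame(byte compressionType, byte[] compressedBits) {
        ByteBuffer blob = ByteBuffer.allocate(1 + compressedBits.length);
        blob.put(compressionType);   // header byte tells the reader how to decode
        blob.put(compressedBits);    // e.g. output of the gorilla-tsc compressor
        blob.flip();
        return blob;
    }

    static byte typeOf(ByteBuffer blob) {
        return blob.get(0);          // peek at header without consuming payload
    }
}
```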

hellozeck commented 6 years ago

So, any progress?