G-Research / geras

Geras provides a Thanos Store API for the OpenTSDB HTTP API. This makes it possible to query OpenTSDB via PromQL, through Thanos.
https://github.com/G-Research/geras
Apache License 2.0
38 stars 15 forks source link

Downsample in OpenTSDB #31

Open dgl opened 4 years ago

dgl commented 4 years ago

It may be possible to do some aggregations in OpenTSDB to avoid sending the raw OpenTSDB data back. Not all the aggregations make sense, in particular the Thanos aggrchunks are about downsampling mostly, the labels don't seem to change.

Maybe useful to do #15 first so we can actually measure the impact.

robincw-gr commented 4 years ago

I don't think there's enough detail in the Thanos store api's SeriesRequest to determine what aggregations could help. Thanos needs more work I think

dgl commented 4 years ago

It's very unlikely thanos will change there, the store API fairly closely matches Prometheus's tsdb so it's unclear how work could even be done there. The main thing we can do would be downsampling as often OpenTSDB data has more points than Prometheus data would due to the push rather than scrape model.

robincw-gr commented 4 years ago

If you compare PromQL to resulting SeriesRequest, you'll see that downsampling window size is not transmitted, and functions are mapped to the Aggr value but it seems to be just a stub of an idea and is unused: https://github.com/thanos-io/thanos/blob/master/pkg/store/storepb/rpc.proto#L91 https://github.com/thanos-io/thanos/blob/master/pkg/query/querier.go#L142

robincw-gr commented 4 years ago

A configurable minimum resolution could be used to downsample all queries regardless of what is specified in a user's PromQL.

eswdd commented 4 years ago

So, having done some digging, a maximum resolution window is passed down to the series request when you select a downsample mode other than "Only raw data" on the Thanos query UI (raw data is the default).

If a non-zero value is passed from the query (auto is a valid value) then this is set to a sensible value which would then allow downsampling, which can be combined with the aggregate type to do the right thing.

OpenTSDB 2.4 also supports rollups and pre-aggregates, but we certainly don't run this (it supports storage of such, and retrieval via new APIs, but not the actual calculation thereof) so I don't see much value in supporting at this stage.

dgl commented 4 years ago

We should also investigate setting https://github.com/thanos-io/thanos/blob/master/docs/components/query.md#auto-downsampling on the querier (as the thanos UI has the downsampling controls mentioned, but many queries will come from Grafana...). And document this behaviour...

eswdd commented 4 years ago

Yeah, that's a request-level param. We could either add explicit Thanos support in Grafana datasources, and then configure this at the datasource level (or at least default it in the UI) or default this on at the querier level - but that feels more insidious..

eswdd commented 4 years ago

Hmm, so the Prometheus datasource in Grafana supports custom query parameters (turns out these were added for Thanos - https://github.com/grafana/grafana/issues/16665):

https://grafana.com/docs/grafana/latest/features/datasources/prometheus/

However it would be nice to set on a per-query basis - extending the Prometheus datasource to also support Thanos might be desirable. In fact in the grafana issue above @davkal made a suggestion along the lines I was thinking - might be worth trying to properly implement that.

davkal commented 4 years ago

Re per-query parameters: we recommend using two datasources with different query parameters pointing to the same thanos instead. In the panel editor you then choose the Mixed datasource and can have per-query datasources.

eswdd commented 4 years ago

I think my concern with the current recommendation is that as the set of additional parameters the Thanos Query component supports over Prometheus grows, the potential set of configurations you might want to choose from grows. Given the flexibility of Thanos, we're finding that we're deploying many different Query components depending on the Store API implementations we want to be able to query over, so multiplying the two could get very unwieldy.

This might be best discussed over in a Grafana issue, but I'd be keen to gauge appetite on the LTS extensions for the Prometheus datasource such that we can have a single datasource used in different ways set at query time with appropriate validation.