gexp can only operate on time series that have exactly matching tag key/value sets

gnydick commented 5 years ago

On the surface this makes sense. However, there are times when the tags are out of our control and we need to be able to say either "exclude these tags" or "include these tags".

As an example, we have a distributed network of stats gatherers that record counts and sizes of things. If we want to run divideSeries(sum:sizes{thing=widget},count{thing=widget}) we should be able to. But since the underlying data has the tags metric_type=histogram and metric_type=counter, respectively, gexp fails to align the series for the division.
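To make the alignment problem concrete, here is a minimal sketch in plain Python. It is not OpenTSDB code: the helper names (join_key, divide_series) and the ignore parameter are hypothetical, and the sample values are made up. It only shows why a strict join on full tag sets drops the pair, and how an "exclude these tags" option would let it match.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Tags = Dict[str, str]
Series = Tuple[Tags, List[float]]  # (tags, values at already-aligned timestamps)


def join_key(tags: Tags, ignored: FrozenSet[str]) -> FrozenSet[Tuple[str, str]]:
    """Build a hashable join key from the tags, skipping ignored tag keys."""
    return frozenset((k, v) for k, v in tags.items() if k not in ignored)


def divide_series(
    numerators: List[Series],
    denominators: List[Series],
    ignore: Iterable[str] = (),
) -> List[Series]:
    """Divide series pairwise, matching them on their (filtered) tag sets."""
    ignored = frozenset(ignore)
    by_key = {join_key(tags, ignored): values for tags, values in denominators}
    results: List[Series] = []
    for tags, values in numerators:
        other = by_key.get(join_key(tags, ignored))
        if other is None:
            continue  # no partner with the same key, so the series is dropped
        quotient = [n / d if d else float("nan") for n, d in zip(values, other)]
        kept = {k: v for k, v in tags.items() if k not in ignored}
        results.append((kept, quotient))
    return results


# Made-up sample data mirroring the situation described above.
sizes = [({"thing": "widget", "metric_type": "histogram"}, [10.0, 20.0])]
counts = [({"thing": "widget", "metric_type": "counter"}, [2.0, 4.0])]

strict = divide_series(sizes, counts)                            # exact tag match
relaxed = divide_series(sizes, counts, ignore=["metric_type"])   # exclude a tag
```

With the strict join nothing lines up, so the result is empty; once metric_type is excluded from the join key, the two widget series pair up and divide normally.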

You could also say to just not report the metric_type (easier said than done), but there's also the problem of needing a host tag to prevent overwriting values if more than one collector reports the exact same fq-metric at the same time. So we use the host tag to make sure they are aggregated.

I would have hoped the default behavior was just like the standard query behavior where the aggregate function operates and the unused tags aren't considered anymore.

I'm guessing the gexp pipeline isn't using the standard pipeline for aggregation and interpolation. If it were, the divide, scale, multiply, etc. would have been easy-ish and foolproof-ish to implement because everything would align and all you have to do is iterate over the lists and operate on the operands. But that's just a theory. I'm not versed in the internals, so maybe it's not as easy-ish as I'm thinking.

gnydick commented 5 years ago

I actually just looked at the source code and I see why it is this way. The gexp pipeline uses the exp pipeline, which has flags for how to deal with tags, but there is no option to pass in those flags on the gexp query pipeline. Ultimately, it would be excellent if we could add to gexp the two flags that exp has:

intersect_on_query_tagks include_agg_tags
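For comparison, this is roughly how the same division might be written against the exp endpoint (/api/query/exp), where a join block is available per expression. This is a sketch from memory of the 2.3 documentation, not a verified request: the join field names useQueryTags and includeAggTags are assumed to be the request-side counterparts of the two flags above, the metric names and time range are placeholders, and everything should be checked against the docs for your version.

```python
import json

# Assumed request body for POST /api/query/exp. Field names follow my
# recollection of the OpenTSDB 2.3 docs and may need adjusting.
exp_query = {
    "time": {"start": "1h-ago", "aggregator": "sum"},
    "filters": [
        {
            "id": "f1",
            "tags": [
                {
                    "type": "literal_or",
                    "tagk": "thing",
                    "filter": "widget",
                    "groupBy": True,
                }
            ],
        }
    ],
    "metrics": [
        {"id": "s", "metric": "sizes", "filter": "f1"},   # placeholder metric name
        {"id": "c", "metric": "count", "filter": "f1"},   # placeholder metric name
    ],
    "expressions": [
        {
            "id": "avg_size",
            "expr": "s / c",
            "join": {
                "operator": "intersection",
                # Assumed counterpart of intersect_on_query_tagks:
                # join only on the tag keys named in the query filters.
                "useQueryTags": True,
                # Assumed counterpart of include_agg_tags:
                # leave aggregated-away tags out of the join.
                "includeAggTags": False,
            },
        }
    ],
    "outputs": [{"id": "avg_size", "alias": "average widget size"}],
}

body = json.dumps(exp_query, indent=2)
```

The point is only that exp exposes the join behaviour per expression, while a gexp query string has nowhere to put it.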

gnydick commented 5 years ago

Suggestion...

After reading the documentation, I see that the current behavior is exactly what was intended, but I don't think the intended behavior is desirable.

I would want the aggregation/interpolation to happen just like in any other query, and then have the ability to split on a tag after the expression is performed.

For example

divideSeries(sum:widget_payload_size{app=foo,comp=bar,pod_name=*},sum:widget_payload_count{app=foo,comp=bar,pod_name=*}){client_id=*}

The {client_id=*} is the "group by" applied after the expression has been processed. This would be awesome.
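One way to read this proposal, sketched in plain Python: sum each operand per client_id (so pod_name and any other tags are aggregated away), then divide the per-client totals. The trailing {client_id=*} syntax is only a suggestion in this thread, and the helper names (sum_by, divide_grouped) and sample numbers below are hypothetical.

```python
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], List[float]]  # (tags, values at aligned timestamps)


def sum_by(series: List[Series], tag: str) -> Dict[str, List[float]]:
    """Sum all series that share the same value for `tag`, point by point."""
    totals: Dict[str, List[float]] = {}
    for tags, values in series:
        key = tags[tag]
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return totals


def divide_grouped(
    sizes: List[Series], counts: List[Series], tag: str
) -> Dict[str, List[float]]:
    """Aggregate both operands by `tag`, then divide group by group."""
    size_totals = sum_by(sizes, tag)
    count_totals = sum_by(counts, tag)
    return {
        key: [
            s / c if c else float("nan")
            for s, c in zip(size_totals[key], count_totals[key])
        ]
        for key in size_totals.keys() & count_totals.keys()
    }


# Made-up sample data: two pods for client "a", one pod for client "b".
payload_sizes = [
    ({"client_id": "a", "pod_name": "p1", "metric_type": "histogram"}, [10.0, 20.0]),
    ({"client_id": "a", "pod_name": "p2", "metric_type": "histogram"}, [30.0, 40.0]),
    ({"client_id": "b", "pod_name": "p1", "metric_type": "histogram"}, [8.0, 8.0]),
]
payload_counts = [
    ({"client_id": "a", "pod_name": "p1", "metric_type": "counter"}, [2.0, 4.0]),
    ({"client_id": "a", "pod_name": "p2", "metric_type": "counter"}, [2.0, 2.0]),
    ({"client_id": "b", "pod_name": "p1", "metric_type": "counter"}, [4.0, 2.0]),
]

per_client = divide_grouped(payload_sizes, payload_counts, "client_id")
```

Because only client_id takes part in the grouping, the differing metric_type and pod_name tags never get in the way of the division.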

manolama commented 5 years ago

Hi, yes, that is a drawback of the GEXP endpoint. It partially follows the existing query logic but diverges when it comes to the expression bits.

The 3.0 API is fully flexible and supports what you're after: file:///Users/clarsen/Documents/opentsdb/opentsdb_web/docs/3x/build/html/api_http/query/graph.html though it's an ugly, bare-bones semantic API. If folks are still interested in the Graphite style functions we can port that over later (just need an Antlr grammar for it). Otherwise we'll aim to support other DSLs like PromQL and SQL.

OpenTSDB / opentsdb

gexp can only operate on time series that have exactly matching tag key/value sets #1426