OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

Max value along with timestamp retrieval question #810

Open ace-han opened 8 years ago

ace-han commented 8 years ago

Say I have the following time series for the metric cpu.load_1min:

    timestamp  t0  t1  t2  t3  t4  t5  t6  t7  t8  t9
    value       0  10  90  20  80  30  70  40  60  50

I want to extract the highest value along with its timestamp; in the example above that's <t2, 90>. I also want the exact timestamp, not a downsampled one.
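For reference, the reduction being asked for is an "argmax with timestamp" over one series. This is a minimal client-side sketch on the toy series above, using the t0..t9 labels as stand-in timestamps; `max_with_timestamp` is a hypothetical helper, not an OpenTSDB function.

```python
# Toy series from the question: (timestamp label, value) pairs in time order.
series = [("t0", 0), ("t1", 10), ("t2", 90), ("t3", 20), ("t4", 80),
          ("t5", 30), ("t6", 70), ("t7", 40), ("t8", 60), ("t9", 50)]


def max_with_timestamp(points):
    """Return the (timestamp, value) pair holding the largest value.

    max() returns the first maximal element it meets, so on ties the
    earliest timestamp wins when the points are in time order.
    """
    if not points:
        raise ValueError("empty series")
    return max(points, key=lambda p: p[1])


print(max_with_timestamp(series))  # ('t2', 90)
```

No downsampling happens here: the returned timestamp is the raw one attached to the winning point.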

This is for a report covering a 3-month timespan across more than 10,000 servers, so querying all the data back and computing the max myself is really not an option.
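A rough estimate shows why pulling the raw data back is unattractive. It assumes one sample per minute per server and treats 3 months as 90 days; neither figure is stated in the question.

```python
# Back-of-the-envelope size of the report's input.
# ASSUMPTIONS (not from the question): 1 sample/minute, 90-day "3 months".
SAMPLES_PER_DAY = 24 * 60
DAYS = 90
SERVERS = 10_000

points_per_server = SAMPLES_PER_DAY * DAYS
total_points = points_per_server * SERVERS

print(points_per_server)  # 129600
print(total_points)       # 1296000000
```

Roughly 1.3 billion points would have to be transferred just to keep 10,000 of them.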

As far as I can tell, neither /api/query nor /api/query/exp can do this.

Please help me out, thanks.

johann8384 commented 8 years ago

Well, if you use gexp you can do max(${query}) and get the max value. I think we implemented max; if not, adding it would not be terribly hard.

Either way, you're still going to be querying three months of data and calculating the max; doing it server side just involves a little less network traffic.

For long queries, I recommend Splicer (github.com/turn/splicer). It shards your query into 1-hour blocks and runs them in parallel against the cluster of TSD nodes. I use it with 10 Docker containers running OpenTSDB in read-only (RO) mode on each HDFS DataNode/HBase RegionServer. Splicer also sends each query to the node that hosts the relevant region, which further reduces network traffic. Once it has retrieved the data, it caches the 1-hour blocks in Redis. This greatly reduces the time needed to run long queries when most of the data has been queried before.
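Sharding like this works for a max query because the maximum is decomposable: the max over the whole range equals the max of the per-block maxima, and each block's winner keeps its exact timestamp. The sketch below illustrates that with an in-memory list standing in for the TSD; `hour_blocks`, `block_max` and `sharded_max` are hypothetical helpers, not Splicer's code or API.

```python
from concurrent.futures import ThreadPoolExecutor

HOUR = 3600  # block size in seconds


def hour_blocks(start, end):
    """Split [start, end) (epoch seconds) into consecutive 1-hour blocks."""
    blocks = []
    lo = start
    while lo < end:
        hi = min(lo + HOUR, end)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def block_max(points, lo, hi):
    """Max (ts, value) among points with lo <= ts < hi, or None if empty.

    Filtering `points` stands in for a hypothetical per-block TSD query.
    """
    in_block = [p for p in points if lo <= p[0] < hi]
    return max(in_block, key=lambda p: p[1]) if in_block else None


def sharded_max(points, start, end, workers=4):
    """Compute per-block maxima in parallel, then reduce them to one answer."""
    blocks = hour_blocks(start, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves block order, so ties still favor earlier blocks.
        partials = list(pool.map(lambda b: block_max(points, *b), blocks))
    partials = [p for p in partials if p is not None]
    return max(partials, key=lambda p: p[1]) if partials else None


points = [(0, 5.0), (1800, 7.5), (3600, 9.0), (5400, 2.0), (7200, 8.0)]
print(sharded_max(points, 0, 9000))  # (3600, 9.0)
```

Because each block only contributes one (ts, value) pair, per-block results are also small enough to cache cheaply.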

ace-han commented 8 years ago

@johann8384 I can't quite get the gexp you referred to working. In the docs I found two functions, highestMax and highestCurrent, but sadly they behave like this:

Querying http://opentsdb/api/query/gexp?start=3n-ago&exp=highestMax(max:network.pkts_in{host=s1|s2|s3}, 2) gives a result like this:

[screenshot: query result]

and the details of one element of the array:

[screenshot: one array element]

The result is the top n time series, each with its full list of values (including its max value), not a simple array of <max_ts, max_value> pairs.
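That matches how highestMax is meant to work, as I understand the expression docs: it ranks whole series by their maximum value and keeps the top n, without reducing them. The sketch below mimics that selection on a hypothetical response shaped like the "dps" maps in this thread (string epoch keys, made-up values); it is an illustration, not OpenTSDB's implementation.

```python
def highest_max(series_list, n):
    """Keep the n series with the largest maximum value, points untouched."""
    ranked = sorted(
        series_list,
        key=lambda s: max(s["dps"].values(), default=float("-inf")),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical response for host=s1|s2|s3 (values made up for illustration).
response = [
    {"tags": {"host": "s1"}, "dps": {"1516852800": 3.0, "1516853700": 12.0}},
    {"tags": {"host": "s2"}, "dps": {"1516852800": 40.0, "1516853700": 5.0}},
    {"tags": {"host": "s3"}, "dps": {"1516852800": 7.0, "1516853700": 9.0}},
]

top = highest_max(response, 2)
print([s["tags"]["host"] for s in top])  # ['s2', 's1']
print(len(top[0]["dps"]))               # 2: every point comes back, not just the max
```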

That's not what I expected; I was hoping for something like the format below (not a bunch of values, just the max, min or avg value over that period):

    [
      {"host": "s1", "max_value_ts": "tsX", "max_value": "max_value_s1"},
      {"host": "s2", "max_value_ts": "tsY", "max_value": "max_value_s2"}
    ]
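Until the server can return this shape directly, it can be produced client side from whatever series a query returns (for example, the top n from highestMax). This is a minimal sketch: `max_per_host` is a hypothetical helper, it assumes every series carries a "host" tag, and it still requires the full points to be transferred, which is exactly the cost the question wants to avoid.

```python
import json


def max_per_host(series_list):
    """Reduce each series to its single highest point, keyed by host.

    Assumes each entry has a "tags" map containing "host" and a "dps" map
    from epoch-second strings to values, like the responses in this thread.
    """
    rows = []
    for s in series_list:
        if not s["dps"]:
            continue  # skip series with no points in the range
        # On ties, max() keeps the first maximal item in iteration order.
        ts, value = max(s["dps"].items(), key=lambda kv: kv[1])
        rows.append({"host": s["tags"]["host"], "max_value_ts": ts, "max_value": value})
    return rows


# Hypothetical two-host response (values made up for illustration).
response = [
    {"tags": {"host": "s1"}, "dps": {"1516852800": 17.2, "1516853700": 18.35}},
    {"tags": {"host": "s2"}, "dps": {"1516852800": 22.88, "1516853700": 19.32}},
]

rows = max_per_host(response)
print(json.dumps(rows, indent=2))
```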

Or am I missing something in the docs? Or do the docs just lack examples of how to use gexp?

Looking forward to your help.

opsun commented 7 years ago

https://github.com/OpenTSDB/opentsdb/pull/1048

I have fixed this issue.

IDerr commented 7 years ago

Hi @ace-han, if your question has been answered, can you please close this issue?

Thanks

mancubus77 commented 6 years ago

Facing the same issue on 2.3: instead of the max value, TSDB returns a set of data points for http://opentsdb/api/query/gexp?start=4h-ago&exp=highestMax(max:if.percent.peak.stat\{tid=xx,dir=*,target=*\},2)

    "dps": {
      "1516852800": 17.200000762939453,
      "1516853700": 18.350000381469727,
      "1516854600": 18.5,
      "1516855500": 19.31999969482422,
      "1516856400": 17.479999542236328,
      "1516857300": 16.450000762939453,
      "1516860900": 22.56999969482422,
      "1516861800": 22.8799991607666
    }