GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.23k stars 303 forks source link

Query result should contains the timeseries information #336

Open killme2008 opened 1 year ago

killme2008 commented 1 year ago

When we query from a metrics table written by prometheus remote write:

select * from prometheus_tsdb_head_min_time order by greptime_timestamp;
+---------------------+---------------------+----------------+------------+
| greptime_timestamp  | greptime_value      | instance       | job        |
+---------------------+---------------------+----------------+------------+
| 2022-10-21 08:51:01 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:16 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:31 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:46 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:01 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:16 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:31 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:46 |       1666340534420 | localhost:9090 | prometheus |

....

We don't known the timeseries information at all( for example instance, job here), so when we want to supports remote read ,we have to find out the timeseries by ourself such as:

let  mut timeseries = HashMap<String, TimeSeries>();

for row in rows {
  let job = ..;
  let instance = ...;

  if let Some(ts) = timeseries.get(format!("{}{}", job, instance)) {
     // append samples
  } else {
     // create a timeseries and insert it into timeseries map.
  }

}

The timeseries protobuf message:

// TimeSeries represents samples and labels for a single time series.
message TimeSeries {
  // For a timeseries to be valid, and for the samples and exemplars
  // to be ingested by the remote system properly, the labels field is required.
  repeated Label labels   = 1;
  repeated Sample samples = 2;
  repeated Exemplar exemplars = 3;
}

It's really ugly and not good for performance. I think we should improve it in future.

tisonkun commented 6 months ago

@killme2008 this seems still relevant. What is our starting point? Read the PromQL remote read code path and investigate how to add the time series info and whether it helps?

killme2008 commented 6 months ago

cc @waynexia

We have a plan to support it, and we already have tsid in the storage format, it can be used to improve the performance.

evenyag commented 6 months ago

For prom remote read API, maybe we can also select the tsid column. However, we already have native PromQL support so the priority of this issue should be low.

it can be used to improve the performance.

We can use it to improve performance and that is what I'm going to do, but it is out of this issue's topic.