We are aiming to design and implement a benchmark that evaluates the analytical performance of time series databases.
Here is my understanding of how to design the analytical workload.
Dataset
In TSBS we have metrics data from DevOps or IoT devices (e.g. CPU/memory utilization). There are other kinds of time series data we can take into consideration:
Events: e.g. users logging in/out of a website, an IoT device turning on/off. Event data typically consists of a timestamp and an event type.
Logs: e.g. log output from servers/applications/databases/network devices. Log data typically consists of a timestamp, log level, log content, and a backtrace.
How to generate the dataset is one of the main concerns.
Deep learning (which I am not familiar with): TSM-Bench takes this approach. It seems more complicated, but the generated data may be more realistic.
Statistical methods: some papers use Hidden Markov Models to generate data, or extract a pattern from real data and generate more data following that pattern. TPC-DS uses synthetic datasets built from well-studied distributions such as the Normal or Poisson distributions; these are mathematically well defined and easy to implement in a data generator.
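To make the statistical option concrete, here is a minimal sketch (assuming NumPy; the baseline shape and all parameters are purely illustrative, not taken from TPC-DS) that generates a metric series from a seasonal baseline plus Normal noise, and an event-count series from a Poisson distribution:

```python
import numpy as np

def generate_metrics(n_points, seed=0):
    """Generate a synthetic CPU-utilization-like series (daily seasonal
    baseline plus Gaussian noise) and a Poisson-distributed event count
    per interval. Illustrative distributions and parameters only."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_points)
    # Daily cycle assuming one point per minute (1440 points per day).
    baseline = 50 + 20 * np.sin(2 * np.pi * t / 1440)
    cpu = np.clip(baseline + rng.normal(0, 5, n_points), 0, 100)
    events = rng.poisson(lam=3, size=n_points)  # e.g. logins per minute
    return cpu, events

cpu, events = generate_metrics(1440)
```

The same generator shape extends to logs by sampling a log level and a message template per event.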
We should have options to control:
Data type: Metrics, Events, Logs
Scaling factor to control data size; both the domain and the tuple count should be scaled (refer to TPC-DS)
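A possible shape for those options, as a sketch (the class and all names/constants are hypothetical, not from TSBS or TPC-DS):

```python
from dataclasses import dataclass

@dataclass
class GeneratorConfig:
    data_type: str     # "metrics" | "events" | "logs"
    scale_factor: int  # 1, 10, 100, ...

    def num_series(self) -> int:
        # Scale the domain: number of distinct hosts/devices.
        return 100 * self.scale_factor

    def points_per_series(self) -> int:
        # Scale the tuple count per series.
        return 10_000 * self.scale_factor

cfg = GeneratorConfig(data_type="metrics", scale_factor=10)
```

Scaling both dimensions (like TPC-DS) matters because cardinality of the domain and length of each series stress different parts of a TSDB.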
Query
I collected some scenarios that exercise the analytical performance of a time series database:
Data fetching: a very basic function, e.g. selecting data by time range with some filters and aggregation
Anomaly detection: detect the existence of abnormal values; this may involve downsampling
Prediction: may involve a sliding window; some TSDBs have built-in or user-defined prediction functions
Trending: downsampling
Value filling: upsampling
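As a sketch of the anomaly-detection scenario, one simple approach (a z-score over downsampled window means; this is not any particular TSDB's built-in function):

```python
import numpy as np

def detect_anomalies(values, window=60, k=3.0):
    """Downsample into fixed windows (mean per window), then flag
    windows whose mean deviates more than k standard deviations
    from the mean of all window means."""
    n = len(values) // window * window
    means = values[:n].reshape(-1, window).mean(axis=1)
    mu, sigma = means.mean(), means.std()
    return np.flatnonzero(np.abs(means - mu) > k * sigma)

series = np.random.default_rng(1).normal(50, 2, 6000)
series[3000:3060] += 40  # inject one anomalous window
```

With a 60-point window the injected span (indices 3000..3059) lands exactly in window 50, which is the one flagged.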
Pseudo code for some queries:
1. Data fetching
SELECT time, id
FROM t
WHERE time > ts_start
AND time < ts_stop
AND a > value
2. Aggregation and Join
SELECT id, AVG(a), SUM(b)
FROM t
WHERE time > ts_start
AND time < ts_stop
GROUP BY id

SELECT t1.id, AVG(t1.a), SUM(t2.b)
FROM t1 JOIN t2 ON t1.id = t2.id AND t1.time = t2.time
WHERE t1.time > ts_start
AND t1.time < ts_stop
GROUP BY t1.id
3. Downsampling
SELECT time, id, AVG(a), SUM(b)
FROM t
WHERE time > ts_start
AND time < ts_stop
GROUP BY id
SAMPLE BY 1h
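For intuition, the downsampling semantics above correspond to a pandas resample (a sketch for a single series; the SQL above is SAMPLE BY-style pseudocode, so this is just one concrete reading of it):

```python
import numpy as np
import pandas as pd

# One day of per-minute readings for a single series.
idx = pd.date_range("2024-01-01", periods=1440, freq="min")
df = pd.DataFrame({"a": np.arange(1440, dtype=float)}, index=idx)

# SAMPLE BY 1h: one aggregated row per hour.
hourly = df.resample("1h").mean()
```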
4. Upsampling
SELECT time, id, a
FROM t
WHERE time > ts_start
AND time < ts_stop
SAMPLE BY 10s
FILL(LINEAR)
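Similarly, the upsampling with linear fill can be sketched with pandas (single series, illustrative values):

```python
import pandas as pd

# Sparse one-minute readings.
idx = pd.date_range("2024-01-01", periods=3, freq="min")
df = pd.DataFrame({"a": [0.0, 60.0, 120.0]}, index=idx)

# SAMPLE BY 10s FILL(LINEAR): add 10-second grid points,
# interpolating linearly between the real readings.
up = df.resample("10s").interpolate(method="linear")
```

Two minutes of data at a 10-second grid yields 13 rows, with the gaps filled as 10.0, 20.0, ... between the original points.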
Test suite
Like TPC-DS, we can have the following tests:
Loading Test: evaluate the time to import raw data into the DB (single/multi thread)
Power Test: single-thread query performance
Throughput Test: multi-thread query performance
Maybe we should also add tests to evaluate ETL performance.
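A minimal harness for the Power/Throughput distinction might look like this (a sketch: `run_query` is a stand-in for a real DB client call, and the query list is a placeholder):

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUERIES = ["q1", "q2", "q3", "q4"] * 5  # 20 placeholder queries per stream

def run_query(q):
    time.sleep(0.01)  # stand-in for db.execute(q)

def power_test():
    """Power Test: a single stream runs every query back to back."""
    start = time.perf_counter()
    for q in QUERIES:
        run_query(q)
    return time.perf_counter() - start

def throughput_test(streams=4):
    """Throughput Test: several concurrent streams, each running the full set."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(power_test) for _ in range(streams)]
        for f in futures:
            f.result()
    return time.perf_counter() - start
```

The Loading Test is the same harness pointed at the import path instead of at queries.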
Outputs
This is not something that can be fully designed at the very beginning. In a nutshell, a benchmark is a tool to measure performance, so the most important raw measurement is the time it takes to execute each query. Once we have the import time and the single-threaded and multi-threaded execution times, we can derive metrics like THROUGHPUT and PRICE OVER PERFORMANCE.
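For instance, TPC-style derived metrics could be computed along these lines (formulas deliberately simplified; the real TPC-DS metric definitions are more involved):

```python
def throughput_qph(num_queries: int, streams: int, elapsed_seconds: float) -> float:
    """Queries per hour across all concurrent streams."""
    return num_queries * streams * 3600.0 / elapsed_seconds

def price_over_performance(system_price: float, qph: float) -> float:
    """Dollars per (query per hour): lower is better."""
    return system_price / qph

# 20 queries x 4 streams completing in one hour -> 80 queries per hour.
qph = throughput_qph(num_queries=20, streams=4, elapsed_seconds=3600.0)
```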