Beuth-Erdelt / DBMS-Benchmarker

DBMS-Benchmarker is a Python-based application-level blackbox benchmark tool for Database Management Systems (DBMS). It connects to a given list of DBMS (via JDBC) and runs a given list of parametrized and randomized (SQL) benchmark queries. Evaluations are available via a Python interface and on an interactive multi-dimensional dashboard.
GNU Affero General Public License v3.0
12 stars 2 forks source link
agplv3 benchmarking dbms jdbc python sql

Maintenance GitHub release PyPI version DOI JOSS dbmsbenchmarker Documentation Status

DBMS-Benchmarker

DBMS-Benchmarker is a Python-based application-level blackbox benchmark tool for Database Management Systems (DBMS). It aims at reproducible measuring and easy evaluation of the performance the user receives even in complex benchmark situations. It connects to a given list of DBMS (via JDBC) and runs a given list of (SQL) benchmark queries. Queries can be parametrized and randomized. Results and evaluations are available via a Python interface and can be inspected with standard Python tools like pandas DataFrames. An interactive visual dashboard assists in multi-dimensional analysis of the results.

See the homepage and the documentation.

If you encounter any issues, please report them to our Github issue tracker.

Key Features

DBMS-Benchmarker

For more informations, see a basic example or take a look in the documentation for a full list of options.

The code uses several Python modules, in particular jaydebeapi for handling DBMS. This module has been tested with Citus Data (Hyperscale), Clickhouse, CockroachDB, Exasol, IBM DB2, MariaDB, MariaDB Columnstore, MemSQL (SingleStore), MonetDB, MySQL, OmniSci (HEAVY.AI), Oracle DB, PostgreSQL, SQL Server, SAP HANA, TimescaleDB, and Vertica.

Installation

Run pip install dbmsbenchmarker to install the package.

You will also need to have

Basic Usage

The following very simple use case runs the query SELECT COUNT(*) FROM test 10 times against one local MySQL installation. As a result we obtain an interactive dashboard to inspect timing aspects.

Configuration

We need to provide

Perform Benchmark

Run the CLI command: dbmsbenchmarker run -e yes -b -f ./config

This is equivalent to python benchmark.py run -e yes -b -f ./config

After benchmarking has been finished we will see a message like

Experiment <code> has been finished

The script has created a result folder in the current directory containing the results. <code> is the name of the folder.

Evaluate Results in Dashboard

Run the command: dbmsdashboard

This will start the evaluation dashboard at localhost:8050. Visit the address in a browser and select the experiment <code>.

Alternatively you may use a Jupyter notebook, see a rendered example.

Limitations

Limitations are:

Other comparable products you might like

Contributing, Bug Reports

If you have any question or found a bug, please report them to our Github issue tracker. In any bug report, please let us know:

We are always looking for people interested in helping with code development, documentation writing, technical administration, and whatever else comes up. If you wish to contribute, please first read the contribution section or visit the documentation.

Benchmarking in a Kubernetes Cloud

This module can serve as the query executor [2] and evaluator [1] for distributed parallel benchmarking experiments in a Kubernetes Cloud, see the orchestrator for more details.

References

If you use DBMSBenchmarker in work contributing to a scientific publication, we kindly ask that you cite our application note [1] and/or [3]:

[1] A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking

Erdelt P.K. (2021) A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020. Lecture Notes in Computer Science, vol 12752. Springer, Cham. https://doi.org/10.1007/978-3-030-84924-5_6

[2] Orchestrating DBMS Benchmarking in the Cloud with Kubernetes

Erdelt P.K. (2022) Orchestrating DBMS Benchmarking in the Cloud with Kubernetes. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2021. Lecture Notes in Computer Science, vol 13169. Springer, Cham. https://doi.org/10.1007/978-3-030-94437-7_6

[3] DBMS-Benchmarker: Benchmark and Evaluate DBMS in Python

Erdelt P.K., Jestel J. (2022). DBMS-Benchmarker: Benchmark and Evaluate DBMS in Python. Journal of Open Source Software, 7(79), 4628 https://doi.org/10.21105/joss.04628