TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0

Add a CH-benCHmark (HTAP) workload #123

Open marcocitus opened 4 years ago

marcocitus commented 4 years ago

We often run HammerDB concurrently with the analytical queries from CH-benCHmark (mainly against PostgreSQL) by using a separate script.

Would there be interest in adding a CH workload directly to HammerDB? This involves running TPC-C transactions and TPC-H-like analytical queries concurrently on the same tables. We may be able to write and contribute a PostgreSQL implementation.
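To make the idea concrete, here is a minimal sketch of the pattern: short write transactions and an aggregate query running at the same time against the same table. It is not HammerDB or CH-benCHmark code. It assumes SQLite from the Python standard library in place of PostgreSQL, a single made-up table loosely named after TPC-C's order_line, and a hypothetical helper called run_ch_sketch.

```python
import os
import sqlite3
import tempfile
import threading


def run_ch_sketch(oltp_workers=2, txns_per_worker=50, analytical_queries=5):
    """Run short write transactions and aggregate queries concurrently on one table."""
    query_results = []

    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "ch_sketch.db")

        # One shared table; the name and columns are illustrative assumptions.
        setup = sqlite3.connect(db_path)
        setup.execute("PRAGMA journal_mode=WAL")  # lets readers run alongside a writer
        setup.execute("CREATE TABLE order_line (ol_w_id INTEGER, ol_amount REAL)")
        setup.commit()
        setup.close()

        def oltp_worker(warehouse_id):
            # Transactional side: many small inserts, one commit per transaction.
            conn = sqlite3.connect(db_path, timeout=30)
            try:
                for _ in range(txns_per_worker):
                    with conn:  # commits on success, rolls back on error
                        conn.execute(
                            "INSERT INTO order_line VALUES (?, ?)",
                            (warehouse_id, 1.0),
                        )
            finally:
                conn.close()

        def analytical_worker():
            # Analytical side: aggregate over the same table while writes continue.
            conn = sqlite3.connect(db_path, timeout=30)
            try:
                for _ in range(analytical_queries):
                    row = conn.execute(
                        "SELECT COUNT(*), COALESCE(SUM(ol_amount), 0) FROM order_line"
                    ).fetchone()
                    query_results.append(row)
            finally:
                conn.close()

        threads = [
            threading.Thread(target=oltp_worker, args=(w,))
            for w in range(1, oltp_workers + 1)
        ]
        threads.append(threading.Thread(target=analytical_worker))
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Final row count once every worker has finished.
        check = sqlite3.connect(db_path)
        rows = check.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
        check.close()

    return {"rows": rows, "query_results": query_results}


if __name__ == "__main__":
    result = run_ch_sketch()
    print(result["rows"], len(result["query_results"]))
```

The intermediate aggregate values depend on thread timing, so only the final row count (oltp_workers × txns_per_worker) is deterministic. A real CH workload would use the full TPC-C schema and the adapted TPC-H query set.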

CH-benCHmark is also supported by some other benchmarking tools, e.g.: https://github.com/oltpbenchmark/oltpbench

abondvt89 commented 4 years ago

Adding this to HammerDB as a combo workload option is an interesting idea. If you have a PostgreSQL implementation that you would like to contribute, and it is not license restricted and meets the TPC Fair Use guidelines (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc_fair_use_quick_reference_v1.0.0.pdf), go ahead and submit a PR.

sm-shaw commented 4 years ago

As discussed, this is a great suggestion for inclusion in HammerDB. Before submitting a PR for PostgreSQL, it would be good to discuss the design approach to make sure that it is as straightforward as possible to implement for multiple databases.

If the CH workload is sufficiently different from both C and H, it would be best to add it as a separate workload. This would involve adding parameters to the XML, options to the opt files (e.g. pgopt.tcl) and another workload-related script for build and driver, such as pghtap.tcl. Doing so would mean more generic work underneath but could pay off in the longer run.

If it is very close to either C or H, then it could be added as a checkbox option. With checkbox options the approach is to keep the driver scripts as consistent as possible over time, so changes such as 'use all warehouses' or 'time profile' are added by inline editing of the driver script (there are 3 templates: test, timed and asynchronous timed). As these checkbox options increase, it is important to ensure that they are all compatible in different combinations. The multiple labelled connections feature (https://github.com/TPC-Council/HammerDB/issues/101) also takes this editing approach.

Therefore the first task is to determine how different this workload is from C and H, and whether a separate workload would be best so as not to confuse people running HammerDB for the first time.
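As a rough illustration of the checkbox approach and the compatibility concern, the sketch below edits a driver template inline for each enabled option and then enumerates every combination of options. The template text, the marker comments, the option names and the snippet lines are all hypothetical and are not taken from HammerDB's actual driver scripts.

```python
from itertools import combinations

# Hypothetical stand-in for a driver script template (not HammerDB's real script).
TEMPLATE = """\
# BEGIN OPTIONS
# END OPTIONS
run_transactions
"""

# Hypothetical snippets that each checkbox option would splice into the script.
OPTION_SNIPPETS = {
    "all_warehouses": "set all_warehouses true",
    "time_profile": "set time_profile true",
    "ch_queries": "set ch_queries true",
}


def apply_options(template, enabled):
    """Return the template with one snippet per enabled option inserted inline."""
    lines = []
    for line in template.splitlines():
        if line == "# END OPTIONS":
            # Insert in sorted order so the output does not depend on input order.
            lines.extend(OPTION_SNIPPETS[name] for name in sorted(enabled))
        lines.append(line)
    return "\n".join(lines) + "\n"


def all_combinations(names):
    """Yield every subset of option names, from none enabled to all enabled."""
    ordered = sorted(names)
    for size in range(len(ordered) + 1):
        yield from combinations(ordered, size)


def check_combinations(template=TEMPLATE):
    """Check that each combination yields exactly the snippets that were asked for."""
    checked = 0
    for combo in all_combinations(OPTION_SNIPPETS):
        script_lines = apply_options(template, combo).splitlines()
        for name, snippet in OPTION_SNIPPETS.items():
            expected = 1 if name in combo else 0
            assert script_lines.count(snippet) == expected
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_combinations())
```

With three options there are eight combinations to check, and the count doubles with each new checkbox, which is one argument for a separate workload if CH needs many of its own settings.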

sm-shaw commented 5 days ago

Adding a link to the discussion topic "Vector Search Benchmarking along with OLTP". There is potential for this to be developed into a TPC-CHV workload for databases that support it.