Beuth-Erdelt / DBMS-Benchmarker

DBMS-Benchmarker is a Python-based application-level blackbox benchmark tool for Database Management Systems (DBMS). It connects to a given list of DBMS (via JDBC) and runs a given list of parametrized and randomized (SQL) benchmark queries. Evaluations are available via a Python interface and on an interactive multi-dimensional dashboard.
GNU Affero General Public License v3.0

Measuring database price performance e.g. QphH #121

Closed deenar closed 2 months ago

deenar commented 1 year ago

This is an information request rather than an issue. Background: I am comparing various cloud database options for a given use case/workload. With cloud databases, one can typically choose capacity units, so price-performance is an important evaluation criterion. The goal is to find out how much each option costs per x users, or the throughput (or number of concurrent users supported) per $/€.

An average workload is defined as a typical set of queries run per hour by a user, with a different nbrRuns assigned to each query depending on how often it is run. Taking TPC-H as an example, each of the 22 queries would have a different nbrRuns.

There are two kinds of measurements I would like to perform:

1) Absolute performance: given one user, how quickly the benchmark is executed. This is straightforward using 'numProcesses': 1.
2) Price-performance ratio, or number of concurrent users. Ideally, I would like to measure QphH. How do I ensure that the database is saturated? Do I just keep increasing numProcesses until the results plateau? I saw a throughput metric in the notebooks. How is it defined?

perdelt commented 1 year ago

Thanks for sharing the background information! I think this is a very interesting use case. Apparently you want to perform the TPC-H power test and the TPC-H throughput test. This is possible. For example, I get the following results for a local MySQL instance in the throughput test:

| Processes | DBMSBenchmarker duration [s] | Throughput = processes × 22 × 3600 / duration [Qph] |
|----------:|-----------------------------:|----------------------------------------------------:|
| 1 | 193 | 410.36 |
| 2 | 215 | 736.74 |
| 3 | 254 | 935.43 |
| 4 | 302 | 1049.01 |
| 5 | 368 | 1076.09 |
| 10 | 687 | 1152.84 |
| 15 | 955 | 1243.98 |
| 20 | 1249 | 1268.21 |
| 25 | 1594 | 1242.16 |

So it plateaus at around 1250 Qph.
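The throughput figures above follow from a one-line formula (n streams of 22 queries each, scaled to one hour):

```python
def qph(streams, duration_s, queries_per_stream=22):
    """TPC-H-style throughput: queries completed per hour across all streams."""
    return streams * queries_per_stream * 3600 / duration_s

# Last row of the table above: 25 streams, 1594 s.
print(round(qph(25, 1594), 2))  # → 1242.16
```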

Note however that DBMSBenchmarker synchronizes subprocesses after each query and reconnects to the DBMS. I will add the following to the docs:

To circumvent Python's GIL, the module multiprocessing is used. For each combination connection/query, a pool of asynchronous subprocesses is spawned. The subprocesses connect to the DBMS and send the query. Note this implies a reconnection and the creation of a JVM. When all subprocesses are finished, results are joined and dbmsbenchmarker may proceed to the next query. This helps in evaluating concurrency on a query level. You can for example compare performance of 15 clients running TPC-H Q8 at the same time. If you want to evaluate concurrency on stream level with a single connection per client, you should start several dbmsbenchmarker.
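A minimal sketch of the per-query pool pattern described above, using Python's multiprocessing module (a hypothetical stand-in, not the actual dbmsbenchmarker internals):

```python
import time
from multiprocessing import Pool

def run_query(args):
    # Stand-in for: reconnect to the DBMS (including JVM creation for JDBC),
    # send one query, and return this client's runtime.
    client_id, query = args
    start = time.time()
    time.sleep(0.01)  # placeholder for connect + execute
    return client_id, query, time.time() - start

def benchmark_query(query, num_processes):
    # One pool per connection/query combination: all clients run the same
    # query concurrently; results are joined before the next query starts.
    with Pool(num_processes) as pool:
        return pool.map(run_query, [(i, query) for i in range(num_processes)])

results = benchmark_query("Q8", 4)  # e.g. 4 clients running TPC-H Q8 at once
```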

Throughput in the notebooks is computed on a query level. This probably is not what you want here.

If this sounds like a suitable setting, I will provide you more information about how to run such a test.

deenar commented 1 year ago

Thank you. That is exactly what I was doing before using DBMSBenchmarker. It would be useful if:

a) Qph/$ reporting was supported natively by DBMSBenchmarker. It already captures the cost.
b) Database throughput measurement was an option, i.e. increasing the maximum number of processes until a certain error threshold is reached and/or throughput plateaus.
c) Rather than each connection/query combination running individually, there was an additional option to run the entire set of queries as a single workload per database. This reflects real-world workloads better and would highlight any query/result cache limits, compared to the current approach. For each connection, you would spawn a subprocess per numProcesses defined (reflecting the concurrent users), and each subprocess would run all the specified queries, with some queries repeated based on the numRun specified for each query. The list of queries and the numRun per query define a typical user interaction.

P.S. Hopefully one day I can contribute too.

perdelt commented 1 year ago

Thank you for the suggestions! Yes, I think b) and c) are interesting directions; in particular b) seems promising. Since DBMSBenchmarker is a Python module, you might want to program a wrapper that starts new instances as long as results are improving. You do not need to change the package for that. Regarding a), DBMSBenchmarker reports results from which you can compute throughput (depending on the exact definition). Something like: take the maximum runtime per query, sum over the complete stream Q1-Q22 to get the total runtime t, and then apply the formula 22*number_of_streams*3600/t?
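Such a wrapper could look roughly like this (a sketch; `measure` is a placeholder you would implement yourself, e.g. by timing a `dbmsbenchmarker run` subprocess call and returning its duration in seconds):

```python
def find_plateau(measure, max_streams=32, queries=22, tolerance=0.02):
    """Increase the number of parallel streams until throughput stops
    improving by more than `tolerance` (relative).

    measure(n) -> benchmark duration in seconds for n parallel streams.
    Returns (best_stream_count, best_qph).
    """
    best_n, best_qph = 0, 0.0
    for n in range(1, max_streams + 1):
        qph = n * queries * 3600 / measure(n)
        if qph <= best_qph * (1 + tolerance):
            break  # throughput has plateaued
        best_n, best_qph = n, qph
    return best_n, best_qph

# Example using the MySQL durations reported earlier in this thread:
durations = {1: 193, 2: 215, 3: 254, 4: 302, 5: 368}
print(find_plateau(durations.get, max_streams=5))
```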

deenar commented 1 year ago

Hi @perdelt

If this sounds like a suitable setting, I will provide you more information about how to run such a test.

I never followed up. Can you give me details on what steps one has to follow to get the following info? How does one get the DBMSBenchmarker duration?

25 processes, DBMSBenchmarker duration: 1594 [s], Throughput: 25*22*3600/1594 = 1242.1580928481808 Qph

perdelt commented 1 year ago

It is necessary to change the head of the queries.config (comment out three lines):

{
        'name': "The TPC-H Queries",
        'intro': "This includes the reading queries of TPC-H.",
        'factor': 'mean',
        'connectionmanagement': {
                'timeout': 1200,
                #'numProcesses': 1,
                #'runsPerConnection': 0,
                #'singleConnection': True
        },

The default in connectionmanagement is 'singleConnection': True. This forces dbmsbenchmarker to hold a single connection for the complete stream, as in the TPC-H power test, and this contradicts spawning a pool of connections for parallel execution. This behaviour might change in the future, but currently parallel execution can only be achieved this way. This also implies that we have a pool of connections per query. The benefit is that, for example, all instances of Q8 run at the same time, no matter what happened during Q1-Q7.

You run a benchmark (for example 4 streams) by

dbmsbenchmarker run -f tpc-h/ -p 4 -pn 4 -e yes

if the folder tpc-h/ holds the (changed) config files.

The CLI tool prints out something like

DBMSBenchmarker duration: 299 [s]

so this gives a number like noted above.

You can get more information in a notebook.

df = evaluate.get_aggregated_query_statistics(type='timer', name='execution', query_aggregate='Max').T

computes the maximum execution time only (ignoring all other overhead) of the 4 parallel executions per query, in milliseconds:

| Query | time [ms] |
|-------|----------:|
| Q1 | 35402.3 |
| Q2 | 796.01 |
| Q3 | 17458 |
| Q4 | 2904.54 |
| Q5 | 13920.6 |
| Q6 | 5684.1 |
| Q7 | 6931.27 |
| Q8 | 30039.4 |
| Q9 | 30096 |
| Q10 | 5839.7 |
| Q11 | 2511.15 |
| Q12 | 8401.92 |
| Q13 | 6676.57 |
| Q14 | 9063.6 |
| Q15 | 7196.78 |
| Q16 | 1712.49 |
| Q17 | 5460.03 |
| Q18 | 7723.76 |
| Q19 | 689.364 |
| Q20 | 2547.22 |
| Q21 | 54470.7 |
| Q22 | 645.077 |

This

df = evaluate.get_aggregated_experiment_statistics(type='timer', name='execution', query_aggregate='Max', total_aggregate='Sum').T/1000
4*22*3600/df

sums these latencies and computes the throughput Qph. This is what comes closest to the TPC-H throughput metric.

Qph
1236.68

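As a sanity check, summing the per-query maxima listed above (in milliseconds) reproduces this Qph figure:

```python
# Per-query maximum execution times from the table above, in milliseconds.
times_ms = [35402.3, 796.01, 17458, 2904.54, 13920.6, 5684.1, 6931.27,
            30039.4, 30096, 5839.7, 2511.15, 8401.92, 6676.57, 9063.6,
            7196.78, 1712.49, 5460.03, 7723.76, 689.364, 2547.22,
            54470.7, 645.077]
total_s = sum(times_ms) / 1000        # total stream runtime in seconds
qph = 4 * 22 * 3600 / total_s         # 4 streams of 22 queries, per hour
print(round(qph, 2))                  # → 1236.68
```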
deenar commented 1 year ago

Is my understanding correct?

I want to even out the startup costs by running the queries several times per connection. Would the following run all the queries 8 times using the pool of 4 connections, usually 2 runs per connection? dbmsbenchmarker run -f tpc-h/ -p 4 -pn 8 -e yes

I would have to modify this to get the correct Qph, somehow do a max per connection, or use something like

df = evaluate.get_aggregated_experiment_statistics(type='timer', name='execution', query_aggregate='Sum', total_aggregate='Sum').T/(8*1000)
8*22*3600/df

perdelt commented 1 year ago

Hi @deenar, yes, that is correct. It runs 8 executions per query from 4 connections, numbered as executions 0 through 7, where the following is fixed deterministically:

Runs 0, 2, 4, 6 are the first runs per connection.
Runs 1, 3, 5, 7 are the second runs per connection.

You can clean that up on your own with pandas after benchmarking is complete (there is no implemented function for your use case):

Per query:

df1, df2 = evaluate.get_measures_and_statistics(numQuery, type='timer', name='execution')
df1
DBMS 0 1 2 3 4 5 6 7
MySQL 36243.3 24938.1 36138 24937.1 36159.1 24936.5 36138.7 24934.5

and

drop_idx = list(range(0, df1.shape[1], 2))  # even columns: first run per connection
df1.drop(drop_idx, axis=1)
DBMS 1 3 5 7
MySQL 24938.1 24937.1 24936.5 24934.5
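A self-contained version of that cleanup (the numbers are the MySQL measurements shown above):

```python
import pandas as pd

# 8 executions (columns 0-7) of one query from 4 connections;
# even-numbered columns are each connection's first (cold) run.
df1 = pd.DataFrame(
    [[36243.3, 24938.1, 36138.0, 24937.1, 36159.1, 24936.5, 36138.7, 24934.5]],
    index=["MySQL"], columns=range(8))

drop_idx = list(range(0, df1.shape[1], 2))  # drop the first run per connection
warm = df1.drop(drop_idx, axis=1)           # keep runs 1, 3, 5, 7
print(warm)
```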
perdelt commented 2 months ago

All information from this issue should be included in the docs by now.