Closed AmosG closed 1 week ago
Sorry, but I think I found that this would work: working-querywise-or-connectionswise
Working querywise or connectionwise: this option sets whether benchmarks are performed per query (one query after the other is completed) or per connection (one connection after the other is completed).
This means processing -w query
is
loop over queries q
loop over connections c
making n benchmarks for q and c
compute statistics
save results
generate reports
and processing -w connection
is
loop over connections c
loop over queries q
making n benchmarks for q and c
compute statistics
save results
generate reports
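The two modes described above only swap the loop order. A minimal Python sketch of that difference (illustrative names, not the actual dbmsbenchmarker internals):

```python
queries = ["q1", "q2"]
connections = ["c1", "c2"]
n = 3  # number of benchmark runs per (query, connection) pair

def benchmark(q, c):
    # stand-in for timing one execution of query q over connection c
    return f"run {q} on {c}"

def process(mode):
    # -w query: queries form the outer loop; -w connection: connections do
    if mode == "query":
        pairs = [(q, c) for q in queries for c in connections]
    else:
        pairs = [(q, c) for c in connections for q in queries]
    results = [benchmark(q, c) for q, c in pairs for _ in range(n)]
    # computing statistics, saving results and generating reports would follow here
    return results
```

Both modes perform the same set of runs; only the order (and therefore the grouping of statistics) differs.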
Hello @AmosG I assume dbmsbenchmarker is not well prepared for your use case. A typical use case would be running a sequence of complicated and parameterized (and slow) queries like in TPC-H and TPC-DS. The result is retrieved, stored and compared. Stress for the DBMS comes from high repetition, not parallel execution.
The querywise or connectionwise parameter itself does not parallelize anything.
However I think dashboards are a valid and interesting use case.
As an example, what does a query look like?
What would you expect dbmsbenchmarker to report after the benchmark? The total throughput?
Dashboard queries (https://www.geckoboard.com/dashboard-examples/executive/sql-dashboard/) are no different from typical OLAP-style queries: aggregate measures grouped by certain dimensions, some trends, top-x ordered lists of certain entities, all of this run for comparisons between two or more time periods.
The only difference is that these queries are fired at once from a single client. IMO, if --numProcesses is quite large, one should get the same numbers, and over a period of time all the dashboard queries would run in parallel across multiple benchmark clients. One wouldn't get stats at a dashboard level, though.
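The "fired at once from a single client" behaviour can be sketched with a thread pool; this is only an illustration of what a large --numProcesses approximates, with a placeholder instead of a real DBMS call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# a hypothetical dashboard: all of its queries are fired at once
dashboard_queries = [
    "SELECT region, SUM(revenue) FROM sales GROUP BY region",
    "SELECT product, COUNT(*) FROM orders GROUP BY product ORDER BY 2 DESC LIMIT 10",
    "SELECT day, AVG(latency_ms) FROM metrics GROUP BY day",
]

def run_query(sql):
    start = time.perf_counter()
    # a real client would execute sql against the DBMS here
    return sql, time.perf_counter() - start

# every dashboard query in flight at the same time, one worker per query
with ThreadPoolExecutor(max_workers=len(dashboard_queries)) as pool:
    timings = dict(pool.map(run_query, dashboard_queries))
```

Per-query timings come back, but as noted, nothing here aggregates them into a per-dashboard statistic.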
not parallel execution.
When you mention "not parallel execution", I presume you meant for a set of queries, not across processes.
Yes, you are right. As an example, here is a config that runs 200 queries in 10 parallel batches of 20 queries each:
{
    'name': 'Demo for parallel queries',
    'connectionmanagement': {
        'timeout': 100,  # in seconds
        'numProcesses': 10,  # number of parallel client processes
        'singleConnection': False,  # allow connection not per stream
        'runsPerConnection': 20,  # reconnect for every 20 executions
    },
    'queries': [
        {
            'title': "Compute random result",
            'query': "SELECT 1+{NUMBER} AS result",
            'numRun': 200,  # total number of runs
            'parameter': {
                'NUMBER': {
                    'type': "integer",
                    'range': [1, 100]
                },
            },
        },
    ]
}
Note the following limitations:
Does this fit your use case?
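For reference, the {NUMBER} placeholder in the config above is filled per run from the configured integer range. A rough sketch of that substitution (my reading of the parameterization, not the actual implementation):

```python
import random

# query template and range as in the example config
query_template = "SELECT 1+{NUMBER} AS result"
number_range = [1, 100]

def instantiate(template, lo, hi):
    # each run substitutes a fresh random integer from the inclusive range
    return template.format(NUMBER=random.randint(lo, hi))

sql = instantiate(query_template, *number_range)
```

Each of the 200 runs therefore executes a slightly different concrete query.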
Thank you, that will work. I have a clarification: is 'runsPerConnection': 20 required? What is the behaviour if it is left out?
It is redundant at this point, but in general it is recommended to specify these parameters completely. If there is a contradiction in parameters about the number of processes, size of batches etc., it cannot be guaranteed that the behaviour is as expected.
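A plausible way to see the redundancy, using the numbers from the example config (my reading of the arithmetic, not an official statement of the internals):

```python
# numbers taken from the example config
num_run = 200             # 'numRun': total executions of the query
num_processes = 10        # 'numProcesses': parallel client processes
runs_per_connection = 20  # 'runsPerConnection': executions before reconnecting

runs_per_process = num_run // num_processes
# each process gets exactly one connection's worth of runs, so the
# reconnect threshold is never exceeded and the setting changes nothing here
redundant = (runs_per_process == runs_per_connection)
```

Change numRun to 400 with the same other settings and each process would reconnect once mid-stream, so the parameter would matter again.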
Given that in day-to-day usage users enter a dashboard that kicks off between 10 and 50 queries per user,
it would be ideal to be able to simulate the same behaviour using the benchmarker.