cmu-db / benchbase

Multi-DBMS SQL Benchmarking Framework via JDBC
https://db.cs.cmu.edu/projects/benchbase/

add capability to disable warehouse_affinity #348

Open ardentperf opened 1 year ago

ardentperf commented 1 year ago

From what I can tell, right now every terminal requires its own DB connection. It seems that HikariCP support was added for a while by @timveil but then removed; there was no commit message, but from the surrounding commits my guess is that there were problems around connection leaking?

Anyway - from my read of the code and my experience with benchbase so far, I suspect that running with fewer terminals than warehouses means lots of warehouses simply won't have any terminals running against them. For example: at scale 10,000 warehouses (~1 TB) and with 1,000 terminals/connections, only a tenth of the data set would actually be operated on.
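To make the arithmetic concrete, here's a tiny sketch (made-up numbers and names, not actual BenchBase code):

```java
// Made-up numbers/names, just to illustrate the arithmetic above.
public class AffinityCoverage {
    public static void main(String[] args) {
        int numWarehouses = 10_000;  // scale 10,000 ~= 1 TB
        int numTerminals = 1_000;    // one dedicated connection each
        // With each terminal pinned to one warehouse, terminals cover warehouses
        // 1..1000 and warehouses 1001..10000 never see a single transaction.
        System.out.printf("fraction of data set touched: %.0f%%%n",
                100.0 * numTerminals / numWarehouses);
    }
}
```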

This makes benchbase far less useful for testing TPC-C-like workloads on large data sets, since opening tens or hundreds of thousands of connections isn't going to work.

It would be useful to have the capability to run in a mode where the full data set is operated on. HammerDB has the option to "use all warehouses", and @apavlo's old H-Store implementation of TPC-C had a warehouse_affinity option that could be disabled (if I'm reading it correctly):

https://github.com/apavlo/h-store/blob/master/src/benchmarks/org/voltdb/benchmark/tpcc/TPCCConfig.java#L18-L56
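Roughly, what I'm picturing is something like the sketch below, loosely modeled on the H-Store option linked above. All names here are hypothetical, not BenchBase's actual classes:

```java
import java.util.Random;

// Hypothetical sketch of a warehouse_affinity toggle for a TPC-C terminal.
// Names are illustrative only, not taken from BenchBase.
public class TerminalSketch {
    private final int boundWarehouseId;     // fixed home warehouse for this terminal
    private final int numWarehouses;        // total warehouses at this scale factor
    private final boolean warehouseAffinity;
    private final Random rng = new Random();

    TerminalSketch(int boundWarehouseId, int numWarehouses, boolean warehouseAffinity) {
        this.boundWarehouseId = boundWarehouseId;
        this.numWarehouses = numWarehouses;
        this.warehouseAffinity = warehouseAffinity;
    }

    int pickWarehouseForNextTxn() {
        if (warehouseAffinity) {
            return boundWarehouseId;               // today's behavior: always the home warehouse
        }
        return rng.nextInt(numWarehouses) + 1;     // affinity disabled: any warehouse, uniform
    }
}
```

With affinity disabled like this, 1,000 terminals would spread their transactions across all 10,000 warehouses instead of only the first 1,000.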

Adding this issue as a placeholder and for discussion; I'd love to work on this if I can find the bandwidth.

bpkroth commented 7 months ago

Per discussions elsewhere, I haven't had a ton of cycles to work on this myself, but I'd happily help review a PR for it.

bpkroth commented 7 months ago

@timveil do you want to comment on any experience with HikariCP?

bpkroth commented 7 months ago

One other option: c3p0 (and c3p0-loom for use with virtual threads) was also brought up in #398.
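For illustration only (none of this is BenchBase's actual code), a standard-library sketch of how Java 21 virtual threads could pair with whichever pooled DataSource we pick:

```java
import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

// Illustrative sketch: each terminal becomes a cheap virtual thread, while a
// pooled DataSource (c3p0, HikariCP, ...) caps the number of real DB connections.
public class VirtualThreadTerminals {
    public static void runTerminals(DataSource pooledDs, int numTerminals) {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int t = 0; t < numTerminals; t++) {
                exec.submit(() -> {
                    try (Connection conn = pooledDs.getConnection()) {
                        // execute one transaction, then hand the connection back to the pool
                    }
                    return null;
                });
            }
        } // try-with-resources close() waits for all submitted tasks to finish
    }
}
```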

timveil commented 7 months ago

I love Hikari; I've used it in dozens of projects and never found it to be a performance bottleneck. I never went back to research the cause of the slowdown that was noticed, but I suspect the root cause might lie elsewhere; I don't think it was connection leakage. When I made the initial change, my hunch was that we could do a better job of establishing and managing connections (pooling), even if we wanted to keep those pools small. Also happy to revisit this.
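To sketch the shape of what I mean (pool size and names here are illustrative, not how BenchBase is actually wired):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of the "small shared pool" idea: many terminals borrow from one
// HikariCP pool instead of each holding a dedicated connection.
public class SharedPoolSketch {
    public static HikariDataSource makePool(String jdbcUrl, String user, String pass) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(user);
        config.setPassword(pass);
        config.setMaximumPoolSize(64);  // far fewer connections than terminals
        return new HikariDataSource(config);
    }

    public static void runOneTxn(HikariDataSource pool) throws SQLException {
        try (Connection conn = pool.getConnection()) {  // borrow; close() returns it
            // execute a transaction here
        }
    }
}
```

The point being that close() on a borrowed connection just returns it to the pool, so a small pool can serve a much larger number of terminals.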

apavlo commented 7 months ago

I think @lmwnshn saw the perf regressions in his experiments.

lmwnshn commented 6 months ago

Relevant PR: #29

No cycles right now, but also open to revisiting.