TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0
545 stars 115 forks source link

Where to start to add support for another database engine #627

Closed marcobambini closed 8 months ago

marcobambini commented 9 months ago

Hello,

I want to add support for SQLite Cloud to HammerDB (we already have a C SDK). I read the documentation but have yet to find a reference about adding support for another database engine.

Can anyone please assist me?

sm-shaw commented 9 months ago

This is a great question. Initially, to answer the question on adding a new database, there is a series of blog posts starting here that show the steps.

SQLite is an awesome database and is already built into HammerDB, all the configuration and the jobs based data and results are stored in a series of SQLite databases, so we already have SQLite including a highly efficient API built into HammerDB. In fact, SQLite started out as a Tcl extension (which HammerDB is written in) so the compatibility couldn't be closer, meaning in theory it is doable.

However, having said that, we have this to consider: Appropriate Uses For SQLite

SQLite is not directly comparable to client/server SQL database engines such as MySQL, Oracle, PostgreSQL, or SQL Server since SQLite is trying to solve a different problem. Client/server SQL database engines strive to implement a shared repository of enterprise data. They emphasize scalability, concurrency, centralization, and control. SQLite strives to provide local data storage for individual applications and devices. SQLite emphasizes economy, efficiency, reliability, independence, and simplicity. SQLite does not compete with client/server databases. SQLite competes with fopen().

So, we are seeing that SQLite is not comparable to MySQL, Oracle, PostgreSQL, or SQL Server or in other words the databases that HammerDB has been designed to test. In particular, this is a key issue as HammerDB has been designed to test high database concurrency with its parallel architecture.

High Concurrency SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.

Finally, as an example of where this might be headed, HammerDB has hidden support for Redis. (Redis is a database that also started out in Tcl)

However Redis is single-threaded

Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. This means that Redis can serve a single request in every given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, both products are not often perceived as being slow. This is caused in part by the small amount of time to complete a single request, but primarily because these products are designed to not block on system calls, such as reading data from or writing data to a socket.

This means we can't push it much beyond the throughput of a single virtual user (try it and compare it to one of the databases mentioned above with a larger virtual user count) and therefore it provided an unfair comparison of Redis against the client/server SQL database engines mentioned before, and for this reason it was deprecated from HammerDB.

So, yes, we think SQLite is an incredibly good database and (like Redis) it could be added. But it is doesn't look like the workloads that HammerDB generates are ones that are appropriate for testing SQLite.

marcobambini commented 9 months ago

@sm-shaw thanks a lot for the detailed explanation, I really appreciate your help.

We are not SQLite, we are SQLite Cloud, a true client-server architecture like PostgreSQL and MySQL. We already have a C SDK so we could use it as a starting point.

SQLite Cloud is massively multi-threading and we support the BEGIN CONCURRENT transactions to increase concurrency in SQLite.

Given that context, SQLite Cloud can be a very good fit for the workloads that HammerDB generates, that's the reason why I am so interested in trying to add support for it into the HammerDB benchmark suite.

sm-shaw commented 9 months ago

So we wouldn't add support for only a cloud derivative of a database (and there is already cloud backed SQLite).

When considering what databases to add we use DB-engines as a reference, and it is not a coincidence we support the top relational databases. So SQLite would be a candidate, however for the reasons above native SQLite is not appropriate at this point in time.

A cloud derivative is fine if we already support the database it is derived from or extends - such as Citus Data and PostgreSQL.

It is worth noting that some distributed databases don't work well in benchmark environments and give low throughput such as cockroachdb and yugabyte, which is nothing to do with the workload but the way the distributed database is implemented resulting in high query latencies, and they would argue that performance is not the main consideration for them e.g. https://devblogs.microsoft.com/cosmosdb/distributed-postgresql-benchmarks-using-hammerdb-by-gigaom/

Also, if stored procedures are not supported then this is going to be a big disadvantage for performance from the outset regardless of implementation https://www.hammerdb.com/blog/uncategorized/why-you-should-benchmark-your-database-using-stored-procedures/ (all the top performing databases have stored procedures)

So you are welcome to build a prototype to support SQLite and your cloud derivative as an alternative based on this and we can assess whether it shows the database in a suitable manner. We are fans of SQLite but from the information we have it still doesn't look like SQLite or a SQLite based database is going to perform well with the type of workloads HammerDB implements.

sm-shaw commented 8 months ago

For discussion/decision by TPC-OSS subcommittee

sm-shaw commented 8 months ago

Discussed at the TPC-OSS subcommittee meeting on 7th Nov. Agreed by the subcommittee that supporting SQLite/SQLite Cloud is not something that we would engage in directly for the reasons given above. However, if a suitable prototype is done using the guidance above that would demonstrate acceptable (comparable to current supported databases) then further work could be considered. Issue will be converted to a discussion topic to contain the information relevant to anyone looking to run benchmarks against SQLite or derivatives.