A248 / LibertyBans

The be-all, end-all of discipline.
https://ci.hahota.net:8443/job/LibertyBans/
GNU Affero General Public License v3.0
169 stars 41 forks source link

HSQLDB query performance does not scale #190

Closed TheMattVid closed 1 year ago

TheMattVid commented 1 year ago

LibertyBans Version

LibertyBans 1.0.4

I have confirmed that ...

Platform

Bungeecord

Description

address-strictness: 'STRICT' use-composite-victims-by-default: false

Versions: Proxy: git:Waterfall-Bootstrap:1.19-R0.1-SNAPSHOT:13085b9:510 Lobby: git-Paper-111 (MC: 1.19.1) Survival: git-Purpur-1858 (MC: 1.19.2) LibertyBans: 1.0.4 Geyser plugin: 2.1.0-SNAPSHOT

All players connect to the lobby by default each login. The player and operator were on survival. The operator banned the player from survival. The banned player attempted to log in multiple times unsuccessfully, but then is able to log on.

Here are selected lines from the logs: https://mclo.gs/JNwzqSb

I notice that there are errors about things taking long to process, as well as an error during the instance that the banned player was able to join.

At the time the banned player was able to join, I was surprised and attempted to ban him. The ban did not work, stating that the player was already banned. I looked in the player's ban history, and his ban was shown there. I figured that I could rectify the issue by unbanning and rebanning the player. I used the unban command, and peculiarly, the plugin simultaneously output one message that the player was unbanned and another message that the player was banned. The player was then kicked off of the server with the ban message.

A248 commented 1 year ago

Your LibertyBans installation is under strain due to database connectivity. As you've noticed, operations are slow to process and timeouts are occuring. The connection pool is exhausted, which I have inferred from the behavior you describe regarding your attempts to unban and reban players.

We use a fixed connection pool for efficiency purposes, which means it is necessary to manually size the connection pool. For small servers the default value of 6 connections is sufficient, but for a proxy with many connected players, I'd recommend increasing it to 10. You might be surprised to see such small numbers; see this page by HikariCP for explanation.

On your server, LibertyBans cannot contact the database quickly enough to keep up with the load you are experiencing. Besides increasing the connection pool size, make sure your remote database is not inherently slow. For example, a MariaDB server located in a different geographic region with poor network infrastructure incurs significant network overhead, increasing response time. Additionally, consider increasing the mute cache expiration time to reduce the number of times LibertyBans contacts the database to check mutes. It is possible that these measures obviate the need for a larger connection pool.

The story you tell about attempting to ban and reban a player seems peculiar, but it is explainable by a busy connection pool. When a command such as /ban or /unban triggers a database request, threads must wait to obtain a connection from the pool. Since all operations are happening asynchronously and there is no imposed order on database requests, it's possible for a recent operation to obtain a database connection before an earlier operation. This phenomenon (relating to the fairness of a connection pool) can make the effects of banning and unbanning happen out-of-order.

TheMattVid commented 1 year ago

I am using the default HSQLDB database for this. 10-20 players on the network concurrently. What will help most to speed up the database connection? I do not have this issue for any other plugin on the proxy or backend, including CoreProtect which I am using local storage for (no SQL setup). If increasing the connection pool size to 10 will help, I can do that. I can also switch to MySQL if necessary.

A248 commented 1 year ago

HSQLDB is the speediest database, faster than any remote database such as MySQL. As it relies on the filesystem, slow disk access might be an issue, but that would also manifest elsewhere such as in chunk loading or other plugins like CoreProtect. Given that you have 10-20 players, increasing the connection pool to 10 could actually hurt, but it may be worth a try; at this player count I don't expect you'd have a high load.

Ruling out the above, we should investigate query performance. There might be a full table scan happening somewhere. How long have you been using LibertyBans for? Specifically, do you have lots of data? Debugging slow queries can be an involved process. I haven't checked query plans for every single statement in the codebase, only the most common of them. Please tell me:

If you are comfortable with it, you may send me a copy of the database and I'll spend time debugging. Query execution plans are most accurate on large data sets. If I can reproduce the issue myself in a controlled environment, it is dead in the water.

TheMattVid commented 1 year ago

I have been using LibertyBans since October 10th. I am not sure if the "plugin listener" messages have been occurring the whole time or not. I am not sure if this is important, but I migrated from AdvancedBan when installing.

I believe there are just over 1000 lines, as the last ban is numbered 1017.

The connection-limiter is currently set to false.

The alts-auto-show feature is also set to false.

I can send you the database. What is the best way to send it to you while keeping it private from everyone else?

A248 commented 1 year ago

I can send you the database. What is the best way to send it to you while keeping it private from everyone else?

Email or Discord. You can find the public email address of any committer on Github by looking at the commit details.

A248 commented 1 year ago

It is surely a performance bug, which I have tracked down to poor query performance. I'll have to optimize the queries for selecting a user's applicable ban or mute (in fact it is the same query). Likely, the culprit is the full table scans I find in the execution plan.

Since you are using 1.0.4, would you like a backport of the fix to 1.0.x? Version 1.1.0 imposes new requirements such as Java 17, so a backport would be fitting.

As a temporary solution you may consider reducing the address-strictness to NORMAL. The applicability query for NORMAL strictness is more efficient than that for STRICT.

TheMattVid commented 1 year ago

I have Java 17, so it makes no difference to me. I can just update to the latest version.

A248 commented 1 year ago

We have good news and bad news.

The bad news is that HSQLDB does not scale. The statement I made "HSQLDB is the speediest database" turned out to be false. No matter how I write LibertyBans' queries -- whether using joins or sub-queries -- HSQLDB fails to optimize them despite available indexes. In contrast, MariaDB is a smarter and more advanced database. Its intelligent optimizer streamlines our queries by turning series of joins into index accesses. If you switch to MariaDB, you will notice a definite speed-up.

The good news is that LibertyBans' current queries are the most optimized I can make them; introducing sub-queries does not improve the MariaDB execution plan. This isn't the first time I've spent hours analyzing query performance: #136 gave us the current queries. So, it is not surprising that I've failed to make them any faster.

A fun fact is that using sub-queries and joins is very similar according to the MariaDB query optimizer. The index accesses turn out to be mostly the same, except there are a few more redundancies for the sub-queries. Even with common table expressions, sub-queries' performance can approach that of joins but never surpass the latter.

Regarding our database offerings there are a few conclusions. MariaDB scales much better than HSQLDB. MySQL will certainly be similar. As for PostgreSQL and CockroachDB, I'll test them out and observe the results -- As remote databases with a similar reputation for sophistication as MariaDB, I expect they will have no trouble optimizing our queries.

That leaves us with HSQLDB to contend with. Unfortunately, I do not have the capacity or time to rewrite HSQLDB with a better optimizer. Perhaps I'll report these results to Fred (the maintainer). Your data set, @TheMattVid , has been instrumental in all this performance testing, so thanks. Ultimately, however, I don't believe HSQLDB was designed for the complex queries we rely on in LibertyBans. A local databases is like a fast jackrabbit, zippy for simple queries but lacking the maniacal scalability and performance capacity of a beastly remote database.

This issue still deserves documentation so that users understand the performance considerations of depending on HSQLDB as the server scales.