Closed okx-code closed 1 month ago
Build Successful! You can find a link to the downloadable artifact below.
Name | Link |
---|---|
Commit | bb949ac6cdfbbe072c990d3970a8222ce570d8fa |
Logs | https://github.com/CivMC/Civ/actions/runs/9087028824 |
Download | https://github.com/CivMC/Civ/suites/23787093084/artifacts/ |
Question, to what extent, if at all, do ancient snitch logs (logs older than 30 days) play into this?
I don't think snitch log creation time would have any impact on the query planner. Perhaps insertion order could, but I doubt it.
@RedDevel2 Code looks good and logic is sound, moved to ready for testing although testing may be somewhat difficult.
I've been able to manually test this by querying the bugged snitches directly with the old and new statements. Not on the server itself but it should be good enough to deploy straight to production @AngrySoundTech
The only thing, you may want to add a comment about specifically using STRAIGHT_JOIN to force a specific order of joining.
The problem with snitches crashing the server is that the MariaDB query planner decides that it is going to be smart, and actually do a query from the (very small) ja_snitch_actions table, and then join the (humongous) ja_snitch_entries table.
To do this, it has the genius idea of filtering the ja_snitch_entries table by the type_id column. This column represents the type of snitch entry, be it entering the snitch, breaking a block, placing a block, etc. Naturally, MariaDB thinks that it can reduce the 10M+ row database to just 1000 rows with this filter. But it turns out that there are LOTS of entries for each type_id, so it actually get about 600K columns, and has to linearly search through them.
Only a handful of snitches have been found to crash the server this way. I don't know how the MariaDB query planner works, nor do I want to, but I can speculate that only very heavily trafficked snitches would crash the server in this way, or perhaps those that show an unusual distribution of entry types.
Here is the MariaDB ANALYZE output for when the query planner thinks it's really smart:
As you can see, the query planner thinks it will get 1241 rows, but it actually gets 600K. Big miscalculation.
In contrast, here is a normal snitch:
Interesting, the query planner drastically over-estimates the amount of rows it will scan. It thinks it will need to comb through 50K logs, but it actually ends up looking through 11.
Perhaps an alternative solution could be trying to update the data analysis the query planner gets - it may be severely inaccurate. However, this fix should be sufficient as STRAIGHT_JOIN forces the join to be done in a specific order (big table joins small table), instead of the wrong order (small table joins big table), which is ALWAYS wrong.
Database query planner bugs are just the best aren't they?