alliedmodders / sourcemod

SourceMod - Source Engine Scripting and Administration
http://www.sourcemod.net/
973 stars 422 forks source link

Clientprefs spamming queries to database server, then writes to the wrong database #778

Open geominorai opened 6 years ago

geominorai commented 6 years ago

When failing to connect to a remote database for clientprefs (i.e. not using SQLite), the extension floods the database with table creation commands multiple times each per seconds until it crashes:

L 03/06/2018 - 10:03:32: [CLIENTPREFS] Attempting to reconnect to database...
L 03/06/2018 - 10:03:32: [CLIENTPREFS] Failed to CreateTable sm_cookies: WSREP has not yet prepared node for application use
L 03/06/2018 - 10:03:32: [CLIENTPREFS] Attempting to reconnect to database...
L 03/06/2018 - 10:03:32: [CLIENTPREFS] Failed to CreateTable sm_cookies: WSREP has not yet prepared node for application use
L 03/06/2018 - 10:03:33: [CLIENTPREFS] Attempting to reconnect to database...
L 03/06/2018 - 10:03:33: [CLIENTPREFS] Failed to CreateTable sm_cookies: WSREP has not yet prepared node for application use
L 03/06/2018 - 10:03:33: [CLIENTPREFS] Attempting to reconnect to database...
L 03/06/2018 - 10:03:33: [CLIENTPREFS] Failed to CreateTable sm_cookies: WSREP has not yet prepared node for application use

In my use case, the database is hosted on a cluster node that occasionally loses sync with the rest of the cluster and needs time to resync. During this time, it rejects queries but clientprefs will not stop flooding it.

This happens on many of my game servers, and the tendency is for the spam to continue on for many minutes until the game server crashes right after a new player joins. It hangs and refuses to accept even server console commands until the watchdog kicks in and reboots it.

After the crash, checking the database via phpmyadmin, I also found that clientprefs somehow manages to create sm_cookies and sm_cookie_cache tables in a random but completely different database than what it was assigned to. i.e. In some other database defined in database.cfg.

Repeat this crash many times, and eventually every database in database.cfg will have sm_cookies and sm_cookie_cache tables created.

geominorai commented 6 years ago

More related crashes that point to a corrupted variable in MyDriver.cpp: https://crash.limetech.org/w2zaep4ar4wc

L 05/07/2018 - 06:26:10: [CLIENTPREFS] [2005]: Unknown MySQL server host 'DD' (1)
L 05/07/2018 - 06:26:11: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 06:26:11: [CLIENTPREFS] [2005]: Unknown MySQL server host 'DD' (1)
L 05/07/2018 - 06:26:11: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 06:26:11: [CLIENTPREFS] [2005]: Unknown MySQL server host 'DD' (1)

https://crash.limetech.org/zd374pcyje4i

L 05/07/2018 - 06:15:59: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 06:15:59: [CLIENTPREFS] [2005]: Unknown MySQL server host '' (1)
L 05/07/2018 - 06:16:00: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 06:16:00: [CLIENTPREFS] [2005]: Unknown MySQL server host '' (1)
L 05/07/2018 - 06:16:00: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 06:16:00: [CLIENTPREFS] [2005]: Unknown MySQL server host '' (1)

Eventually leading to the game server getting automatically blacklisted by the database server: https://crash.limetech.org/go245jwdrqjz

L 05/07/2018 - 10:50:24: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 10:50:24: [CLIENTPREFS] [1129]: Host '64.94.100.240' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
L 05/07/2018 - 10:50:24: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 10:50:24: [CLIENTPREFS] [1129]: Host '64.94.100.240' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
L 05/07/2018 - 10:50:24: [CLIENTPREFS] Attempting to reconnect to database...
L 05/07/2018 - 10:50:24: [CLIENTPREFS] [1129]: Host '64.94.100.240' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
KyleSanderson commented 6 years ago

As an aside are you able to atleast fix your mysql server? The rest is unfortunately defined until the race conditions are resolved inside the ext.

geominorai commented 6 years ago

I'll look into installing HAProxy to reduce the downtime of one database node but the server may choke for plugins that use non-threaded queries while the proxy relays it to another node.

I'm waiting for a Sourcemod release with #773 so I can make a plugin do the DB node health checks client-side and hot swap the config with one using backup nodes in response.

KyleSanderson commented 6 years ago

What? The config file you showed is clearly crap as the server is rejecting connections from the server. Please fix this and you'll see far less issues when your network drops. It's not the in-product fix, but atleast half these issues are resolvable on your end.

geominorai commented 6 years ago

The config files are perfectly fine and the database functions as normal and the database queries go through and show up in the database until this bug happens, which is when the database node has to pause to resync (or somehow the game server loses connection to the database briefly for whatever reason), then clientprefs starts to eat itself and other database handles alive.

KyleSanderson commented 6 years ago

Duplicate of https://github.com/alliedmodders/sourcemod/issues/726

My personal recommendation is to fix your environment and stop denying it's an issue. TCP dropouts are common and happen all the time, your environment configuration should be setup for this.

geominorai commented 6 years ago

This cannot be completely resolved unless #726 is fixed or clientprefs is required to only work with a local sqlite database configuration. Even a minor hiccup in the network due to TCP dropouts will cause clientprefs to start corrupting its own DatabaseInfo.

geominorai commented 6 years ago

Possibly related dbi.mysql.ext crash involving atomic operations: https://crash.limetech.org/6xdi7o7b7sfn

geominorai commented 6 years ago

Crash with console log showing table creation attempt in different databases: https://crash.limetech.org/onlmsgo4gxfd/console

geominorai commented 6 years ago

Hastebin of console log: https://hastebin.com/jaxocevizo

geominorai commented 5 years ago

Returning to this issue about a year later with a more stable MySQL database cluster network connection that does not flood with the WSREP issue in the original post, I still encounter this bug where the database and clientprefs extension believe adamantly that they have Lost connection to MySQL server during query but should not.

I managed to catch that when this bug happens, that particular game server cannot connect to the database even if the database network and status is fine and other game servers on the same machine with the same database config can connect to the database fine during this. And like before, the server eventually crashes shortly afterwards.

Here is the recent crash dump: https://crash.limetech.org/sg4agdetqnxs

This time, the clientprefs errors spewed into the console successfully logged the extension caught in the act going all over the place trying to create tables in databases it has absolutely no business in. In the database config, it has a designated "clientprefs" database and login to use but as seen in the console, it tries just about everywhere else:

[CLIENTPREFS] Failed SQL Query, Error: "Table 'jumpacademy_sourcebans.sm_cookies' doesn't exist" (Query id 0 - serial 0)

[CLIENTPREFS] Failed SQL Query, Error: "Table 'jumpacademy_calladmin.sm_cookies' doesn't exist" (Query id 3 - serial 0)

Then afterwards that mess it eventually deadlocks some table and crashes itself: [CLIENTPREFS] Failed to CreateTable sm_cookies: Deadlock found when trying to get lock; try restarting transaction

This is now very clear that it's not a bug happening on the database side.

geominorai commented 5 years ago

The bug where all plugins suddenly lose database connections does not happen if the clientprefs extension is not also hooked up to the mysql database (i.e., default config with sqlite). This suggests the extension is causing some kind of corruption to the mysql extension.

geominorai commented 5 years ago

The database loss bug also does not disappear after the crash auto restart either. It repeats itself (again, while other servers have no trouble with the db) until I kill the srcds process and manually restart it.

geominorai commented 3 years ago

A recent crash today from https://crash.limetech.org/ghf52wdnzckf convinces me that clientprefs crashes are due to attempting to read a completely different part of some string table in memory:

L 03/26/2021 - 08:54:12: [CLIENTPREFS] Attempting to reconnect to database...
L 03/26/2021 - 08:54:12: [CLIENTPREFS] [2005]: Unknown MySQL server host 'Found Rec Map' (0)

Found Rec Map is actually a phrase from a translation file from one of the server plugins.

sapphonie commented 3 years ago

Is this a priority to be fixed? Seems like a pretty large issue that Clientprefs just reads garbage out of memory if you don't have a rock stable MySQL server

dvander commented 2 years ago

If you can reproduce this on Linux, you could try running your server within valgrind to catch the error. If that's not an option we could add a way to generate ASAN builds.

sapphonie commented 2 years ago

If you can reproduce this on Linux, you could try running your server within valgrind to catch the error. If that's not an option we could add a way to generate ASAN builds.

I can reproduce a crash/watchdog timeout, is that enough?