AsYetUntitled / Framework

Altis Life RPG mission framework for Arma 3 originally made by @TAWTonic.
Other
245 stars 308 forks source link

database connection gets lost on long timeout #691

Open ghost opened 4 years ago

ghost commented 4 years ago

Expected behaviour

Server can be run until manual restart or crash.

Actual behaviour

Server lose database connection when idle for too long without any database activity (connection is dropped by database server due to timeout).

Mission version: v6 86f766e

I've added a "fix" to my #688. Sorry for edit AL.ini three times in a row, but on the first one I missed that I had already implemented #653 and then it was just re-ordering to not spam the diff too much. Calling the precedures every hour refreshes the timeout (at least on my server timeout seem to bee something longer than 3600sec). I also added the missing precedure (as it wasn't even used on server start up let alone anywhere else).

I now that AltisLife servers are restarted regualr (from what I've seen every 4h - 8h). Although I guess that's cause the game clogs up over time and restart has to be done now and then on busy servers let it run idle for a longer time without any restart shouldn't result in it become no longer playable as it relies on the database connection. Interesting this didn't came up yet.

blackfisch commented 4 years ago

That should not be an issue for the framework, but for your SQL-server configuration regarding timeout. Database connectivity is handled from extDB3 - and I guess it's not supposed to sit idle for 8 hours or something like this.

As I said; there should be a setting in your SQL server

ghost commented 4 years ago

That should not be an issue for the framework, but for your SQL-server configuration regarding timeout.

I don't think so - as every well configured server should have rather short timeouts. I'm surprised the default MariaDB server install even have an idle timeout of more than 1h. I expected something like 5min.

Database connectivity is handled from extDB3 - and I guess it's not supposed to sit idle for 8 hours or something like this.

Well, yea, the connection itself sure is done by the extDB3 - in fact that's the base how ArmaLife works in the first place - but it doesn't have any heartbeat in its own. It just provides the basic TCP connection. And although on my server there's currently no activity I wonder how "big" servers do it when they have to gap 4h-8h of no activity at all. Assume a big known server has a long stretch of idle in the morning when no player is active. And just before the usual repoint at noon a player joins. Without any heartbeat / idle-handling such server would encounter the very same issue as I did: The database server dropped the connection due to idle-timeout. That's how a database server should behave on "dead" connections.

As I said; there should be a setting in your SQL server

Sure it is, the idle timeout, but it's not a solution to crank it up to 24h or more if the client (extDB3 in this case) does not send any data, not even a NO-OP, to keep the connection open. When looking at the connections they just sitting their idle in sleep state. This is not an issue with how the database server is set up or configured but the client not implement proper idle management. In fact, the database server behaves as it should on dropping this "dead" connection. Polling in long intervals keeps the connections alive - problem fixed.

BoGuu commented 4 years ago

This behavior should be handled by extDB3 as previously mentioned - and in fact it is. So if you're having issues, seeing your extDB logs would be helpful here

I'm not sure how you're seeing this behaviour (what made you think this?):

Server lose database connection when idle for too long without any database activity

As extDB3 is also pooling (with heartbeats, dropping idle connections)

ghost commented 4 years ago

I'm not sure how you're seeing this behaviour (what made you think this?):

By having phpMyAdmin showing me the active connections for the user - which shows all 6 connections from extDB3 idle in sleep state for more than 3600sec (I watched some YT and came back after about 4200sec (1h10m) and the connections were still in idle sleep). So, what ever extDB3 is doing within it's own pooling (btw: I looked through the code of the most recent linux version 1.031 - I couldn't find any "noop refresh" send to the database - if you know how it should be done please tell me how and where) it doesn't send any single byte to the database - hence the database kills the connections after reaching the timeout for just sitting the idle without any activity (I just looked it up: 28.800s = 8h - as said: that's the default install of by linux distro). I don't know much about developing database drivers - but I don't think it's its responsibility for implementing the heartbeat - as it's only providing the TCP connection itself. From as far as I know implementing heartbeat is on the application level - which in this case would be the arma3server using extDB3 as an underlying driver, but I could be wrong, which in fact doesn't really matter as neither of those two does any heartbeat, at least not that I was able to find it in the source anywhere. If you know where heartbeat is done in the source please tell me. As for me the question arise from where you know that extDB3 itself does it as you say.

BoGuu commented 4 years ago

So, what ever extDB3 is doing within it's own pooling (btw: I looked through the code of the most recent linux version 1.031 - I couldn't find any "noop refresh" send to the database - if you know how it should be done please tell me how and where) it doesn't send any single byte to the database - hence the database kills the connections after reaching the timeout for just sitting the idle without any activity (I just looked it up: 28.800s = 8h - as said: that's the default install of by linux distro).

It uses mysql_ping to check the connection, and resets if need be.

I don't know much about developing database drivers - but I don't think it's its responsibility for implementing the heartbeat - as it's only providing the TCP connection itself.

Ultimately if one is given a connection, one should expect it to work when required, or notion to trigger the reconnect. extDB3 implements the connector, and given we pass it queries, we'd expect it to handle timeout behaviour.

If you know where heartbeat is done in the source please tell me. As for me the question arise from where you know that extDB3 itself does it as you say.

https://bitbucket.org/torndeco/extdb3/src/f1a08ecf20c7de0d1b043b600da9eb3c3d0b0f9a/src/mariaDB/pool.cpp#lines-67:95

I'm not concerned with what phpMyAdmin says - just what is written in your extDB3 logs by the MariaDB connector - that's what is important here - posting them would be helpful.

ghost commented 4 years ago

It uses mysql_ping to check the connection, and resets if need be.

Well, problem is: Exactly THAT is not happening, as proven by the display of phpMyAdmin (which shows no more active connections after the server ran idle for over h) and the server just spitting out some error as it doesn't have a connection the database anymore - which, just as an example, resulted in no data loaded after connecting after the server ran idle for 8h. I don't report this cause I came up with it for no reason - the database just kills the connections and extDB3 is not able to re-establish them - which in the end results in the framework no longer working correctly if the server runs idle for longer than the timeout of the database server.

Ultimately if one is given a connection, one should expect it to work when required, or notion to trigger the reconnect. extDB3 implements the connector, and given we pass it queries, we'd expect it to handle timeout behaviour.

Uhm, ok, as said: I don't know much about it, but also can't remember to have seen that behaviour before.

https://bitbucket.org/torndeco/extdb3/src/f1a08ecf20c7de0d1b043b600da9eb3c3d0b0f9a/src/mariaDB/pool.cpp#lines-67:95

I'm not concerned with what phpMyAdmin says - just what is written in your extDB3 logs by the MariaDB connector - that's what is important here - posting them would be helpful.

The extDB3 logs are just empty. The just have the "locked" right at the start - and then nothing no matter if the server was busy or idle. But my arma3server also does not produce any RPT log files (yes, I already deleted everything and re-downloaded it) - so I don't know if just nothin get's logged or if for what ever reason the linux dedicated binary behaves differently. Sure I want to post logs - but I have none other than the console output. Btw: Also looking through the connector-c source for how mysql_ping() is implemented doesn't really show where data is exchanged with the server. As far as I can tell (don't know C(++) well) there's only a TCP ping to see if the TCP connection itself is still open - but no data are exchanged on the application layer.

blackfisch commented 4 years ago

Again; That seems more like an issue with extDB3 than the Framework itself - so I guess this issue here is irrelevant or did I miss something?

ghost commented 4 years ago

[...] or did I miss something?

Well, executing the procedures every hour at least results in regular client-server data exchange which refresh the timeout and hence prevent the connection getting killed from the database server. This would even allow to set the timout on the database server to somewhat more reasonable like 30m and have the game poll about every 25m to have about 5m to spare if something busy laggs the whole construct.

blackfisch commented 4 years ago

Executring the procedure once an hour would be unnecessary database traffic additionally you would add an extra script to the scheduler which is not ideal for server performance and should be avoided if possible. Deleting the entries at restart is just fine.

Sorry mate, I don't think this would be a better solution than setting a higher timeout or server restart at least every (TIMEOUT) hours

ghost commented 4 years ago

Uhm ... I think I understand.

BoGuu commented 4 years ago

The extDB3 logs are just empty. The just have the "locked" right at the start - and then nothing no matter if the server was busy or idle. But my arma3server also does not produce any RPT log files (yes, I already deleted everything and re-downloaded it) - so I don't know if just nothin get's logged or if for what ever reason the linux dedicated binary behaves differently. Sure I want to post logs - but I have none other than the console output.

The console output for the Linux binary is the 'RPT' log, just pipe this to a file. It's odd there's nothing in the extDB log, if the connection was dead, queries would error. If you can get a full log set (server, client RPTs, and extDB) and post them, would be good.

This behavior should be fine, and frankly I've never seen it cause an issue before - lots of servers are empty through certain early hours, some with long restart periods.

We should diagnose your issue, and then seek an appropriate solution, but ultimately that'll be in extDB if it's universal and reproducible.

Btw: Also looking through the connector-c source for how mysql_ping() is implemented doesn't really show where data is exchanged with the server. As far as I can tell (don't know C(++) well) there's only a TCP ping to see if the TCP connection itself is still open - but no data are exchanged on the application layer.

The ping checks the connection status, and extDB is handling the re-connection in the error condition. It's also dropping idle connections, new ones will be pooled - so we should expect something.

See: https://dev.mysql.com/doc/refman/5.7/en/mysql-ping.html

I would keep this issue open until we come up with a solution, but that shouldn't be on the code side here.