FredyH / MySQLOO

GNU Lesser General Public License v2.1

Server hang in Database::~Database() #118

Closed 2048khz-gachi-rmx closed 1 year ago

2048khz-gachi-rmx commented 1 year ago

For some reason, queries can get stuck in some kind of limbo, where their status says "running" but they neither finish nor time out for a while. Because MySQLOO joins the runner thread in the DB destructor, the server can hang for a really long time. It'd be nice if there was an option to set a query's (or all queries'?) timeout, since a pure Lua solution could have edge cases (i.e. if a query launches before a map change, Lua won't get a chance to abort it since the hang will already have occurred).

(I don't know if this is actually it, but it could be the MYSQL_OPT_READ_TIMEOUT and MYSQL_OPT_WRITE_TIMEOUT options?)

gdb of a hanging server's dump: dump

FredyH commented 1 year ago

Are these queries that actually take that long, or rather queries that should finish quickly but don't due to the map changing/db destructing?

2048khz-gachi-rmx commented 1 year ago

I think this may be a network issue; it's a simple SELECT which hangs around (max. time I have logged is 809s since starting it). Regardless, a timeout would be nice.

--- potentially hanging queries?
115.20s. : "SELECT crap FROM crap_table WHERE sid64 = 7656691337 AND thing = 'xd'"
66.39s. : "SELECT crap FROM crap_table WHERE sid64 = 7656691337 AND thing = 'xd'"

--- will log this in data/hanging_dumps/sql_flush_1674748627.dat
SoundEmitter:  removing map sound overrides [562 to remove, 19 to restore]
<hang occurs here>

local dbm = FindMetaTable("MySQLOO Database")
dbm.oQry = dbm.oQry or dbm.query
DEBUG_allQries = DEBUG_allQries or {}

-- timer to cleanup inactive queries goes here

function dbm:query(strQry, ...)
    local q = self:oQry(strQry, ...)
    q.qry = strQry
    q.when = SysTime()
    DEBUG_allQries[q] = true
    return q
end

function format_hanging_qries()
    local t = {}

    for v in pairs(DEBUG_allQries) do
        if v:isRunning() and SysTime() - v.when > 2 then
            t[#t + 1] = ("%.2fs. : \"%s\""):format(SysTime() - v.when, v.qry)
        end
    end

    return table.concat(t, "\n")
end

hook.Add("ShutDown", "SqlThing", function()
    -- flush the return of format_hanging_qries()
end)
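
The two placeholders in the snippet above (the cleanup timer and the flush on shutdown) could be filled in roughly like this. This is only a sketch: `prune_finished_queries` and the `SqlThingPrune` / `SqlThingFlush` identifiers are made-up names, the 30 second interval is arbitrary, and the dump path simply mirrors the one from the log output. The Garry's Mod parts are guarded so the pruning helper can also be tried in plain Lua.

```lua
-- Sketch only: prune_finished_queries and the timer/hook IDs are hypothetical
-- names for illustration; they are not part of MySQLOO.
DEBUG_allQries = DEBUG_allQries or {}

-- Drop every tracked query that is no longer running; returns how many were dropped.
function prune_finished_queries(queries)
    local removed = 0
    for q in pairs(queries) do
        if not q:isRunning() then
            queries[q] = nil -- clearing an existing key during pairs() is allowed
            removed = removed + 1
        end
    end
    return removed
end

-- Garry's Mod wiring; skipped when the timer library is absent (e.g. plain Lua).
if timer and timer.Create then
    -- Every 30 seconds, repeating forever (0 repetitions means infinite).
    timer.Create("SqlThingPrune", 30, 0, function()
        prune_finished_queries(DEBUG_allQries)
    end)
end

-- One possible body for the ShutDown hook: write the report under data/hanging_dumps/.
if hook and file then
    hook.Add("ShutDown", "SqlThingFlush", function()
        local report = format_hanging_qries()
        if report ~= "" then
            file.CreateDir("hanging_dumps")
            file.Write(("hanging_dumps/sql_flush_%d.dat"):format(os.time()), report)
        end
    end)
end
```

Pruning only removes queries that report they are no longer running, so anything still stuck stays in the table and keeps showing up in the report.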

After looking at the logged hanging queries, it looks like it's always the same query that runs on PlayerInitialSpawn, so I wonder if it has something to do with the DB connection going inactive for a while and the first query after that timing out. Maybe some kind of busted auto-reconnect logic?

FredyH commented 1 year ago

I think someone else had a similar issue once and it was to do with a weird firewall setup I believe. Is the MySQL server hosted remotely or on the same server as srcds? Also, do these select queries only hang when reloading the map, or can they also take a very long time just while running regularly? If so, when they do finish, do they actually return a valid result, or an error?

2048khz-gachi-rmx commented 1 year ago

The DB is hosted on a remote server. Normal queries run at any time can get stuck; the only hanging query I caught finishing live actually finished on its own successfully something like 20 minutes later (before the map changed or anything).

I'll check everything possible with any firewalls and check in with the machine provider i guess???????

update: Database:ping() also hangs the server

2048khz-gachi-rmx commented 1 year ago

Like I mentioned in the beginning, implementing timeout options mitigated the issue at least partially (the queries still hang randomly, however, it's not as awful; srcds doesn't hang for 20 minutes now before deciding something's not right).
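
For anyone who wants to try the same mitigation from Lua: as far as I know, newer MySQLOO builds expose `setConnectTimeout`, `setReadTimeout` and `setWriteTimeout` on the database object, to be called before connecting. Treat those method names as an assumption and check the README of the build you are running. The hypothetical helper below (`apply_timeouts` is not part of MySQLOO) only calls the setters that actually exist, so it does nothing on builds without them.

```lua
-- Sketch only: apply_timeouts is a hypothetical helper, and the setter names
-- are assumed to exist only in newer MySQLOO builds.
function apply_timeouts(db, seconds)
    local applied = {}
    local setters = { "setConnectTimeout", "setReadTimeout", "setWriteTimeout" }
    for _, name in ipairs(setters) do
        -- Call a setter only if this build of the module provides it.
        if type(db[name]) == "function" then
            db[name](db, seconds)
            applied[#applied + 1] = name
        end
    end
    return applied -- names of the setters that were actually called
end

-- Usage inside Garry's Mod (placeholder credentials, 10 second timeouts):
--   require("mysqloo")
--   local db = mysqloo.connect("host", "user", "pass", "database", 3306)
--   apply_timeouts(db, 10)
--   db:connect()
```

Returning the list of applied setters makes it easy to log whether the running build supports timeouts at all.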

Provider said they have no extra traffic filters and there shouldn't be any such filters on the servers themselves, so not sure wtf is up with that.

FWIW I'm guessing the connections are timing out due to inactivity and then not reconnecting or re-running the queries properly, but I have a solution now so whatever, I guess.