Closed Warp-ass closed 9 years ago
update: while almost every crash was originally on the recast, now it seems to be coming very often from places that cannot be seen directly in the call stack. looking at the map's log file, i see this for the most recent crash:
DB error - Lost connection to MySQL server during query
[Fatal Error] zoneutils::GetZoneIPP: Cannot find zone 110
[Debug] Message: Sent message 10 to message server
[Debug] Message: Received message 10 from message server
very often the crashes seem to coincide with someone changing zones in the log, as well as this "sent/received message # to/from server".
what does show up in the dump is that the next step would be in sql.cpp, inside int32 Sql_QueryV, which has the comment "executes a query" above it. I am doing my best to figure out what is going on here, but this is frustrating.
just got a log full of "cannot load charid ##### DB Error - my sql server has gone away" but only for a large chunk of players. many others in other zones were fine, but had to force a server restart to free the mass of players.
crashes have dropped in frequency from happening every 2-15 minutes to now happening up to an hour apart. not sure what changed, other than how late it is. there are more players on now than before, so my best guess is lobby server traffic or something, i don't know. i give up for today.
there is really nothing i can do with this unless you provide a dump
it sounds like issues with your DB at any rate
https://www.dropbox.com/s/00z5cgmggkc2ur1/12-16-Recast.7z?dl=0
looks like a crappy char was loaded into the zone's m_charList. CZone::IncreaseZoneCounter checks the targid before adding the char to the zone's char list, so i'm assuming something went wrong after the char was added (maybe when the player was removed from the zone in CZone::DecreaseZoneCounter)
kind of a shot in the dark but after looking at it with demo for a while, it looks like it might have something to do with outpost warps. many of the autos show northern sandoria/bastok mines as the zone or prevzone
have the log too? that'd help
this log should encompass it, sorry for the size, just grabbed it quickly while working
i think the log starts yesterday afternoon around 1
https://www.dropbox.com/s/w97j7h2o8l59po3/scavenger-map-server.log?dl=0
i would tell you when the crash happened but it was demo that submitted that dump so i don't actually know, presumably some time around when he posted it.
yeah... every time it crashes, you should rename the log and start a new one so it's easy to find where it crashed
will do from now on.
put in your latest commits, but still randomly getting "mysql server has gone away" whenever the server manages to stay up for over an hour
reconsidering my theories, many of the crashes are coming from cities. in the last few days 100% of our recast crashes have come from lower jeuno, and checking the autos, all but one player's names will be available and the missing info will be a mess of scrambled data for a char. checking the log, the people that I can find in lower jeuno before the crash, going by IncreaseZoneCounter, seem to always be zoning in very soon before the crash happens. I am leaning toward saying that this is happening during the process of them zoning in, or soon after. I am preparing another dump with a log that is more manageable.
lower jeuno is the default player position i think
could be, but some of the crashes before the past few days have been in northern sandoria, bastok markets and beaucedine glacier
it's a char reference that got deleted but not removed from the charList, i'm still trying to track down how/why
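to illustrate the kind of bug described here (a char deleted while its pointer is still in the list), this is a minimal sketch of one way to make that state impossible: let the list own the char, so erasing the entry is the only way to destroy it. Char, Zone, Insert, Remove and Find are simplified stand-ins and not the project's real classes or the actual fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins, not the project's real types.
struct Char {
    uint16_t targid;
    std::string name;
};

class Zone {
public:
    // The list takes ownership, so a char cannot be freed elsewhere
    // while its entry is still present.
    bool Insert(std::unique_ptr<Char> c) {
        if (!c || c->targid == 0) {
            return false;  // reject an invalid targid up front
        }
        const uint16_t id = c->targid;  // read before moving from c
        return m_charList.emplace(id, std::move(c)).second;
    }

    // Erasing the entry destroys the char in the same step.
    bool Remove(uint16_t targid) { return m_charList.erase(targid) == 1; }

    const Char* Find(uint16_t targid) const {
        auto it = m_charList.find(targid);
        return it == m_charList.end() ? nullptr : it->second.get();
    }

    std::size_t Count() const { return m_charList.size(); }

private:
    std::map<uint16_t, std::unique_ptr<Char>> m_charList;
};
```

with raw pointers the delete and the erase are two separate steps, and any early return between them leaves garbage in the list, which would match the scrambled char data in the autos.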
thanks for that. let me know if you need or want more dumps, anything i can do to help the process.
also it sounds widespread to the point where this probably can't be done, but any ideas on how to at least prevent crashes by disabling something or ghetto fixing?
no, it'd be easy to fix if i knew what the problem was
updates:
the server has a pattern of running decently stable for a while, then after a reboot can crash multiple times in a row within seconds of each one (has done it as much as 5 crashes within 60 seconds and 10 crashes within 5 minutes), which I would guess is somehow related to the situation of the first thing in this list and maybe even changing zones. Based on my best guess, I have tried telling the players to wait until Downloading Data is gone before doing any kind of input. I will try to collect dumps/logs on these as I go.
just had a crash with autos that were almost exactly the same as the recast coming from an elevator in metalworks. first time i've seen this. have the dump for this if it's relevant, unless it's still the same issue and this doesn't help at all.
to put into perspective how bad the crashes have been today, one player has been attempting to kill behemoth for the last 3 hours and the server hasn't stayed up long enough for them to kill it once, even with my help spawning it. maybe it's getting worse, maybe today is shitty luck. sorry to sound like i'm complaining, just trying to feed you as much info as possible.
there's a reason i made it possible to have zones on different processes, use that in the meantime
i don't have time currently to look into anything (and i can't really drop anything to make time, especially if era is still a donation server)
I just wanted to provide feedback on the new version which I've heard no one else does lately. but I understand. when you find the time I will provide you with all the debug info you need. for now I will have to revert to our early november version until I figure out the zone settings you suggested.
fixed now
crashing extremely often on the current version. while debugging, VS first points to a vector, and then points to the recast container as the next step after it returns from the current function. my guess is that it is crashing in GetRecastList when checking
RecastList_t* PRecastList = GetRecastList((RECASTTYPE)type);
because it says the next step will be the next line:
for (uint16 i = 0; i < PRecastList->size(); ++i)
When I follow it back in the call stack it points to zone_entities and zoneserver. I mention this because prior to getting the current version running, we were having a number of issues with zone_settings.sql. I am still trying to debug these crashes and create dumps of them. Many of the crashes break in different places, but the dumps all show that the call stack includes zoneserver. I'm still working hard to pinpoint this, but if anyone has any input on this, it would be greatly appreciated.
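for reference, a defensive version of that loop could look like the sketch below. Recast, RecastList, RecastContainer, Add and CountActive are simplified stand-ins, not the real classes. note that a null check only covers GetRecastList returning nothing for a bad type. if the whole container belongs to a char that was already freed, the crash would still happen and this guard would not help.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the project's real types.
struct Recast {
    uint16_t id;
    uint32_t remaining;  // time left on the recast; 0 means ready
};
using RecastList = std::vector<Recast>;

class RecastContainer {
public:
    // Returns nullptr for an out-of-range type instead of indexing blindly.
    RecastList* GetRecastList(int type) {
        if (type < 0 || type >= kTypes) {
            return nullptr;
        }
        return &m_lists[type];
    }

    void Add(int type, Recast r) {
        if (RecastList* list = GetRecastList(type)) {
            list->push_back(r);
        }
    }

    // Counts recasts still running, checking each list pointer before use.
    std::size_t CountActive() {
        std::size_t n = 0;
        for (int type = 0; type < kTypes; ++type) {
            RecastList* list = GetRecastList(type);
            if (list == nullptr) {
                continue;
            }
            for (std::size_t i = 0; i < list->size(); ++i) {
                if ((*list)[i].remaining > 0) {
                    ++n;
                }
            }
        }
        return n;
    }

private:
    static constexpr int kTypes = 3;
    RecastList m_lists[kTypes];
};
```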