Open jeffnavy14 opened 2 years ago
we've seen this happen at least once, as well... server just enters some weird disassociated state...
are you running your servers on windows or linux?
Windows
How are you running your servers, batch file/command prompt, through visual studio?
Batch file, but I debug through Visual Studio. Sorry for the late response; I remoted into my server and noticed I had not sent this.
have you seen it while debugging through visual studio, or only the batch file?
The first time was while debugging through Visual Studio, but it caught 0 errors. The best way to describe the issue is that it just hangs. It is sporadic. Each time it stops, the batch file window just displays normal debugging messages. I attached the VS debugger last night, and 2 hours later, out of the blue, the server "stalled/crashed". The debugger was still running as if there were 0 issues while the map server was non-functional.
It's a bit confusing, you say "hang", "stops", "stalled", "crashed" but it continues to output debugging messages? that doesn't really sound like it's hanging. What messages are still being emitted? That can tell us what's still running or not.
While I personally run/build on linux I believe the visual studio debugger can show you the active threads, so you could take a capture of that at start then later to see if anything changed
yes, this is precisely correct. the process continues to run but does not seem to 'respond' to anything... it will not accept new connections, all active players will R0 and disconnect. it requires ending the process manually and restarting it to resume operations normally. I'm not sure how to use VS in the capacity you're stating. Typically I attach it to a process and when that process crashes it generates a dump. That doesn't appear to happen here because the process doesn't actually crash.
On the threads window for visual studio: https://docs.microsoft.com/en-us/visualstudio/debugger/walkthrough-debugging-a-multithreaded-application?view=vs-2022#use-the-threads-window
Thanks, trying to follow this but the threads window is empty.
Hmm. Is it built in debug or release mode? I presume release mode would have less visibility into things like that.
release
same release as well
Release mode by definition cuts out all the useful debug information for the debugger -- so when you have a chance run it in debug mode through visual studio. That should maximize any chances of seeing useful information of some kind so we can figure this out.
I am not a fan of running in debug mode, as it sometimes likes to crash on every little thing, but to my dismay it happened again. Last activity was at 1:15 am, and the debugger was still chugging along until I manually reset the server at 6:50 am. I am at a loss at this point, as there are 0 error messages and the debugger found nothing. I have disabled all of my modules, so I can now rule out my custom stuff. I also want to add that I know it's not any of my ports closing up, as it does not display the decrease zone counter for the 15 people logged in at that time.
File bug reports for the debug mode issues so we can fix them, it will make future debugging better. I know that's a lot to ask especially when you have real players that shouldn't have to deal with downtime, but in the long run it will help everyone. We need your help on this front.
I made a windows build environment to figure out what's going on with the thread window, and it looks like it's only valid when you hit BREAK ALL:
So what I would like to ask you to do is to start the server, hit the break all button (the pause button here):
take a cap of what you have there, click the green arrow that is now visible (Continue), then wait for it to stop working again and re-cap. That will at least tell us if a thread died. If you want to check for some kind of hanging, you have to double click the threads in the thread window. Each thread will either show you LSB code of where the execution happened to stop (which may not be relevant!) or libraries we don't have access to, like windows runtimes. If you unpause, pause, and check repeatedly, you might be able to tell if there is a thread deadlocked at a specific spot as well. You will probably see a lot of sleeps if you do this, though.
Bonus points if you turn the log level up to noise and provide log files, there's probably something going on here but there's nothing for us to go off of. More information > less information
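The before/after thread comparison described above can also be done mechanically once the thread names from each capture are written down. This is only a rough sketch: `missing_threads` is a hypothetical helper, and the thread names in the example are made up, not the map server's real thread names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the threads that were present in the first capture ("before")
// but are absent from the second capture ("after"), i.e. threads that died.
std::vector<std::string> missing_threads(const std::set<std::string>& before,
                                         const std::set<std::string>& after)
{
    std::vector<std::string> gone;
    // std::set iterates in sorted order, which set_difference requires.
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::back_inserter(gone));
    return gone;
}

// Usage sketch with made-up thread names.
void example_thread_diff()
{
    const auto gone = missing_threads({"main", "worker"}, {"main"});
    assert(gone.size() == 1 && gone[0] == "worker");
}
```

An empty result means no thread disappeared between the two captures, which would point toward a deadlock or a blocked call rather than a dead thread.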
I am going to make one last-ditch effort to find this problem before we start messing with players (live server). I am going to ask my GMs and a few other players to jump on the test server and do their day-to-day. The crashes/hangs do suck, BUT they are random at times. My test server has a lot more grunt than my live server, so it can handle the extra RAM abuse when debug is running. Since test/live are the exact same branch/build, there should be 0 differences between the two.
@WinterSolstice8 thanks for spelling out what exactly you need. That's extremely helpful.
@WinterSolstice8 since it would take hours to find this issue by waiting for it, I should mention that I had forgotten about !crash. I have tried it on 4 different builds: it does crash the server, but it DOES NOT close the application. On one build, which I have a small log for, it displayed that it crashed, but people were still able to zone, fight and do everything else normally (log attached). This was not run in debug; I am going to post that when I do a rebuild again.
Before
After (Crash 1):
Before 2:
After (Crash 2):
Before 3:
After (Crash 3):
Hmm. These look unrelated to the original report, though they are valuable information. Are these exceptions generating crash dumps too?
i have dumps for 1 and 3. that being said i've had enough fun for the day, i'm re-compiling in release mode for now... hopefully this is enough information to go on?
map server does not automatically generate dumps anymore (that I have seen), that may be a separate issue, but i saved the crash dumps in the debugger.
There are stack traces in the pictures, which helps. It's odd that the auto crash dumps aren't working. It's enough information to at least check around the functions that are listed in the stack traces. Whether or not it's related to the original issue here, I don't know.
i dont know either... no real way to tell
From the server in a hung state (release executable)
Although it's not a proper fix, if you want the process to exit after it encounters an exception, you can add another line
in WheatyExceptionReport.cpp at line 304 with
std::terminate();
so it will properly close the process after it dumps the crash info to the log. This still only helps if built in debug. What seems to be happening is Wheaty is taking the exception and because it becomes handled the process remains active instead of dying out like before.
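To illustrate the "report, then terminate" idea in isolation, here is a minimal portable sketch. It is not the actual WheatyExceptionReport.cpp code: `report_crash_and_exit`, its log text, and the injectable `terminate_fn` parameter are hypothetical names that exist only so the behavior can be exercised without killing the process.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a crash handler: write the report first, then
// end the process so it does not linger in a half-alive state.
// By default terminate_fn calls std::terminate(), as suggested above.
void report_crash_and_exit(std::ostream& log, const std::string& reason,
                           const std::function<void()>& terminate_fn =
                               [] { std::terminate(); })
{
    assert(static_cast<bool>(terminate_fn)); // an empty callback would skip the exit
    log << "crash report: " << reason << '\n';
    log.flush(); // make sure the report is written before the process dies
    terminate_fn();
}

// Usage sketch: a fake terminate_fn records the call instead of exiting.
bool example_reports_then_exits()
{
    std::ostringstream log;
    bool exited = false;
    report_crash_and_exit(log, "access violation", [&] { exited = true; });
    return exited && log.str().find("access violation") != std::string::npos;
}
```

The ordering is the point of the sketch: the report is flushed before the exit call, so an external auto-restart script would see both the log entry and a process that is actually gone.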
I think we got something this time....... Ole' NotorietyContainer strikes again?
this is still affecting us 3-4 times a day, is there any more information I can provide?
This crash happened on a PC after dismissing trusts and/or warping out. I'll also include my dialog with the player in hopes that it can be reproduced...
Just wanted to update that this is still happening. Whenever the program crashes it will just hang in the log window until forcibly closed (even when debugger is not attached). This prevents auto-restarts from happening. Also no dumps are being generated automatically.
Windows 10
which commit are you on? 36a5fe3be1c76881dbf51181c19d3afd7c153ea2
Branch affected by issue
base
Steps to reproduce
I cannot reproduce this issue. At random points in the day or night the map server will just stop working: no debug output, no error logs, and it will not close. These are the only things that are caught in the map server log. They are completely random and happen while players are doing normal things. As for player count, I had 38 people on (5/21/22).
[05/20/22 22:00:54:168][map][info][info] parse: 01A | 0BF3 0BF2 0E from user: Thottietora (parse:672)
[05/20/22 22:00:54:168][map][info][action] CLIENT Thottietora PERFORMING ACTION Ranged Attack (0x10) (SmallPacket0x01A:1162)
[05/21/22 05:50:55:875][map][info][status] do_init: begin server initialization (do_init:180)

[05/21/22 16:56:04:426][map][info][info] parse: 03A | 0071 0070 04 from user: Arnon (parse:672)
[05/21/22 16:56:04:651][map][info][info] parse: 0C0 | 38A0 389D 02 from user: Emerson (parse:672)
[05/21/22 17:00:02:605][map][info][status] do_init: begin server initialization (do_init:180)

[05/21/22 17:42:59:015][map][info][info] parse: 01A | 0AF2 0AED 0E from user: Whysotan (parse:672)
[05/21/22 17:42:59:015][map][info][action] CLIENT Whysotan PERFORMING ACTION Weaponskill (0x07) (SmallPacket0x01A:1162)
[05/21/22 17:43:00:701][map][info][status] do_init: begin server initialization (do_init:180)

[05/22/22 04:14:38:700][map][info][info] parse: 0B5 | 07D1 07CE 16 from user: Terra (parse:672)
[05/22/22 04:14:39:107][map][debug][debug] Message: Received message MSG_CHAT_LINKSHELL (3) from message server (message::parse:84)
[05/22/22 08:36:01:577][map][info][status] do_init: begin server initialization (do_init:180)
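Each excerpt above ends with a `do_init` line directly after the last logged activity. Assuming that line marks the next server start, the time between the two gives a rough measure of how long the server sat silent. The sketch below computes that gap from the timestamp format shown in these logs; `parse_stamp` and `silent_gaps` are hypothetical helper names, and milliseconds are ignored.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's algorithm).
long long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Parses a leading "[MM/DD/YY HH:MM:SS:mmm]" stamp into whole seconds.
// Assumes two-digit years mean 20YY; returns nullopt for other lines.
std::optional<long long> parse_stamp(const std::string& line)
{
    int mo, d, y, h, mi, s, ms;
    if (std::sscanf(line.c_str(), "[%d/%d/%d %d:%d:%d:%d]",
                    &mo, &d, &y, &h, &mi, &s, &ms) != 7)
        return std::nullopt;
    return days_from_civil(2000 + y, mo, d) * 86400LL + h * 3600 + mi * 60 + s;
}

// For every "do_init" line, reports the seconds since the previous stamped line.
std::vector<long long> silent_gaps(const std::vector<std::string>& lines)
{
    std::vector<long long> gaps;
    std::optional<long long> prev;
    for (const auto& line : lines) {
        const auto ts = parse_stamp(line);
        if (!ts) continue; // skip lines without a timestamp
        if (prev && line.find("do_init: begin server initialization") != std::string::npos)
            gaps.push_back(*ts - *prev);
        prev = ts;
    }
    return gaps;
}

// Usage sketch with two lines from the third excerpt above.
void example_gap()
{
    const auto gaps = silent_gaps({
        "[05/21/22 17:42:59:015][map][info][info] parse: 01A | 0AF2 0AED 0E from user: Whysotan (parse:672)",
        "[05/21/22 17:43:00:701][map][info][status] do_init: begin server initialization (do_init:180)"});
    assert(gaps.size() == 1 && gaps[0] == 1);
}
```

Run over the first excerpt, the same helper reports a gap of 28201 seconds (about 7 hours 50 minutes) between the last ranged-attack line and the next initialization.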
Expected behavior
Not sure if the expected behavior is for the map server to hard crash, but it simply does not.