JACoders / OpenJK

Community effort to maintain and improve Jedi Academy (SP & MP) + Jedi Outcast (SP only) released by Raven Software
GNU General Public License v2.0

Client crashes after Com_Error is invoked by cgame/game/ui #254

Closed xycaleth closed 11 years ago

xycaleth commented 11 years ago

On Linux and Mac systems (and possibly on Windows, if not built using Visual Studio), when a client tries to connect to a server running a map that the client doesn't have, the program crashes and closes completely. I've narrowed the problem down to the exception being thrown in Com_Error. The problem is that C++ exceptions are not guaranteed to work when there are shared libraries (.dll/.so/.dylib files) in the call stack. In Q3, Com_Error uses setjmp and longjmp to jump straight back to the main menu. Using these functions is not a good idea with JK2/JKA because the game uses C++ classes, and the destructors of these objects need to be called, otherwise you may end up with memory leaks. longjmp bypasses every stack frame between the call and the point where setjmp was first called, whereas C++ exceptions unwind the stack properly, calling all the necessary class destructors.

How do we want to fix this problem? I don't have any ideas as it is now :/
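To illustrate the unwinding behaviour described above, here is a minimal sketch of an exception passing through two stack frames and running their destructors on the way out. The names `Tracked`, `ErrorSketch`, `InnerFrame`, `OuterFrame` and `RunUnwindDemo` are hypothetical and only stand in for engine code; this is not the actual Com_Error implementation. No longjmp counterpart is shown, because jumping over objects with non-trivial destructors is undefined behaviour in C++.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Records destructor calls so the unwinding order can be inspected.
static std::vector<std::string> g_destroyed;

// Hypothetical stand-in for any engine object that owns a resource.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { g_destroyed.push_back(name); }
};

// Hypothetical stand-in for Com_Error: reports the failure by throwing.
[[noreturn]] void ErrorSketch(const char *msg) {
    throw std::runtime_error(msg);
}

void InnerFrame() {
    Tracked inner("inner");
    ErrorSketch("DERP"); // never returns normally
}

void OuterFrame() {
    Tracked outer("outer");
    InnerFrame();
}

// Returns the names of the destroyed objects in the order they were destroyed.
std::vector<std::string> RunUnwindDemo() {
    g_destroyed.clear();
    try {
        OuterFrame();
    } catch (const std::runtime_error &) {
        // Both frames have already been unwound when control arrives here.
    }
    return g_destroyed;
}
```

Calling `RunUnwindDemo()` yields "inner" followed by "outer": the unwinder destroys the objects of the innermost frame first, which is the cleanup that a longjmp-based Com_Error would skip.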

xycaleth commented 11 years ago

Second diagnosis! It doesn't seem to be the fact that the exception is thrown. I created a small test case for throwing exceptions from and through a shared library, and it worked fine in both GCC and Clang (and even when the binary was compiled with GCC and the shared library with Clang, and vice versa).

I created a really basic cgame library (a C file which exports dllEntry and vmMain) which just calls trap_Error when vmMain is called. I noticed when stepping through Com_Error that it would sometimes go back a few lines and start executing those lines again, so it seems like the instruction pointer is becoming corrupt at some point. Here's the log just before it crashes:

68 shader files read 
3367 shaders found
55193 code lines
0.84 MB shader data
0.018 seconds
-------------------------
Loading dll file ui.
Sys_LoadGameDll(ui) found vmMain function at 0x736ccc0
Loading dll file cgame.
Sys_LoadGameDll(cgame) found vmMain function at 0xd240f80
----- FS_Startup -----
Current search path:
/Users/alex/Library/Application Support/OpenJK/openjk
./openjk
/Users/alex/Library/Application Support/OpenJK/base
./base/assets3.pk3 (16 files)
./base/assets2.pk3 (62 files)
./base/assets1.pk3 (8320 files)
./base/assets0.pk3 (15346 files)
./base

----------------------
23744 files in pk3 files
*******************
ERROR: DERP
********************
----- FS_Startup -----
Current search path:
/Users/alex/Library/Application Support/OpenJK/openjk
./openjk
/Users/alex/Library/Application Support/OpenJK/base
./base/assets3.pk3 (16 files)
./base/assets2.pk3 (62 files)
./base/assets1.pk3 (8320 files)
./base/assets0.pk3 (15346 files)
./base

----------------------
23744 files in pk3 files

You can see the filesystem gets started up twice. I don't see any reason why it would do this just because I have a stub cgame library, so it's more likely that something in Com_Error is corrupting the stack.

xycaleth commented 11 years ago

Guess I should report what I found after more digging. Exceptions work fine across shared library boundaries with both clang and gcc. I tested this with a limited example where the main binary calls a function in the shared library which accepts a callback function. The callback function then throws an exception, and this is successfully caught by the program after passing through the library. So I have no idea what's causing the problem in OpenJK itself! It definitely needs more looking into if anyone has time.
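The shape of the test described above can be sketched roughly as follows. This is a single-file approximation, not the original test program: `LibraryCall` stands in for the function exported by the shared library, and in a real test it would be compiled separately into a .so/.dylib. All names here are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Callback type that the "library" function accepts.
typedef void (*Callback)();

// Stand-in for the function exported by the shared library. It knows
// nothing about exceptions and simply invokes the callback it is given.
void LibraryCall(Callback cb) {
    cb();
}

// Lives in the main binary and throws while the library frame is on the stack.
void ThrowingCallback() {
    throw std::runtime_error("thrown in callback");
}

// Main binary -> library -> callback (throws) -> caught in main binary.
std::string RunCallbackTest() {
    try {
        LibraryCall(ThrowingCallback);
    } catch (const std::runtime_error &e) {
        return e.what();
    }
    return "not caught";
}
```

In this arrangement the exception has to propagate through the library's frame before it reaches the handler, which matches the binary->library->binary path that a Com_Error raised from cgame/game/ui would take.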

redsaurus commented 11 years ago

Just before the exceptions are thrown in Com_Error, CL_FlushMemory is called, which seems to do the filesystem restart. It also shuts down the DLLs, which might cause a problem with the binary->library->binary exception catching when the library has already been closed?

ensiform commented 11 years ago

Some of that stuff is likely to change when dealing with game restart fixes.

xycaleth commented 11 years ago

Commenting out the call to CL_FlushMemory doesn't make any difference (though this may in itself cause other problems :p). I extended my test program to call dlclose before the exception is thrown, and this seems to produce the same problem as in OJK, so that is a possible cause of the problem.
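A plausible explanation is that the unwinder still needs the code and unwind tables of every frame on the stack, and those are gone once the library has been unmapped. If that is indeed the cause, one possible ordering is to only request the unload before throwing and to perform it after the exception has been caught. The sketch below simulates this with a flag instead of a real dlopen/dlclose pair so that it stays self-contained; `ModuleSketch`, `UnwindProbe`, `ErrorSketch` and `RunDeferredUnload` are hypothetical names, and this is not presented as the fix adopted by OpenJK.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a loaded game module (cgame/game/ui).
// A real implementation would hold the handle returned by dlopen.
struct ModuleSketch {
    bool loaded = true;
    bool unloadRequested = false;
    void Unload() { loaded = false; } // real code would call dlclose here
};

// Destroyed during unwinding; records whether the module was still loaded.
struct UnwindProbe {
    const ModuleSketch &module;
    bool &loadedDuringUnwind;
    UnwindProbe(const ModuleSketch &m, bool &out)
        : module(m), loadedDuringUnwind(out) {}
    ~UnwindProbe() { loadedDuringUnwind = module.loaded; }
};

struct DeferredResult {
    bool loadedDuringUnwind;
    bool loadedAfterCatch;
};

// Hypothetical error routine: requests the unload but does not perform it.
[[noreturn]] void ErrorSketch(ModuleSketch &m, const char *msg) {
    m.unloadRequested = true;
    throw std::runtime_error(msg);
}

// Stand-in for a frame that belongs to the module itself.
void ModuleFrame(ModuleSketch &m, bool &loadedDuringUnwind) {
    UnwindProbe probe(m, loadedDuringUnwind);
    ErrorSketch(m, "DERP");
}

DeferredResult RunDeferredUnload() {
    ModuleSketch m;
    bool during = false;
    try {
        ModuleFrame(m, during);
    } catch (const std::runtime_error &) {
        // No module frames remain on the stack here, so unloading is safe.
        if (m.unloadRequested) {
            m.Unload();
        }
    }
    return DeferredResult{during, m.loaded};
}
```

With this ordering the module is still loaded while its frame unwinds and is released only once control is back in the main binary.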