xycaleth closed 11 years ago
Second diagnosis! It doesn't seem to be the fact that the exception is thrown. I created a small test case for throwing exceptions from and through a shared library, and it worked fine in both GCC and Clang (even when the binary was compiled with GCC and the shared library with Clang, and vice versa).
I created a really basic cgame library (a C file which exported dllEntry, and vmMain) which just calls trap_Error when vmMain is called. I noticed when stepping through Com_Error, it would sometimes go back a few lines and start executing those lines again, so it seems like the instruction pointer is becoming corrupt at some point. Here's the log just before it crashes:
68 shader files read
3367 shaders found
55193 code lines
0.84 MB shader data
0.018 seconds
-------------------------
Loading dll file ui.
Sys_LoadGameDll(ui) found vmMain function at 0x736ccc0
Loading dll file cgame.
Sys_LoadGameDll(cgame) found vmMain function at 0xd240f80
----- FS_Startup -----
Current search path:
/Users/alex/Library/Application Support/OpenJK/openjk
./openjk
/Users/alex/Library/Application Support/OpenJK/base
./base/assets3.pk3 (16 files)
./base/assets2.pk3 (62 files)
./base/assets1.pk3 (8320 files)
./base/assets0.pk3 (15346 files)
./base
----------------------
23744 files in pk3 files
*******************
ERROR: DERP
********************
----- FS_Startup -----
Current search path:
/Users/alex/Library/Application Support/OpenJK/openjk
./openjk
/Users/alex/Library/Application Support/OpenJK/base
./base/assets3.pk3 (16 files)
./base/assets2.pk3 (62 files)
./base/assets1.pk3 (8320 files)
./base/assets0.pk3 (15346 files)
./base
----------------------
23744 files in pk3 files
You can see the filesystem gets started up twice. I don't see any reason why it would do this just because I have a stub cgame library, so it's more likely that Com_Error has some stack corruption or something.
Guess I should report what I found after more digging. So, exceptions work fine when using clang and gcc across shared library boundaries. I tested this with a limited example where the main binary calls a function in the shared library, which accepts a callback function. The callback function then throws an exception, and this is successfully caught by the program after passing through the library. So I have no idea what's causing the problem in OpenJK itself! Definitely needs more looking into if anyone has time.
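For anyone who wants to repeat the experiment, the shape of the test described above could look roughly like the sketch below. The names library_invoke, throwing_callback and run_test are made up for illustration and are not from OpenJK. In the real test, library_invoke would be compiled into a separate shared library; here everything sits in one file so it stays self-contained.

```cpp
#include <stdexcept>
#include <string>

// Callback type passed from the main binary into the library.
using Callback = void (*)();

// Hypothetical stand-in for the function exported by the shared library.
// In the real test this would be built into a separate .so/.dylib.
void library_invoke(Callback cb) {
    cb();  // the exception propagates through this frame
}

// Callback defined in the main binary; it throws on purpose.
void throwing_callback() {
    throw std::runtime_error("DERP");
}

// Calls into the "library" and catches the exception on the way back.
// Returns the exception message, or an empty string if nothing was caught.
std::string run_test() {
    try {
        library_invoke(throwing_callback);
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

Calling run_test() from main and printing the result is enough to check whether the exception made it back through the library frame.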
Just before the exceptions are thrown in Com_Error, CL_FlushMemory is called, which seems to do the filesystem restart. It also shuts down the DLLs, which might cause a problem with the binary->library->binary exception catching when the library has been closed?
Some of that stuff is likely to change when dealing with game restart fixes.
Commenting out the call to CL_FlushMemory doesn't make any difference (though this may in itself cause other problems :p). I extended my test program to call dlclose before the exception is thrown, and this seems to produce the same problem as in OJK, so that is a possible cause of the problem.
On Linux and Mac systems (and possibly on Windows, if not built using Visual Studio), when a client tries to connect to a server which has a map that the client doesn't have, the program will crash and close completely. I've pinned the problem down to the exception being thrown in Com_Error. The problem is that C++ exceptions are not guaranteed to work when there are shared libraries (.dll/.so/.dylib files) in the program stack. In Q3, Com_Error uses setjmp and longjmp to return immediately to the main menu. Using these functions is not a good idea with JK2/JKA because the game uses C++ classes, and the destructors of these objects need to be called, otherwise you may end up with memory leaks. longjmp completely bypasses all the stack frames until it returns to the point where setjmp was first called, whereas C++ exceptions will unwind the stack properly, calling all the necessary class destructors.

How do we want to fix this problem? I don't have any ideas as it is now :/
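To make the unwinding point concrete, here is a minimal sketch of what an exception does to objects on the stack. Resource, fail_like_com_error and destructors_run_during_unwind are hypothetical names, not OpenJK code. The longjmp variant is deliberately left out: in C++, jumping over a frame that holds an object with a non-trivial destructor is undefined behaviour, so it can't be demonstrated safely.

```cpp
#include <stdexcept>

// Counts how many Resource destructors have run.
static int g_destroyed = 0;

// Hypothetical class standing in for any game object that owns something.
struct Resource {
    ~Resource() { ++g_destroyed; }
};

// Throws while a Resource is alive on the stack, roughly like an error
// raised in the middle of game code.
void fail_like_com_error() {
    Resource r;
    throw std::runtime_error("error");
}

// Returns the number of destructors that ran while the exception unwound.
int destructors_run_during_unwind() {
    g_destroyed = 0;
    try {
        fail_like_com_error();
    } catch (const std::runtime_error&) {
        // By the time we get here, r has already been destroyed.
    }
    return g_destroyed;
}
```

With exceptions the destructor of r runs before the catch block is entered. A longjmp out of the same frame would skip it, which is where the leaks would come from.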