JACoders / OpenJK

Community effort to maintain and improve Jedi Academy (SP & MP) + Jedi Outcast (SP only) released by Raven Software
GNU General Public License v2.0
2.03k stars 614 forks source link

Loading a game after dying cause some thing to stop working. #933

Closed jamie-marchant closed 6 years ago

jamie-marchant commented 7 years ago

Reporting a bug? Please make sure you've given the following information - thanks!

Operating system and version: Fedor Core 26(64-bit), game version: OpenJK 1.0.1.1(64-bit executable)

Is this for single player or multiplayer? Single player, Jedi Night Academy

Description of the bug (and if possible, steps to reproduce the bug):

  1. Save your game(I have tried checkpoints and quicksaves but not manual saves)
  2. Get killed.
  3. Load the save, cutscenes will not play or will freeze, most elevators will not work and pulling switches will have no effect. Enemy AI will still be loaded, I am not sure about bosses. Most other things still work. NOTE: When these things stop working, saving your game and restarting the whole game, can make them work. Sometimes you can control your character or a clone of them during a cutscene but other than that they do work as normal.

What did you expect to happen instead? Everything functions after loading a save game after death.

simbalion commented 7 years ago

This also happens in Debian 9, using the July 17th build

ensiform commented 7 years ago

@bibendovsky any thoughts? Caused by your changes or...?

bibendovsky commented 7 years ago

Can't reproduce. Tried the first level (yavin1b) on Windows x86/x64, Ubuntu x64 and Fedora x64. Used 71325abca0f5371d7745b69decad9ae2e88cc716 for tests.

jamie-marchant commented 7 years ago

@bibendovsky did you use a newer build? Maybe it's been fixed now?

bibendovsky commented 7 years ago

@jamie-marchant I used latest source code to build binaries.

jamie-marchant commented 7 years ago

@bibendovsky so you're saying if we use 'Merge pull request #799 from deathsythe47/master' it will fix the problem? Forgive I'm new to this GitHub stuff.

ensiform commented 7 years ago

@jamie-marchant OpenJK is already at this commit. But this one in particular didn't affect your problem, whatever was last week's build should be sufficient, if not tomorrows shouldn't be any different as far as SP Saves are concerned.

jamie-marchant commented 7 years ago

Uh, I see, I still have an issue on the build from last week. The question is why is "bibendovsky" not having this issue. Odd as it seems could be it application bit type? I am using 64-bit.

ensiform commented 7 years ago

As you see above, @bibendovsky is using 64-bit as well.

jamie-marchant commented 7 years ago

Yes the OS is 64-bit but what about the game?

ensiform commented 7 years ago

I figure the game is implied to match.

jamie-marchant commented 7 years ago

If that is the case I don't know why it can't be reproduced.

bibendovsky commented 7 years ago

@jamie-marchant Do you have issues on the first level (yavin1b) too?

Steps I tried to reproduce the issue:

At this point, if ICARUS has empty state you, for example, can't cut the tree for Rosh, trigger a cutscene before a first door (after killing some Howlers), etc.

jamie-marchant commented 7 years ago

Yes I do have this problem on the first level.

bibendovsky commented 7 years ago

Maybe it's GCC issue? Build compiled with GCC 4.7.4 (Ubuntu x64) also does not have such problem. I don't have GCC 4.7.2 (like used in https://builds.openjk.org/) to test.

bibendovsky commented 7 years ago

Noticed that log https://jk.xd.cm/builders/linux-64/builds/858/steps/make/logs/stdio is quite small. Maybe forcing full rebuild can resolve the problem?

jamie-marchant commented 7 years ago

This is the log from your weekly build? How easily can you make a full rebuild?

Would it help if I posted a game log? I can consistently reproduce this bug.

bibendovsky commented 7 years ago

@jamie-marchant I don't know what is "weekly build". All tests done with builds compiled by myself.

jamie-marchant commented 7 years ago

By weekly builds I mean the ones here: https://builds.openjk.org/,(I thought they where built weekly) those are the ones I use. If you are building your own, where does that log file come from? I don't remember getting a copy to compile myself.

ensiform commented 7 years ago

That log file came from the weekly build.

jamie-marchant commented 7 years ago

Uh, I see. Thanks for explaining that.

UPDATE: I've updated to Fedora 26 but still have the problem.

dmitrio95 commented 7 years ago

Hello everyone!

I have experienced this bug, and have done some testing on it recently. So let me describe the problem as I see it.

I tested builds from builds.openjk.org, as well as builds made by myself on Debian 9 (with gcc 6.3) and on virtual machine running Debian 7 (with gcc 4.7 and pulled from Debian 8 repos gcc-4.9). All builds were tested on my primary Debian 9 system.

The bug did not appear in my Debian 9 build, but did appear in builds from builds.openjk.org as well as in builds from Debian 7 virtual machine with both versions of compiler. Later testings revealed the following problem.

The game code (at least its cgame module) contains some global variables that store some game state related parameters. These variables need to be reinitialized on each level change (and, of course, on each savegame load). This is done via reloading the whole game engine dynamic library, jagamex86_64.so. The problem of those "bugged" builds is that this library, for some reasons, didn't get unloaded (even though the corresponding dlclose() were actually called) and, hence, didn't get reinitialized. This led to some problems with the game behavior.

A brief google search on this problem gave an idea that this may be a result of usage of STB_GNU_UNIQUE symbols by gcc in the corresponding library (see this StackOverflow answer). It can be switched off by -fno-gnu-unique gcc option, and indeed, when I built the game with this compiler option on my Debian 7 virtual machine with gcc-4.9, I got binaries with the library unloading working properly and, hence, without the bug. However, this option seems to be absent in gcc-4.7, so it will be useless for the build server in its current state. Moreover, this means that we cannot use this method to guarantee a proper game build on some old systems. But, as for now, I can not suggest any other options to fix this. Maybe you can see any better options.

Moreover, I still can not explain why this bug appears only on some systems, as the gnu-unique option seems to be turned on by default in all gcc builds I have used for my tests.

ensiform commented 7 years ago

That's an interesting find. I should point out that the cgame isn't even a separate module with SP. It's part of the same shared object you mentioned but has a separate initialization that happens as though it was it's own module.

Seems relevant: https://stackoverflow.com/questions/8792363/c-dlclose-doesnt-unload-the-shared-library

jamie-marchant commented 7 years ago

Well, done dmitrio95! We now know what the problem is! So the question is can the GCC be upgraded on the server? Does anyone else have any alternative ideas? I don't know enough about GCC, sorry.

ensiform commented 7 years ago

Upgrading to newer gcc sounds like putting a bandaid on instead of fixing the issue.

jamie-marchant commented 7 years ago

Uh I see.

xycaleth commented 7 years ago

Thanks for the in-depth investigation @dmitrio95.

This problem has come up before a long time ago, and e8fc6378 was my original fix for the problem.

It got broken a while later when the compiler flags I added were removed.. What's interesting though is the commit I referenced in my commit message: ioquake/jedi-academy@d6bc68d which says in its commit message:

Fixed dclose problems (stuck in level 4)

  • UNIQUE symbols in jk2gamex86.so don't allow to unload the library on map changes!

From what you're saying though @dmitrio95 it seems the dclose problem isn't actually a bug, but an unexpected behaviour which still complies with the POSIX standard.

I don't have Linux available right now - what does running readelf -Ws ./jagamex86.so | grep UNIQUE output currently? (or for which ever library had the problem)

dmitrio95 commented 7 years ago

Building the game with gcc 4.7 with -fvisibility=hidden flag seems to produce binaries without that bug. Why were those flags removed?

As for checking the library with readelf -Ws ./jagamex86.so | grep UNIQUE, this command produced no output on builds that worked in my previous tests, and "broken" builds yielded the following output:

~/Public/OpenJK/install/JediAcademy/OpenJK $ readelf -Ws jagamex86_64.so | grep UNIQUE
   114: 0000000000000000     0 OBJECT  UNIQUE DEFAULT  UND _ZNSs4_Rep20_S_empty_rep_storageE@GLIBCXX_3.4 (3)
   665: 0000000000d47640     8 OBJECT  UNIQUE DEFAULT   26 _ZGVZNK11CGPProperty11GetTopValueEvE5empty
  2002: 0000000000d3d0e0    24 OBJECT  UNIQUE DEFAULT   26 _ZZN7CBezier11DrawSegmentEPfS0_ffE7lastEnd
  2500: 0000000000d47650    16 OBJECT  UNIQUE DEFAULT   26 _ZZNK11CGPProperty11GetTopValueEvE5empty
  3924: 0000000000d47650    16 OBJECT  UNIQUE DEFAULT   26 _ZZNK11CGPProperty11GetTopValueEvE5empty
  4322: 0000000000d47640     8 OBJECT  UNIQUE DEFAULT   26 _ZGVZNK11CGPProperty11GetTopValueEvE5empty
  4715: 0000000000000000     0 OBJECT  UNIQUE DEFAULT  UND _ZNSs4_Rep20_S_empty_rep_storageE@@GLIBCXX_3.4
  4752: 0000000000d3d0e0    24 OBJECT  UNIQUE DEFAULT   26 _ZZN7CBezier11DrawSegmentEPfS0_ffE7lastEnd

It is interesting that for the build done with -fvisibility=hidden this command found some UNIQUE symbols, but this didn't stop the library from being unloaded just in time. Perhaps objects from libstdc++ are used.

~/Public/OpenJK/install_vis_hidden/JediAcademy $ readelf -Ws OpenJK/jagamex86_64.so | grep UNIQUE         
   123: 0000000000000000     0 OBJECT  UNIQUE DEFAULT  UND _ZNSs4_Rep20_S_empty_rep_storageE@GLIBCXX_3.4 (3)
  5507: 0000000000000000     0 OBJECT  UNIQUE DEFAULT  UND _ZNSs4_Rep20_S_empty_rep_storageE@@GLIBCXX_3.4
TheRuleOfMike commented 6 years ago

Tried two builds recently (openjk-2017-12-11-eba428e6-linux-64 and openjk-2017-12-25-ae3900fc-linux-64) on Kubuntu 17.04 64bit and this bug still persists. Reproducing it is easy - just die and load a game anywhere, even on Yavin_1. Some things like scripted events and objects that can be moved by the force that are necessary to progress just stop working so, unfortunately, this is a game breaking bug (for example, happened with the second tree that needs to be cut down on Yavin and the crane at the Java sandcrawler).

After reading the above thread, there seems to be a bunch of solutions to the problem. Could you guys incorporate them in the never builds? This issue seems to be ignored for a long time now.

dmitrio95 commented 6 years ago

Well, it was an interesting discussion, but the cause of the problem seems to be simpler.

As @xycaleth mentioned, this problem was fixed a long time ago with the addition of -fvisibility=hidden project-global compiler option. Later this was a bit changed (bace05c9b3da11a92d7f14ac53a69f06a7015804), and this compiler option is now set up on a per-target basis rather than project-wide. The problem comes from the way it is done: for every build target that needs this flag to be set up it is appended to its COMPILE_OPTIONS property :

set_property(TARGET ${SPGame} APPEND PROPERTY COMPILE_OPTIONS ${OPENJK_VISIBILITY_FLAGS})

However, CMake v. 2.8.9, which is declared to be minimal supported CMake version in CMakeLists.txt, does not support this property, so, when using this version of CMake, the symbol visibility flag does not get added to the actual compiler options list. Debian 7, which is running on the OpenJK build server, contains exactly this version of CMake, so we can now experience this problem.

So, I guess, we have three options here:

  1. Raise CMake minimal required version to 2.8.12 (seems to be minimal version supporting that COMPILE_OPTIONS targets property). This will require update of CMake on build server.
  2. Use supported in CMake 2.8.9 property COMPILE_FLAGS instead of COMPILE_OPTIONS. This property, however, is deprecated, so it is not the best option here.
  3. Use some other way to set this visibility flag up — which may, however, violate the idea behind the change that introduced, by accident, this issue.
xycaleth commented 6 years ago

Should be fixed by #972