dhewm / dhewm3

dhewm 3 main repository
https://dhewm3.org/
GNU General Public License v3.0

Crash on ppc64 big endian #625

Open Doctorj128 opened 1 week ago

Doctorj128 commented 1 week ago

I'm on a Powermac G5 quad 2.5GHz, Radeon HD5770, 16GB RAM, running Gentoo Linux on kernel 6.6.52. I got the 'base' folder from my Mac copy of the game, which was installed on my second hard drive.

The game reaches this screen before crashing: Screenshot_2024-10-24_17-18-16

Here's the log: log.txt

DanielGibson commented 1 week ago

I don't have such hardware so I'm afraid you'll have to debug it yourself :-/ As someone using Gentoo on obscure hardware you'll know how to use gdb?

Doctorj128 commented 1 week ago

I think there must be some endianness issues in imgui_draw.cpp

in_grabKeyboard: Will *not* grab the keyboard if mouse is grabbed, so global keyboard-shortcuts (like Alt-Tab) will still work
dhewm3: /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2598: ImFont* ImFontAtlas::AddFontFromMemoryTTF(void*, int, float, const ImFontConfig*, const ImWchar*): Assertion `font_data_size > 100 && "Incorrect value for font_data_size!"' failed.

Thread 1 "dhewm3" received signal SIGABRT, Aborted.
0x00007ffff74a0fdc in ?? () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff74a0fdc in ?? () from /usr/lib64/libc.so.6
#1  0x00007ffff7442564 in raise () from /usr/lib64/libc.so.6
#2  0x00007ffff742623c in abort () from /usr/lib64/libc.so.6
#3  0x00007ffff7438298 in ?? () from /usr/lib64/libc.so.6
#4  0x00007ffff743833c in .__assert_fail () from /usr/lib64/libc.so.6
#5  0x00000001003ab08c in ImFontAtlas::AddFontFromMemoryTTF (this=this@entry=0x104a00cf0, 
    font_data=font_data@entry=0x7ffed7e70010, font_data_size=font_data_size@entry=-266861056, 
    size_pixels=size_pixels@entry=18, font_cfg_template=<optimized out>, glyph_ranges=glyph_ranges@entry=0x0)
    at /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2598
#6  0x00000001003ab5c0 in ImFontAtlas::AddFontFromMemoryCompressedTTF (this=0x104a00cf0, 
    compressed_ttf_data=compressed_ttf_data@entry=0x10056ccf0 <D3::ImGuiHooks::ProggyVector_compressed_data>, 
    compressed_ttf_size=compressed_ttf_size@entry=198655, size_pixels=18, font_cfg_template=font_cfg_template@entry=0x0, 
    glyph_ranges=glyph_ranges@entry=0x0) at /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2616
#7  0x0000000100435df8 in D3::ImGuiHooks::NewFrame () at /home/doctorj1/programs/dhewm3/neo/sys/sys_imgui.cpp:337
#8  0x000000010016ccd4 in idCommonLocal::Frame (this=0x1009f1318 <commonLocal>)
    at /home/doctorj1/programs/dhewm3/neo/framework/Common.cpp:2445
#9  0x00000001000855e0 in main (argc=<optimized out>, argv=0x7ffffffff2d8)
    at /home/doctorj1/programs/dhewm3/neo/sys/linux/main.cpp:452

DanielGibson commented 1 week ago

Interesting! Could be that Dear ImGui's compressed font data only works with little endian, I'll try to look into that.

Does dhewm3 work if you disable Dear ImGui (pass -DIMGUI=OFF to cmake)?
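
The negative font_data_size in the backtrace (-266861056) is consistent with a length field whose bytes arrive in reversed order. The following is a minimal sketch of that effect, not the actual Dear ImGui code. It assumes (this is not confirmed in the thread) that the compressed font is embedded as an array of 32-bit words generated for a little-endian machine and that the decoder reads its length field byte by byte. The names wordToMemory, readLengthField and kLengthWord are hypothetical, and the constant is chosen so that the swapped result matches the value in the log.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Lay out a 32-bit word in memory the way a host of the given endianness would.
std::array<std::uint8_t, 4> wordToMemory(std::uint32_t word, bool bigEndianHost) {
    std::array<std::uint8_t, 4> mem{};
    for (int i = 0; i < 4; ++i) {
        int shift = bigEndianHost ? (24 - 8 * i) : (8 * i);
        mem[i] = static_cast<std::uint8_t>(word >> shift);
    }
    return mem;
}

// Read a length field byte by byte, most significant byte first.
// This is endian-independent, but only if the bytes sit in the intended order.
std::int32_t readLengthField(const std::array<std::uint8_t, 4>& mem) {
    std::uint32_t v = (std::uint32_t(mem[0]) << 24) | (std::uint32_t(mem[1]) << 16)
                    | (std::uint32_t(mem[2]) << 8)  |  std::uint32_t(mem[3]);
    return static_cast<std::int32_t>(v);
}

// Hypothetical 32-bit literal meant to hold the byte sequence 00 06 18 F0,
// written out by a tool that assumed a little-endian host.
const std::uint32_t kLengthWord = 0xF0180600u;
```

With these definitions, readLengthField(wordToMemory(kLengthWord, false)) yields 399600, while the same literal laid out on a big-endian host yields -266861056. Data stored as a byte or character string does not have this problem, because its memory layout does not depend on the host's word order.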

Doctorj128 commented 1 week ago

Aha! That worked! It compiles and runs great now, but it looks like event triggers don't work at all. All the NPCs stay in their default A-pose and won't say anything or move. Good progress though!

Screenshot_2024-10-24_23-46-44 Screenshot_2024-10-24_23-45-34 Screenshot_2024-10-24_23-44-47

DanielGibson commented 1 week ago

Great to hear it works better without Dear ImGui. There have been reports of that T-pose problem with big endian before. Someone with such hardware (i.e. not me) needs to debug this.

Can you check whether ImGui works with this branch: https://github.com/DanielGibson/dhewm3/tree/imgui-base85 ? There the font is compressed in a different format that should work with any endianness.

DanielGibson commented 1 week ago

see also #472 for the T-Pose issue

Doctorj128 commented 1 week ago

Yeah, looks like that branch works. I'd love to help debug the NPC pose issue, but I don't really know where to start. Let me know if you want me to test anything specific!

Link4Electronics commented 1 week ago

I was about to report this issue. I compiled the imgui-base85 branch and it works fine, apart from the T-pose problem. I noticed that during the cutscenes the NPCs that move are not stuck in the T-pose, but NPCs in the background that don't move stay in it. Another thing I noticed: it's not possible to open the PDA. It tries to open but closes immediately, and it stays stuck trying to open for the rest of the game, but only when looking to the left and up; when looking to the right and down, it doesn't try to open the PDA. 20241024_221534

DanielGibson commented 1 week ago

I wish I had an idea where to start debugging this :-/ One thing worth trying might be starting dhewm3 with the arguments +set com_forceGenericSIMD 1, so it doesn't try to use AltiVec. But apart from that I don't really know where to start either.

DanielGibson commented 1 week ago

If com_forceGenericSIMD 1 doesn't help, you could try running the game in valgrind to see if it accesses invalid or uninitialized memory. Running it in valgrind can be really slow, so it makes sense to prepare a minimal testcase. So first, without valgrind, try running
./dhewm3 +map testmaps/test_box +spawn marscity_security_goggles_pda
Is that security guy in that A-pose (or T-pose or whatever it is)? If not, try spawning something else: Open the console (Shift+Esc) and type spawn marscity_ and press the Tab key to see possible autocompletions. Maybe try marscity_soldier_bald_pda or marscity_civilian1

Once you found someone to spawn who shows the broken behavior, quit dhewm3 and run
valgrind ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1
(or whatever model worked for you). Now you may have to wait for several minutes for dhewm3 to start, load that level etc. Eventually the level should have loaded and the spawned model should be visible. Maybe quickly look around if that's possible and quit dhewm3. Now check if valgrind has written anything interesting to the terminal (the lines will start with ==12345==, if 12345 was the PID). Post those things here (scroll all the way up to where you entered the command to make sure you didn't miss anything).

Link4Electronics commented 5 days ago

Oh hey, sorry for taking so long to answer. I tried that command:
./dhewm3 +map testmaps/test_box +spawn marscity_security_goggles_pda
The security guard already spawns in the A-pose. 20241027_135652 I tried spawning other character models, all in the A-pose. I tried with valgrind and it only showed 6309, no 12345 PID.

And no need to worry too much if this doesn't work. I understand that PowerPC big endian isn't a common platform; it's already a miracle that it compiles and runs 🤣. Thanks for the care anyway. We're just sharing and reporting details that may or may not help narrow down the problem, not really demanding or expecting any fix.

A side note: the colors in the game are rendering correctly. I mention that because I have compiled many open-source projects; some render and work fine, while others have the color channels swapped, because big endian ends up with BGRA color order instead of the traditional little-endian RGBA. For example, dhewm3 renders the colors correctly, quakespawn doesn't (though that could be a Mesa3D driver problem). My suspicion is that the A-pose is related to the physics logic/engine or actor animation. Maybe a byteswap is needed on an array or vector somewhere.

Best regards, Link.

DanielGibson commented 5 days ago

It doesn't matter if the number is 6309, 12345 was just a placeholder. The question is whether valgrind printed any warnings or errors about uninitialized reads or invalid writes or such.

The weird thing is, dhewm3 works at least with 32bit Big Endian, like MacOS 10.5 on PowerPC, and according to https://github.com/dhewm/dhewm3/issues/472#issuecomment-1289772983 also on 32bit Big Endian PPC with Linux

DanielGibson commented 3 days ago

can you try a debug build of this branch: https://github.com/DanielGibson/dhewm3/tree/PPC64BE-debug ? if you're super-lucky it works (then we only need to figure out which of my hacks and fixes fixed it), if you're a bit less lucky you'll get assertions that might help debugging the problem, if you're unlucky it'll be the same shit as before

if you get an assertion, please reproduce it in gdb and get a backtrace.

Doctorj128 commented 3 days ago

Hi, I've just tried the new branch. Tragically I don't think anything has changed :( I'd love to try and use valgrind to help, but I think that would mean recompiling glibc, which will probably take ages.

DanielGibson commented 3 days ago

Tragically I don't think anything has changed :(

That's a pity. Just to be sure, is it the same for you, @Link4Electronics ?

Another thing to try to hopefully narrow down the problem: run
./dhewm3 +map testmaps/test_box
Once the map is loaded, open the console (Shift+Esc) and enter testModel marscity_civilian1. This should spawn that scientist that sits in the hangar right at the beginning of the game, but in T-pose. Now (still in the console) enter testAnim stand. Now the scientist should be sitting, like this: image What does it do on PPC64?

What does it look like if you then enter r_showSkel 1 in the console?

Doctorj128 commented 2 days ago

The result is exactly the same. He does sit down correctly, and this message is printed: anim 'stand', 4.959 seconds. 120 frames

Results of r_showSkel 1: Screenshot_2024-10-30_10-17-43

Some animations do actually work, such as the characters blinking. It also looks like RaiseWeapon() is being called constantly, multiple times every frame.

Link4Electronics commented 2 days ago

With the PPC64BE-debug branch I got the same behavior. I tried to run with valgrind, but when it was about to load testmaps/test_box it crashed. It's interesting, though, that PPC32 doesn't have this behavior. Another side note from me: some people had a similar problem with the sm64ex project. It compiles on PPC64 but hangs after the loading screen, while PPC32 is fine, and then by just doing that single change from OP post, s32 word to s64 word, sm64ex works fine on PPC64.

DanielGibson commented 2 days ago

but when it was about to load testmaps/test_box it crashed.

It only crashes when running in valgrind? Or does that branch always crash when loading the map? Does anything get printed when that happens?

by just doing that single change from OP post, s32 word to s64 word, sm64ex works fine on PPC64

I was also looking for similar unions in dhewm3, but the ones I found apparently don't cause the issue, so it must be something else
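
For readers unfamiliar with this class of bug, here is a rough sketch of why a 32-bit member overlapping a 64-bit slot behaves differently on 64-bit big endian. It only illustrates the sm64ex-style pitfall mentioned above; nothing in this thread confirms that dhewm3's problem has this exact shape. The helpers store64 and load32AtOffset0 are hypothetical, and both endiannesses are simulated so the result does not depend on the machine running it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Store a 64-bit value in memory as a host of the given endianness would.
std::array<std::uint8_t, 8> store64(std::uint64_t value, bool bigEndianHost) {
    std::array<std::uint8_t, 8> mem{};
    for (int i = 0; i < 8; ++i) {
        int shift = bigEndianHost ? (56 - 8 * i) : (8 * i);
        mem[i] = static_cast<std::uint8_t>(value >> shift);
    }
    return mem;
}

// Read the first four bytes as a 32-bit value. This is what a 32-bit union
// member (or an int* cast) that starts at the same address as the 64-bit slot sees.
std::uint32_t load32AtOffset0(const std::array<std::uint8_t, 8>& mem, bool bigEndianHost) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        int shift = bigEndianHost ? (24 - 8 * i) : (8 * i);
        v |= std::uint32_t(mem[i]) << shift;
    }
    return v;
}
```

On a little-endian host the 32-bit view of a 64-bit 1 is still 1, because the low half comes first in memory. On a big-endian host the first four bytes are the high half, so the same read yields 0. On 32-bit platforms both views have the same size, which would fit the observation that PPC32 is unaffected.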

DanielGibson commented 2 days ago

Before I forget it, something different: We got the ImGui code to stop crashing, but does it actually work? If you open the advanced settings menu by pressing F10 (or, if that doesn't work, entering dhewm3Settings in the console), does it look as expected, i.e. like this: image ?

DanielGibson commented 2 days ago

Oh and yet another thing: One screenshot above shows the console like

WARNING: script/map_marscity1.script(7): Thread 'map_marscity1::main': Entity not found for event 'trigger'. Terminating thread.

Do you also get any warnings when running ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1? (The only warning I get is "WARNING: idAI_marscity_civilian1_40 has no AAS file" which is expected for the test level)

Update: For testing this, ideally use the latest state of https://github.com/DanielGibson/dhewm3/tree/PPC64BE-debug - I just added a commit with additional debug prints and assertions. If the warnings can't be reproduced with the test level, start a new game (do not load a savegame, in case whatever state is broken gets saved!) and post the warnings that get printed when doing that.

DanielGibson commented 2 days ago

I might have a fix, please test the latest state of the aforementioned PPC64BE-debug branch.

Thinking about it again, it's most probably not fixed completely yet, though I think I at least know the cause now.

Please still check Dear ImGui and do the tests for the "Entity not found" warnings

Link4Electronics commented 2 days ago

Sorry it took me a while to answer, here's the valgrind log with ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1. This time it didn't crash. log.txt from the branch PPC64BE-debug

Here's a photo of how ImGui renders on big endian (it's not a problem specific to the dhewm3 project, unless there's something related). The Shipwright project does the same (pressing F1 to open the Dear ImGui menu, it's all wrong), while 3D Space Cadet somehow renders ImGui OK on big endian. 20241030_162139

spacecadet (probably using a very old version of ImGui) image I should report this issue to the Dear ImGui project.

DanielGibson commented 2 days ago

Don't worry, I think I know how to fix the Dear ImGui issue, I just wanted to make sure it actually happens before doing the change.

DanielGibson commented 2 days ago

I just pushed a fix for the ImGui color issue, I hope it works..

If you want to tell other projects how to fix it, they just need to add the following to imconfig.h:

// NOTE: D3_IS_BIG_ENDIAN is dhewm3-specific, I set it from CMake
// (it gets passed to the compiler as `-DD3_IS_BIG_ENDIAN=1` or =0 for little endian)
// so you'll need to adjust that line for your project
#if D3_IS_BIG_ENDIAN
  #define IM_COL32_R_SHIFT    24
  #define IM_COL32_G_SHIFT    16
  #define IM_COL32_B_SHIFT    8
  #define IM_COL32_A_SHIFT    0
  #define IM_COL32_A_MASK     0x000000FF
#endif

I'll try to fix the remaining problems with the script code now.
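
To see why changing the shifts helps, here is a small sketch of how a packed 32-bit color ends up in memory. It assumes that the renderer reads each vertex color as four consecutive bytes in R, G, B, A order and that the default shifts are 0/8/16/24; neither is stated in this thread. ChannelShifts, packColor and colorInMemory are hypothetical names, not Dear ImGui API, and both endiannesses are simulated.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Bit positions of each channel inside the packed 32-bit color.
struct ChannelShifts { int r, g, b, a; };

// Assumed little-endian defaults (R in the lowest byte, A in the highest).
const ChannelShifts kDefaultShifts   = { 0, 8, 16, 24 };
// The shifts from the big-endian imconfig.h snippet above.
const ChannelShifts kBigEndianShifts = { 24, 16, 8, 0 };

// Pack four 8-bit channels into one 32-bit value using the given shifts.
std::uint32_t packColor(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t a,
                        const ChannelShifts& s) {
    return (std::uint32_t(r) << s.r) | (std::uint32_t(g) << s.g)
         | (std::uint32_t(b) << s.b) | (std::uint32_t(a) << s.a);
}

// Byte order in which a host of the given endianness stores that value in memory.
std::array<std::uint8_t, 4> colorInMemory(std::uint32_t color, bool bigEndianHost) {
    std::array<std::uint8_t, 4> mem{};
    for (int i = 0; i < 4; ++i) {
        int shift = bigEndianHost ? (24 - 8 * i) : (8 * i);
        mem[i] = static_cast<std::uint8_t>(color >> shift);
    }
    return mem;
}
```

With the default shifts, a little-endian host stores the bytes as R, G, B, A, while a big-endian host stores A, B, G, R, so a byte-wise reader sees the channels reversed. With the big-endian shifts, the big-endian host stores R, G, B, A again.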

Link4Electronics commented 2 days ago

Yeap, progress! Kudos! 20241030_204225

DanielGibson commented 2 days ago

I just pushed another commit that hopefully fixes the T-pose problem as well :)

Link4Electronics commented 2 days ago

Impressive! Congratulations and happy halloween! xD 20241030_210923 Pda, the entrance scanner scene works, now even NPCs talk!

DanielGibson commented 2 days ago

Now I only need to clean up all that shit and patch the Resurrection of Evil code as well.. and eventually the mods :-/

DanielGibson commented 2 days ago

The cleaned up code is in this branch: https://github.com/DanielGibson/dhewm3/tree/fix-ppc64be

@Link4Electronics @Doctorj128 could you please test that branch to make sure I got all the important changes, but also please play a bit more in case there are more Big Endian issues that haven't been found yet.

DanielGibson commented 2 days ago

See also #626

Link4Electronics commented 1 day ago

Compiled fix-ppc64be. Went further in the game; fps dropped due to the explosion in the scene, avg is ~60 fps, playable with an R5 230. 20241031_101117

Just for fun, compiling with -maltivec rn... I wonder how I could test idVec3