gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.
Other
770 stars 177 forks source link

Indiana Jones crash #2628

Closed fzurita closed 2 years ago

fzurita commented 2 years ago

I'm often getting a crash report when Indiana Jones is running, I don't have a reproducible case, but I think it's good to document what we have so far. The crash is a segmentation fault and it happens here:

https://github.com/gonetz/GLideN64/blob/8343fd05c9a01ce2d73f7e32bb8355036759a05b/src/uCodes/F5Indi_Naboo.cpp#L213

Stack trace:

********** Crash dump: **********
#00 0x000000000019801e     lib/arm/libmupen64plus-video-GLideN64.so
    F5INDI_Lighting_Overlay1(unsigned int, unsigned int)
    ./src/uCodes/F5Indi_Naboo.cpp:213:19
    F5INDI_MoveMem(unsigned int, unsigned int)
    ./src/uCodes/F5Indi_Naboo.cpp:577:6
#00 0x0000000000130e39     lib/arm/libmupen64plus-video-GLideN64.so (RSP_ProcessDList()+704)
    _ProcessDListFactor5()
    ./src/RSP.cpp:92:3
    RSP_ProcessDList()
    ./src/RSP.cpp:179:3
#00 0x0000000000006a60     lib/arm/libmupen64plus-rsp-hle.so
#00 0x0000000000009e58     lib/arm/libmupen64plus-rsp-hle.so (DoRspCycles+3760)
#00 0x0000000000091c64     lib/arm/libmupen64plus-core.so
#00 0x00000000000644d0     lib/arm/libmupen64plus-core.so
#00 0x000000000008b95c     lib/arm/libmupen64plus-core.so
#00 0x00000000000b11ca [anon:.bss]
fzurita commented 2 years ago

DMEM size is this is the mupen64plus core:

enum { SP_MEM_SIZE = 0x2000 };

So we could probably just bound the pointer to that size. Not sure what to do when this happens though? Just ignore the command?

fzurita commented 2 years ago

@loganmc10 You may know this, how should out of bounds reads be handled in the real N64? What should they return?

loganmc10 commented 2 years ago

As far as I remember, if it's a DMA type access, the out of bounds portion would return all 0's. If some of the address space was valid, that data would be returned, just the out of bounds portion would be 0.

If it's CPU access (like a word or dword), then it does this odd thing based on the address being accessed, see https://github.com/mupen64plus/mupen64plus-core/blob/6860d134afd16f40262fe335cfee04967265fbac/src/device/device.c#L39

I'm not 100% sure how this would translate to RDP memory access, I assume it would be safe to just return 0's

fzurita commented 2 years ago

I can't think of a good general solution that would be easy to implement. We seem to get a pointer to DMEM and increment it at will in many places.

I could create a DMEM pointer wrapper that checks for out of bounds access on every dereference operation done to the pointer, but that would still mean that every time we get a DMEM pointer, we have to replace the pointer with the wrapped pointer.

It may just be better to handle this specific situation, since it's the only crash I have been seeing. So I could just add a very localized test in that one specific function with issues.

gonetz commented 2 years ago

I could create a DMEM pointer wrapper that checks for out of bounds access on every dereference operation done to the pointer, but that would still mean that every time we get a DMEM pointer, we have to replace the pointer with the wrapped pointer.

I often had similar crashes when I worked on that ucode. Since it was hard to debug, I implemented similar approach - a wrapper with bounds check. I removed it when the game started to work stably. This crash can be caused by a bug in game's code, but most likely it is a bug in ucode implementation. It would be more easy to find that bug If we had a save state to debug. @fzurita are all crash reports point on this line (F5Indi_Naboo.cpp, line 213)? If yes, the problem is probably not so hard to localize.

fzurita commented 2 years ago

Yeah, all the crashes are exactly the same.

Also, I don't have a save state, it's hard to get more than stack traces since those happen automatically in Google play. I have no way to get in touch with the user.

gonetz commented 2 years ago

This code is used in Volcano, Palawan Temple, Teotihuacan levels, may be in other levels with dark caves. I have no issues in the beginning of these levels. I don't know how to play this game, so I can't go through the level. The code also is used when Indiana uses his lighter, in any place. So, the problem actually can be anywhere.

fzurita commented 2 years ago

I'm not getting the automated crash report for this, so I believe it worked. Nobody has gotten in touch with me about the issue yet though.

I'll go ahead and close this issue since it's fixed and I'll create a new issue once I have some kind of save state.

fzurita commented 2 years ago

Although, I just got clang address sanitizer setup. It's finding a few places where we are reading out of bounds even at startup.

gonetz commented 2 years ago

Although, I just got clang address sanitizer setup. It's finding a few places where we are reading out of bounds even at startup.

reading out of bounds from DMEM?

fzurita commented 2 years ago

I made a pull request. It turns out that all the issues ASAN found were unrelated to Indiana Jones.

None were for reading DMEM out of bounds.

gonetz commented 2 years ago

None were for reading DMEM out of bounds.

You probably will never detect out of bounds issues with DMEM, because mupen64plus has DMEM size doubled.