EasyRPG / Player

RPG Maker 2000/2003 and EasyRPG games interpreter
https://easyrpg.org/player/
GNU General Public License v3.0
967 stars 183 forks source link

Player crashes during transitions when color depth is forced to 16bpp #1640

Closed gameblabla closed 1 year ago

gameblabla commented 5 years ago

Name of the game:

Wadanohara...Also got the crash with Pom gets Wifi and i believe any game using screen transitions is affected.

Player platform:

Linux : MIPS32 mipsel (little endian) and ARMel (Armv5). SDL 1.2, sound support was disabled. (Doesn't matter as i can confirm sound support does not affect the crash in any way and works fine)

Attached files

Every time a game fades in (i've tested it with Wadanohara, Pom Gets Wifi), the EasyRPG Player crashes horribly. It is reproduceable and it always crashes at the exact same place. (In the case of Wadanohara, game fades to black in the opening. It's supposed to show the picture for the 1st chapter, the very beginning but it crashes before it even goes there)

Note that i could only reproduce this issue on MIPS32r1 (little endian) and ARMv5 (little endian) SoCs, operating system is Linux (2.6.31, 3.14 for MIPS32, 4.14 for ARMv5) and i'm using SDL 1.2. The same configuration, Makefile on a PC (x86_64) works just fine, including with SDL 1.2. On those platforms though, it crashes. It also crashes on QEMU-MIPS the same way, which allowed me to get a gdb output, i have attached the log as GDB_output.txt and the Makefile as Makefile.txt. (see above in Attached files)

If you need my patched buildroot toolchain for compiling it yourselves on MIPS, you can get the source here https://github.com/rs-97-cfw/buildroot

Let me know if you are interested in running the binary yourself (or need extra info, just provide me the instructions). Note that i am unable to give the SDL 2 backend a try on those platforms as it is unavailable there. (Only framebuffer is available, no hardware acceleration and SDL2 does not support fbcon)

EDIT: It could be a buildroot issue or a GCC issue or an issue with your code. (it's not exactly clear what's wrong with it)

Ghabry commented 5 years ago

Thanks for the report.

How much RAM does the system have? Usually everything with less than 64 MB is problematic for almost any game. Does the crash go away when you enable swap?

If that doesn't help: could you rerun with valgrind? Will run extremely slow but maybe helps finding a memory corruption.

gameblabla commented 5 years ago

Hello Ghabry, the platform i'm running this is the RS-97. It's a MIPS32r1 Ingenic SoC with an hardware FPU, 128MB of DDR2 RAM. It is configured with 256 MB of Swap memory (but i suspect that swap is not working properly....) but stuck on Linux 2.6.31.3. The ARMv5 platform, the New bittboy, is an ARMv5 Soc clocked at 533MHz (no FPU), only 32MB of RAM and configured with 256MB of Swap. (However, swap on this handled is 1bit only and extremely sluggish, thankfully EasyRPG does not seem to be too punishing) The MIPS emulator has 512MB of RAM allocated and is based on the rootfs of the GCW0 and linux 3.14. (My binary however is statically built with GCC 8.2, musl so it does not care for the system libraries)

Also, the crash seemingly happens with and without swap. (The bittboy though, requires swap due to its RAM limitations) The RS-97 is not starving for memory when running EasyRPG.

I should have valgrind available and i will give it a try asap, but i'm not holding my hopes for it...

carstene1ns commented 5 years ago

Will also do a run with my PSP (also MIPS, also LE, also SDL1.2). Last time I tried it, it worked, so we might have broken things in the meanwhile.

gameblabla commented 5 years ago

Unfortunately i can't use Valgrind at the moment on those devices. But i wanted to add that i updated to GCC 8.3 (from 8.2) and it still crashes and doesn't address our issues. I think my settings when i built the toolchain were too aggressive and it could have broke libpixman which EasyRPG relies on. I'mma try that with compiling O0 for pixman (especially since GDB says it's the culprit) and i will let you know. EDIT : This was sadly not sucessful so what i did instead was to use uclibc instead and the results are a bit different. Instead of crashing on malloc like musl-libc would, it seemingly freezes. Honestly though that's not much better...

fmatthew5876 commented 5 years ago

@gameblabla

Instead of using valgrind you can try to build with address sanitizer. This runs at full speed and detects memory corruption bugs.

You need to rebuild from scratch with -fsanitize=address in your build and link flags.

In cmake you can add -DCMAKE_CXX_FLAGS="-fsanitize=address" -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address". In your makefile you should be able to just add it to CFLAGS and LDFLAGS. I do all my local testing on linux with this enabled.

gameblabla commented 4 years ago

Yo i'm back. Sadly Address sanitizer doesn't seem to be available for the MIPS target... : ( So i had to compile a working build of Valgrind, which i managed to do after a few months or so.

The executable was so huge i had to cut on the dependencies so i could run it somewhat decently (it still crashes as usual though).

My first run with Valgrind gave me this log.txt

Not too useful sadly so i reran it with track origin to yes log_run_2.txt This time however, it became stuck on the unbin/malloc bit sadly and i had to turn off my console.

Doesn't seem like it's very useful sadly...

This was tested against Player commit 96c5e721725dea641ffda7e18d48a127df2448ab and liblcf commit b23556194fef2cdbe3962631b708e58e10072f4c

EDIT Ran it a 3rd time with -s and it did not crash this time, here's the log log_run_3.txt

I also redid it with leak check set to full and show-leak-kinds=all. Unfortunately, it froze before the crash this time again : log_run_4.txt

This is a bit more comprehensible this time, hopefully it helps you guys. Mind you that i had to compile the executable with -Os -g1.

fmatthew5876 commented 4 years ago

Are you sure you're not running out of memory? The original crash is in malloc() so it looks like it could be this.

gameblabla commented 4 years ago

Are you sure you're not running out of memory? The original crash is in malloc() so it looks like it could be this.

You could be right in your assumption because i just tried compiling a new SDL 1.2 build on my RG-350 and it ran just fine ! (that machine has 512MB of DDR2 RAM) But if true, it would mean that transitions would consume more than 128MB of RAM... I did switch back to GCC 7.5 though because GCC 8+ had a lot of issues in my experience. I guess i should try it for the RS-97 again and report back.

fmatthew5876 commented 4 years ago

Very unlikely transitions using that much memory alone.

However to verify, I think you could add a return statement to the beginning of Transition::Init() to completely disable transitions.

gameblabla commented 4 years ago

I should try your hack later for the RS-97/Bittboy. Just wanted to say that EasyRPG works fine on the GKD350H and that thing only has 128MB of DDR2 RAM. (although granted, it has 256MB of swap memory but it runs much better than on the RG-350 so i don't think we are even hitting the barrier there...)

fmatthew5876 commented 4 years ago

We just recently merged a bunch of improvements to transitions which also may have reduced memory usage. You could try again with the continuous build. Let us know how it goes.

gameblabla commented 3 years ago

So i just tried it with GCC 7.5 and latest master commit (i had to use my own Makefile as i have noticed you have recently removed it) and it still crashes on the LDK with its JZ4760B. Same issue, same exact place where it crashes and yadada.

Ghabry commented 3 years ago

based on the valgrind reports there seems to be a problem with the musl support in valgrind. Maybe the malloc is not properly supported so the output is not really helpful. It shows uninitiaized values caused by a heap expansion...

What would be worth a check is using a distribution that uses musl C-Runtime (Alpine Linux) and testing if the Player works there to exclude some musl incompatibility.

gameblabla commented 3 years ago

based on the valgrind reports there seems to be a problem with the musl support in valgrind. Maybe the malloc is not properly supported so the output is not really helpful. It shows uninitiaized values caused by a heap expansion...

What would be worth a check is using a distribution that uses musl C-Runtime (Alpine Linux) and testing if the Player works there to exclude some musl incompatibility.

I haven't tried on x86_64 musl (something like Void linux) but i did try it on the GKD350H as i said earlier and that was with using musl too and it worked there (the architecture there was mips32r2, and not mips32r1. That's basically the only difference). It's possible that there's a regression with GCC 7.x that was fixed on a later GCC release that i had used for the GKD350H (it was GCC 9.2 or 9.3 i believe). Once GCC 10.2 is released, i'll give that a try and see if it fixes stuff or not.

(I realized that i should have said which commit it was when i did the testing... oh well)

carstene1ns commented 3 years ago

@gameblabla can you point me to the toolchain you are using? There are currently a lot of opendingux forks that target different devices unfortunately.

gameblabla commented 3 years ago

@carstene1ns It's this one i use for MIPS32r1 JZ4740/JZ4760 dingux devices : https://github.com/rs-97-cfw/buildroot Note that my toolchain is fully statically linked so that means no support for SDL2 (which runs too slowly on these devices anyway) so you have to use SDL 1.2. There's also a bunch of hacks to compile everything with mno-abicalls and no PIC/PIE. There's also an issue with buildroot & mpg123 : it's not compiled properly. I have to compile it manually with ./configure --host=mipsel-linux --target=mipsel-linux --disable-shared --enable-static --prefix=/opt/rs97-toolchain/mipsel-buildroot-linux-musl/sysroot/usr

(while exporting the bin folder of my toolchain, something like export PATH=/opt/rs97-toolchain/usr/bin:$PATH)

gameblabla commented 3 years ago

Btw, this still works with a custom Makefile and a toolchain that has GCC 7.5 or higher (like mine with SDL 1.2 only : https://github.com/gcw0/buildroot-static or this one https://github.com/WerWolv/rg350_buildroot_gcc10). EasyRPG compiled with such a toolchain works on the GKD350H. (ignoring other regressions unrelated to it, of course)

However, GCC 8+ is known to be troublesome when targeting MIPS32r1 instead, not MIPS32r2. (i encountered multiple crashes in other apps) So i guess this is something that i should try next on my LDK with a JZ4760/MIPS32r1 but at least it still works on MIPS32r2 devices.

gameblabla commented 3 years ago

Ok so some progress !

I know you guys complained about the lack of a recent toolchain for the GCW0/RG-350 so this should fix it.

I should probably run it via valgrind as well but yeah, still crashing : ( I hope the OD team can also port the firmware to the LDK/RS-97 so i can rule out a kernel issue. I should try it on the bittboy again too.

gameblabla commented 3 years ago

I compiled it again for the bittboy/pocket and same exact issue as on the RS-97... It now transitions but it looks glitchy and then it crashes, just like on the LDK/RS-97. The bittboy uses an ARMv5 soc with no FPU while the LDK/RS-97 is a MIPS32r1 SoC with a hardware FPU (that has some bugs but i enabled a compiler workaround to avoid it). Obviously there's something with both in common that causes both to crash as both are using the same versions for the libraries and everything, down to the compiler and libc... I guess that means though, that i will have to use valgrind on it. The pain

But at least it works fine on MIPS32r2 now.

Ghabry commented 3 years ago

Hm maybe the FPU gives a hint? Can you try different softfp options

Slowly sounds like some CPU/compiler bug we hit here

gameblabla commented 3 years ago

There aren't any other softfp options sadly for ARM. We might be hitting a compiler bug indeed (i encountered plenty of those already). Other than doing valgind or GDB again, i'm not sure how to proceed from there.

gameblabla commented 3 years ago

I was able to reproduce the issue on my GCW0 by sheer chance... https://github.com/EasyRPG/Player/blob/master/src/sdl_ui.cpp#L310

This causes the issue. Why ? On the GCW0 with a certain SDL version (the upstream version provided with the toolchain also wouldn't work anymore so took me a while to figure it out and i had to use my statically linked toolchain with GCC 7.5/musl), this would i believe return 32bpp instead of 16bpp. When it is forced to 16bpp, the same glitches that happened on the LDK/Bittboy, also happen on that platform. So setting bpp to 16 will also trigger it.

The RetroFW platforms only support 16bpp for performance reasons and return 16bpp as a result. Bittboy is similar except that it only supports that mode so it also returns 16bpp.

I would need to try it with the newer toolchain just to make sure and force 32bpp on that one but so far it seems to be the culprit, some kind of a buffer overflow.

gameblabla commented 3 years ago

Yup, forcing 32bpp fixes EasyRPG... If you force 16bpp instead, the issue will happen. I just tried it with the upstream toolchain and was able to make it work on that too. However forcing 32bpp on the LDK or Bittboy isn't desirable nor possible on the latter without converting between buffers, which would be a huge waste of performance.

fdelapena commented 3 years ago

Great find, could you confirm if this happens when using 16 bpp in SDL 2.0 if selected ports support it?

gameblabla commented 3 years ago

Great find, could you confirm if this happens when using 16 bpp in SDL 2.0 if selected ports support it?

This doesn't happen with SDL2. Looking at the code, it's not hard to see why : it seems that most of the code assumes a bitdepth of 32 and SDL2 does, by default, init a 32bpp window and there's no easy way to change from that unlike SDL 1.2.

I also attempted to use the 8bpp mode in the SDL 1.2 backend given that the new beta OpenDingux firmware supports 8bpp paletted surfaces (SDL_HWPALETTE | SDL_HWSURFACE) but it seems to just crash for now.

I suspect that the suspect might be Bitmap::Create in SdlUi:RefrsehDisplayMode(). Now we know though that it is not MIPS32 or ARMv5 specific but 16bpp specific.

Ghabry commented 3 years ago

Ooooh, a 16bit platform. Didn't think of this! Great find.

Our 16bit code paths are bit-rotting. Actually you MUST use 32bit for our Player and convert to 16bit for the output buffer. Hard to believe but it is actually FASTER than doing 16bit in Player because 16bit blits are much slower than 32bit blits.

gameblabla commented 3 years ago

Ooooh, a 16bit platform. Didn't think of this! Great find.

Our 16bit code paths are bit-rotting. Actually you MUST use 32bit for our Player and convert to 16bit for the output buffer. Hard to believe but it is actually FASTER than doing 16bit in Player because 16bit blits are much slower than 32bit blits.

Yeah because i would assume that pixman would still do calculations in 32bpp internally before converting it to RGB565. So i guess that doing it in one go with SDL would be faster than that. That said though, i looked bitmap.cpp and to me, it seems like a lot of it could just use SDL directly, as it even has a stretching function (but it's undocumented, SDL_SoftStretch. Doesn't work if the incoming bpp isn't the same as the output but otherwise works faster for 16bpp than SDL_gfx as SDL_gfx lacks a 16bpp path).

Perhaps in the future it would be worth considering targets to use their own bitmap.cpp replacement ? Because rendering is a huge bottleneck on low end platforms like the JZ4760 or JZ4770.... : /

That said though, this feature request doesn't apply to this bug so in the meantime, we need to create a virtual screen surface in case the bpp is not 32

SDL_CreateRGBSurface(SDL_HWSURFACE, width, height, sdl_screen->format->BitsPerPixel, rmask, gmask, bmask, amask);

Or something like that and then do a

SDL_BlitSurface(virtual_scr, NULL, sdl_screen, NULL);

The issue of course, is that the RS-97/LDK toolchain use a hack to force 16bpp seemingly. The hack can be removed but we could always do it manually with something like

static uint16_t rgba888Torgb565(uint32_t s)
{
    unsigned alpha = s >> 27; /* downscale alpha to 5 bits */
    if (alpha == (SDL_ALPHA_OPAQUE >> 3)) return (uint16_t) ((s >> 8 & 0xf800) + (s >> 5 & 0x7e0) + (s >> 3 & 0x1f));
    return s = ((s & 0xfc00) << 11) + (s >> 8 & 0xf800) + (s >> 3 & 0x1f);
}
uint16_t* output;
output = sdl_screen->pixels;
if (SDL_LockSurface(sdl_screen) == 0)
{
    for (uint32_t i = 0; i < (window_width * window_height); i += 1) *output++ = rgba888Torgb565(gfx_output[i]);
    SDL_UnlockSurface(sdl_screen);
}

Or we could use both, but use this loop for RGB565 and SDL_BlitSurface for platforms that aren't broken. I might look into removing the 16bpp hack and see if it fixes things.

EDIT: I'm starting to think it might be a better idea to just let pixman render at 32bpp and convert directly the pixman ARGB surface to RGB565... Though unfortunately it doesn't seem that the code assumes as much. (or maybe it does but it's just slow at doing it ?)

gameblabla commented 3 years ago

Sooo i retried it on the LDK (where it also crashes) and this time, i create a 32bpp buffer internally (thanks to SDL_CreateRGBSurface) and convert it to RGB565 with a simple function. Unfortunately, it also crashes in the same way as it did before, unlike the GCW0 port which crashed if forced to 16bpp...

Btw, i had also reverted the patch to my toolchain that forced it to 16bpp so it couldn't be that either.

Maybe it's something else ? I don't know but it's certainly within the functions that use pixman again... ;/

Ghabry commented 3 years ago

Well way to go would be a openGL bitmap. Especially legacy GL would be useful because the PS Vita supports this but not sure what your devices can do. Only GL ES?

gameblabla commented 3 years ago

Well way to go would be a openGL bitmap. Especially legacy GL would be useful because the PS Vita supports this but not sure what your devices can do. Only GL ES?

The devices that we are talking about (LDK/RG-300/RS-97, Bittboy/PocketGo v1) do not have a dedicated or integrated GPU, and therefore don't support 3D acceleration. You just have access to the screen's framebuffer (RGB565) and that's it.

gameblabla commented 3 years ago

Just wanted to say that i posted my "changes" to the OpenDingux port here : https://github.com/gameblabla/Player/tree/Opendingux_port

Mostly just makefiles and a minor modification to make it run with triple buffering and a hardware surface. It also needs a small soundfont in the gcw-zero/bittoby/rs-97 directory for MIDI playback like this one for example : easyrpg.soundfont.zip

Didn't want to add it to the git because it's quite big, unless you guys know of a smaller one that can work with it.

EDIT: Setting it to 16-bits makes it crash in the same fashion and is also much slower (as you guys stated) while forcing it to 32bpp fixes the crash.

gameblabla commented 3 years ago

Made a quick port to the Funkey-S and forced 32bpp on that one too. All it needed was a simple downscale with SDL_SoftStretch from 320x240 to 240x240.

https://youtu.be/nWxg0cpMIGU

Forcing 32bpp did work on this one. In addition to that, an upstream OpenDingux port is in the works for the LDK/RG-300/RS-97 which should, indirectly, help with this issue as they will properly support 32bpp. Hopefully this will mean that this issue will be, well, less of an issue when this comes out. (although the issue will remain)

gameblabla commented 3 years ago

IMG_20210704_100648

Latest version built from latest source code works fine on the RG-99, although it has several issues including the lack of audio altogether (including sound effects), presumably due to the lack of free RAM. The RG-99 has an Ingenic JZ4725B clocked at 360Mhz (can be overclocked to 456Mhz) and only 32MB of RAM. It has one flaw though : the screen is 320x480 but its IPU doesn't support RGB scaling, requiring double lining or manual scaling. (beta OD fixes that with double lining/DMA)

I also had to disable expat/xml as well as ICU support, due to again, lack of free RAM available. The RG-99 as of now only has 25MB of free ram available after it boots. (it does have 8MB of compressed zram but that's pretty much it)

Any ideas on how to free up some RAM ? EasyRPG's codebase doesn't assume that logging can be disabled.

https://github.com/gameblabla/Player/tree/new_updated

fdelapena commented 3 years ago

Any ideas on how to free up some RAM ? EasyRPG's codebase doesn't assume that logging can be disabled.

If the filesystem I/O is not too bad, try reducing the cache limits, e.g.:

https://github.com/EasyRPG/Player/blob/master/src/cache.cpp#L91 https://github.com/EasyRPG/Player/blob/master/src/audio_secache.cpp#L37

https://github.com/EasyRPG/Player/blob/master/src/audio_secache.h#L72-L73


By the way, if you have issues when pushing with workflows and using 2-factor authentication and/or password tokens, you need to mark the workflow permission checkbox, so you don't need to drop the yml file from your fork to be able to push.

gameblabla commented 3 years ago

Any ideas on how to free up some RAM ? EasyRPG's codebase doesn't assume that logging can be disabled.

If the filesystem I/O is not too bad, try reducing the cache limits, e.g.:

https://github.com/EasyRPG/Player/blob/master/src/cache.cpp#L91 https://github.com/EasyRPG/Player/blob/master/src/audio_secache.cpp#L37

https://github.com/EasyRPG/Player/blob/master/src/audio_secache.h#L72-L73

Those suggestions helped me get EasyRPG work to the point that i could now boot to Wadanora on my RG-99 with FMMidi and some sound effects. I did have to drop some dependencies like samplerate, freetype etc...but unfortunately FMMidi is extremely slow and i can't get fluidlite to work on that platform for some reasons (probably memory related).

gameblabla commented 3 years ago

My new branch is now od_rev1 : https://github.com/gameblabla/Player/tree/od_rev1

For some reasons converting it down to 16bpp did not work before but it does now and without any crashes either. So far, i was able to make EasyRPG work on the RetroFW devices (RS-97, LDK, RG-300), OD Beta on JZ4760 devices, Funkey-S, OpenDingux devices (RG-350, RG-280V, RG-350M) and the Game Kiddy 350H/Mini.

My discussion with you on Discord was useful in making FMMidi and overall everything run much faster than it used to btw. Basically what i did was to force the audio to 16-bits 11025hz and that helped with FMMidi using a lot less CPU usage and reduced the cache size for low memory platforms. Unfortunately, it's hardcoded right now so i had to insert my own macros. I also put back the old midi code as suggested by Ghabry under another define.

The only issue i had was with Fluidlite and the soundfonts : it used to be able to load it from the executable's directory but now, it only does so from the game's directory, requiring me to copy the sf2 there. In the end though, Fluidlite is only used for the Funkey-S as it's slower than FMMidi at 11025hz on other platforms.

As for the RG-99... This is an Ingenic JZ4725B platform with only 32MB of RAM (26Mb of free RAM) and only about 8MB of compressed RAM. It does not attempt to set up a swap partition on the external sd card slot so as a result, it can be challenging to make the games run OK on it. I had to disable the audio resampler as even speexdsp is slower than libsamplerate there. Unfortunately this causes sound effects to sound faster than they should. Other than that, it's also quite slow in some places. Reducing the samplerate of wav files helps and so does using MP3s instead of MIDI. I think i will end up making specific game builds for the RG-99 with per-converted files instead unless someone comes up with a faster resampler. For that reason, i'm also thinking of bringing back the fast wav decoder separately for platforms like the RG-99.

Throw your ideas if you have any, especially when it comes to MIDI and stuff.

As for the actual issue itself, well i guess it's fair to say that EasyRPG doesn't work properly at 16bpp, only 32bpp. Is this something you really want to look into ?

Ghabry commented 3 years ago

Even without the crashes 16bit will likely be always slower with software rendering due to how the "packing" of the bits works. The algorithm is more complex than for 32bit. Only hardware rendering will make 16bit faster.

Could you try it again with fluidsynth instead of fluidlite? AS fluidlite is really old maybe synth had some speed improvements by now.

You can use my "no-glib" branch (disables glib and all thread synchronisation stuff): https://github.com/ghabry/fluidsynth/tree/no-glib

and you need this Player branch:

https://github.com/tyrone-sudeium/Player/tree/native-synthesizers

replace this if here with "true" to get the "fast path" for Midi back that you patched in: https://github.com/tyrone-sudeium/Player/blob/22f0448a40229b31f0213e5b4120fe034563516f/src/audio_decoder_midi.cpp#L289