Aleph-One-Marathon / alephone

Aleph One is the open source continuation of Bungie’s Marathon 2 game engine.
https://alephone.lhowon.org/
GNU General Public License v3.0

Crashing on Linux due to memory corruption #21

Closed SweetGale closed 3 years ago

SweetGale commented 8 years ago

Hardware:

Operating System:

I've tried building and running Aleph One on the two Raspberry Pis listed above but am experiencing various memory-related crashes once I enter the actual game (i.e. pressing "Begin New Game" and clicking past the Chapter Screen). Most of the time it crashes immediately but sometimes I get to play for a few seconds, enough to see the BOBs slaughter all of the Pfhor at the beginning of Waterloo Waterpark.

I'm using the latest tarball from http://alephone.lhowon.org (20150620). The latest code from GitHub crashes immediately. I can get some output from that version as well if you want me to though.

Some sample error messages:

*** Error in `alephone': free(): invalid next size (fast): 0x02388568 ***
Aborted (SIGABRT)

*** Error in `alephone': malloc(): memory corruption: 0x021fa360 ***
Aborted (SIGABRT)

*** Error in `alephone': corrupted double-linked list: 0x02350108 ***
Aborted (SIGABRT)

Output from GDB:

pi@raspberrypi ~ $ gdb --args /usr/local/bin/alephone --windowed "/usr/local/share/AlephOne/Marathon 2"
GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/alephone...done.
(gdb) handle SIGILL pass nostop noprint
Signal        Stop  Print   Pass to program Description
SIGILL        No    No  Yes     Illegal instruction
(gdb) run
Starting program: /usr/local/bin/alephone --windowed /usr/local/share/AlephOne/Marathon\ 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Aleph One Linux 2015-06-20 1.2.1
http://marathon.sourceforge.net/

Original code by Bungie Software <http://www.bungie.com/>
Additional work by Loren Petrich, Chris Pruett, Rhys Hill et al.
TCP/IP networking by Woody Zenfell
Expat XML library by James Clark
SDL port by Christian Bauer <Christian.Bauer@uni-mainz.de>

This is free software with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
For details, see the file COPYING.

Built with network play enabled.

Built with Lua scripting enabled.
[New Thread 0x73b603b0 (LWP 18463)]
*** Error in `/usr/local/bin/alephone': corrupted double-linked list: 0x00edcd08 ***

Program received signal SIGABRT, Aborted.
0x759def70 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x759def70 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x759e0324 in __GI_abort () at abort.c:89
#2  0x75a1a954 in __libc_message (do_abort=<optimized out>, 
    fmt=0x75ad0bc0 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x75a20b80 in malloc_printerr (action=1, 
    str=0x75ad0c04 "corrupted double-linked list", ptr=<optimized out>)
    at malloc.c:4996
#4  0x75a2202c in _int_free (av=0x75aee4d4 <main_arena>, p=<optimized out>, 
    have_lock=12690296) at malloc.c:3996
#5  0x00139fa4 in l_alloc (ud=<optimized out>, ptr=<optimized out>, 
    osize=<optimized out>, nsize=0) at lauxlib.c:922
#6  0x001437d0 in luaM_realloc_ (L=L@entry=0x8c81d0, block=0xedccc8, osize=64, 
    nsize=nsize@entry=0) at lmem.c:84
#7  0x0014a718 in luaH_free (L=0x8c81d0, L@entry=0x140490 <sweeplist+656>, 
    t=t@entry=0xedfee8) at ltable.c:381
#8  0x00140490 in freeobj (o=0xedfee8, L=<optimized out>) at lgc.c:668
#9  sweeplist (L=<optimized out>, p=0x8c1010, count=23, count@entry=80)
    at lgc.c:732
#10 0x001414a0 in singlestep (L=L@entry=0x8c81d0) at lgc.c:1082
#11 0x00141d4c in incstep (L=0x8c81d0) at lgc.c:1141
#12 luaC_forcestep (L=0x8c81d0) at lgc.c:1160
#13 0x00141ea8 in luaC_step (L=L@entry=0x8c81d0) at lgc.c:1172
#14 0x0014dd84 in luaV_execute (L=L@entry=0x8c81d0) at lvm.c:840
#15 0x0013f27c in luaD_call (L=0x8c81d0, func=<optimized out>, 
    nResults=<optimized out>, allowyield=<optimized out>) at ldo.c:395
#16 0x0013e928 in luaD_rawrunprotected (L=L@entry=0x8c81d0, f=0x8c81d0, 
    f@entry=0x1380c4 <f_call>, ud=0x0, ud@entry=0x7effef30) at ldo.c:131
#17 0x0013f498 in luaD_pcall (L=L@entry=0x8c81d0, 
    func=func@entry=0x1380c4 <f_call>, u=u@entry=0x7effef30, old_top=16, 
    ef=ef@entry=0) at ldo.c:595
#18 0x001395fc in lua_pcallk (L=0x8c81d0, nargs=<optimized out>, nresults=0, 
    errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:949
#19 0x00194aa8 in LuaHUDState::CallTrigger (this=0x65fc20, 
    numArgs=<optimized out>) at lua_hud_script.cpp:176
#20 0x0028f924 in Lua_DrawHUD (time_elapsed=<optimized out>)
    at HUDRenderer_Lua.cpp:60
#21 0x002a47e0 in render_screen (ticks_elapsed=ticks_elapsed@entry=1)
    at screen.cpp:1268
#22 0x001a79c8 in idle_game_state (time=<optimized out>) at interface.cpp:1238
#23 0x0001ec74 in main_event_loop () at shell.cpp:896
#24 main (argc=<optimized out>, argv=<optimized out>) at shell.cpp:376

The "handle SIGILL pass nostop noprint" part is required or else Aleph One crashes immediately due to "Program received signal SIGILL, Illegal instruction".

A possibly related bug report over at SourceForge: Alephone crash ppc linux

Thread for this issue over on the Pfhorums: Aleph One on Raspberry Pi

Hopper262 commented 8 years ago

I don't have a Raspberry Pi, but here are some tips for saving memory in Aleph One:

Best of luck to you! Feel free to document your progress here or on the Pfhorums.

SweetGale commented 8 years ago

I should probably have mentioned which settings I was using. I've done most of my testing in software mode, 1280×800, 16 bit. It seemed to run decently on both machines at those settings.

I don't think this issue is due to lack of memory, but I still tried out your suggestion to disable plugins. It appears the Enhanced HUD was the culprit. It uses Lua and the crash was in the Lua code, so there we have it. Disable it and the game runs just fine.

I also tried OpenGL by the way and it was horrendously slow. Completely unplayable on even the lowest settings and with HD textures disabled.

The next step is to see if I can get the latest code from GitHub to run. I'm also curious to see what kind of performance I'll get from the Raspberry Pi 3 Model B that was released today.

SweetGale commented 8 years ago

I took a stab at running the latest code from GitHub.

autogen.sh listed some additional SDL 2 dependencies not required by the tarball and not listed on the wiki, namely libsdl2-dev, libsdl2-ttf-dev and libsdl2-net-dev.

Aleph One starts, prints the usual credits and then exits with status 1 and no error message. The log file contains one entry: an unhandled exception.

Unhandled exception: basic_string::_S_construct null not valid (shell.cpp:384)

I'd expect the program to then call the exit function on line 389, but when I load it up in GDB and set a breakpoint on exit it instead breaks on line 425.

So it seems to me that SDL initialization failed and that SDL_GetError() returned null, which is why I got an exception instead of an error message.

pi@raspberrypi ~/Projekt/alephone $ gdb --args /usr/local/bin/alephone --windowed "/usr/local/share/AlephOne/Marathon 2"
GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/alephone...done.
(gdb) handle SIGILL pass nostop noprint
Signal        Stop  Print   Pass to program Description
SIGILL        No    No  Yes     Illegal instruction
(gdb) sta
Temporary breakpoint 1 at 0x1efa4: file shell.cpp, line 288.
Starting program: /usr/local/bin/alephone --windowed /usr/local/share/AlephOne/Marathon\ 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Temporary breakpoint 1, main (argc=3, argv=0x7efff504) at shell.cpp:288
288 {
(gdb) break exit
Breakpoint 2 at 0x75972b28: file exit.c, line 104.
(gdb) break _exit
Breakpoint 3 at 0x759e17a0: _exit. (2 locations)
(gdb) c
Continuing.
Aleph One Linux 2015-09-07 1.3a1
https://alephone.lhowon.org/

Original code by Bungie Software <http://www.bungie.com/>
Additional work by Loren Petrich, Chris Pruett, Rhys Hill et al.
TCP/IP networking by Woody Zenfell
Expat XML library by James Clark
SDL port by Christian Bauer <Christian.Bauer@uni-mainz.de>

This is free software with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
For details, see the file COPYING.

Built with network play enabled.

Built with Lua scripting enabled.

Breakpoint 2, __GI_exit (status=1) at exit.c:104
104 exit.c: No such file or directory.
(gdb) bt
#0  __GI_exit (status=1) at exit.c:104
#1  0x0001f620 in initialize_application () at shell.cpp:425
#2  main (argc=<optimized out>, argv=<optimized out>) at shell.cpp:368

Hopper262 commented 8 years ago

Commit f750664 should avoid the exception when SDL_GetError() isn't working properly, but that doesn't change the fact that SDL isn't starting up properly.

This article contains a test SDL2 program; can you see if that runs? In addition to what you've already installed, it suggests installing libsdl2-image-dev and libsdl2-mixer-dev. Aleph One uses sdl2-image if present, but it should be optional, and Aleph One shouldn't care about the mixer part. However, package dependency errors do happen, so if the test program runs it's worth trying Aleph One again with those added packages, just in case.

Commit d724d36 was the last commit before SDL 2, so if that library is a sticking point and you feel like changing gears, you can roll back to that commit to work on whatever broke between June and February. Most of those changes should be unaffected by SDL 2, so I am interested if you find Raspbian crashes on that snapshot.

SweetGale commented 8 years ago

Installed additional SDL2 packages libsdl2-image-dev and libsdl2-mixer-dev. Compiled and ran SDL test program. Worked flawlessly. Pulled latest commit f750664. Re-ran autogen.sh and make.

./autogen.sh CFLAGS="-g -O0" CXXFLAGS="-g -O0" --with-boost-libdir=/usr/lib/arm-linux-gnueabihf

Aleph One still exits due to the same exception. Loading it up in GDB shows that it manages to get past the SDL initialization step.

I still end up with the same type of exception. What makes things so difficult is that I can't get a backtrace for where the exception was thrown. By carefully stepping through the program I was finally able to locate the exact line that causes the exception.

get_name_from_system () at preferences.cpp:187
187     std::string login = getlogin();

Checking the return value confirms that getlogin() returns null, which in turn causes the exception: a std::string cannot be constructed from a null pointer.

(gdb) finish
Run till exit from #0  getlogin () at ../sysdeps/unix/sysv/linux/getlogin.c:39
0x0025d224 in get_name_from_system () at preferences.cpp:187
187     std::string login = getlogin();
Value returned is $2 = 0x0

The line was last modified in 43627bb1, committed on 2015-06-28, 8 days after the last release.

Adding a null check solved the whole problem. Well, except for the fact that Aleph One runs extremely slowly. We're talking sub-1 fps.

const char* foo = getlogin();
std::string login = (foo ? foo : "Bob User");
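
The same guard can be packaged as a small helper. This is only a minimal sketch, not the code that went into Aleph One: `string_or` and `login_name_or` are hypothetical names, and the only real API used is POSIX `getlogin()`, which returns a null pointer when the login name cannot be determined.

```cpp
#include <string>
#include <unistd.h>  // getlogin() (POSIX)

// Hypothetical helper: turn a possibly-null C string into a std::string,
// substituting a fallback instead of constructing from a null pointer.
std::string string_or(const char* s, const std::string& fallback)
{
    return s ? std::string(s) : fallback;
}

// Hypothetical wrapper around getlogin(), which returns a null pointer
// when the login name cannot be determined.
std::string login_name_or(const std::string& fallback)
{
    return string_or(getlogin(), fallback);
}
```

With this in place the assignment would read `std::string login = login_name_or("Bob User");`.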

I then went back to debugging the SDL issue. I tried uninstalling libsdl2-image-dev and libsdl2-mixer-dev and rebuilding. Aleph One runs without throwing any uncaught exception, although still very slow. Huh? So it never was SDL to begin with?

It looks like SDL_GetError is supposed to always return a string, even if just an empty one when there's been no error (SDL wiki reference). I think I've been chasing a red herring because I forgot to use the -O0 flag with my first few builds. It feels weird that it would cause GDB to break on the wrong exit call though. I need to investigate further.

Next up: getting back to the Enhanced HUD Lua issue – you know, the one this thread was originally about.

Hopper262 commented 8 years ago

Glad to hear you did track down the true crash, and it's an easy fix!

SDL 2 has several rendering backends; in your case, it's probably using software OpenGL, which would explain the slowness. See this thread. I haven't tried it, but the documentation says you can do something like:

SDL_SetHint(SDL_HINT_RENDER_DRIVER, "software");

Try adding that after SDL_Init and see if it helps.

SweetGale commented 8 years ago

SDL_SetHint(SDL_HINT_RENDER_DRIVER, "software");

Thanks, that did the trick!

The Enhanced HUD is slightly less crash prone in the most recent code. It still crashes at about the same frequency when I try to start a game. If I do manage to get past that point then the Enhanced HUD works as expected. Well, almost. The health and oxygen bars are grey for some reason. I've played for a few minutes each time and seen no crashes during gameplay. Exiting back to the main menu has caused it to crash every time so far though.

The callstacks are intimidating but look fairly consistent. I was hoping Valgrind might be able to provide me with some more info, but unfortunately it's broken on Raspbian. The same illegal instruction that caused me a bit of grief when using GDB spells major trouble for Valgrind. As I understand it, some libraries on Raspbian use ARM instructions that the Valgrind people have no intention of supporting. (Allegedly, one of them is for switching endianness.)

Callstack for crash on game start:

*** Error in `/usr/local/bin/alephone': malloc(): memory corruption: 0x00fc9c00 ***

Program received signal SIGABRT, Aborted.
0x75943f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x75943f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x75945324 in __GI_abort () at abort.c:89
#2  0x7597f954 in __libc_message (do_abort=<optimized out>, 
    fmt=0x75a35bc0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x75985b80 in malloc_printerr (action=1, str=0x75a3608c "malloc(): memory corruption", 
    ptr=<optimized out>) at malloc.c:4996
#4  0x75987cd4 in _int_malloc (av=av@entry=0x75a534d4 <main_arena>, bytes=bytes@entry=26)
    at malloc.c:3447
#5  0x75989e18 in __GI___libc_malloc (bytes=bytes@entry=26) at malloc.c:2891
#6  0x7598a834 in __GI___libc_realloc (oldmem=0x0, bytes=26) at malloc.c:2972
#7  0x001ceb60 in l_alloc (ud=0x0, ptr=0x0, osize=7, nsize=26) at lauxlib.c:926
#8  0x001dbfbc in luaM_realloc_ (L=0x795d30, block=0x0, osize=7, nsize=26) at lmem.c:84
#9  0x001d67f8 in luaC_newobj (L=0x795d30, tt=7, sz=26, list=0x0, offset=0) at lgc.c:215
#10 0x001e37dc in luaS_newudata (L=0x795d30, s=2, e=0x0) at lstring.c:179
#11 0x001cc1ec in lua_newuserdata (L=0x795d30, size=2) at lapi.c:1179
#12 0x001fdebc in L_Class<&Lua_Screen_Term_Rect_Name, short>::Push (L=0x795d30, index=0)
    at lua_templates.h:212
#13 0x001f5e98 in Lua_Screen_Get_Term_Rect (L=0x795d30) at lua_hud_objects.cpp:2560
#14 0x001d4404 in luaD_precall (L=0x795d30, func=0xdb1120, nresults=1) at ldo.c:318
#15 0x001d4898 in luaD_call (L=0x795d30, func=0xdb1120, nResults=1, allowyield=0) at ldo.c:394
#16 0x001cb90c in f_call (L=0x795d30, ud=0x7effe950) at lapi.c:923
#17 0x001d39a8 in luaD_rawrunprotected (L=0x795d30, f=0x1cb8cc <f_call>, ud=0x7effe950) at ldo.c:131
#18 0x001d50e8 in luaD_pcall (L=0x795d30, func=0x1cb8cc <f_call>, u=0x7effe950, old_top=432, ef=0)
    at ldo.c:595
#19 0x001cb9e8 in lua_pcallk (L=0x795d30, nargs=1, nresults=1, errfunc=0, ctx=0, k=0x0) at lapi.c:949
#20 0x0021ea14 in L_Class<&Lua_Screen_Name, short>::_get (L=0x795d30) at lua_templates.h:349
#21 0x001d4404 in luaD_precall (L=0x795d30, func=0xdb10f0, nresults=1) at ldo.c:318
#22 0x001d4898 in luaD_call (L=0x795d30, func=0xdb10f0, nResults=1, allowyield=1) at ldo.c:394
#23 0x001eac3c in callTM (L=0x795d30, f=0xa42b20, p1=0xdb1040, p2=0xa21f80, p3=0xdb1040, hasres=1)
    at lvm.c:102
#24 0x001eae24 in luaV_gettable (L=0x795d30, t=0xdb1040, key=0xa21f80, val=0xdb1040) at lvm.c:127
#25 0x001ecb2c in luaV_execute (L=0x795d30) at lvm.c:595
#26 0x001d48ac in luaD_call (L=0x795d30, func=0x760608, nResults=0, allowyield=0) at ldo.c:395
#27 0x001cb90c in f_call (L=0x795d30, ud=0x7effeef0) at lapi.c:923
#28 0x001d39a8 in luaD_rawrunprotected (L=0x795d30, f=0x1cb8cc <f_call>, ud=0x7effeef0) at ldo.c:131
#29 0x001d50e8 in luaD_pcall (L=0x795d30, func=0x1cb8cc <f_call>, u=0x7effeef0, old_top=16, ef=0)
    at ldo.c:595
#30 0x001cb9e8 in lua_pcallk (L=0x795d30, nargs=0, nresults=0, errfunc=0, ctx=0, k=0x0) at lapi.c:949
#31 0x00243444 in LuaHUDState::CallTrigger (this=0x735320, numArgs=0) at lua_hud_script.cpp:176
#32 0x002434c4 in LuaHUDState::Init (this=0x735320) at lua_hud_script.cpp:183
#33 0x00243d54 in L_Call_HUDInit () at lua_hud_script.cpp:308
#34 0x00259670 in start_game (user=0, changing_level=false) at interface.cpp:2167
#35 0x002595a0 in begin_game (user=0, cheat=false) at interface.cpp:2140
#36 0x002563c8 in do_menu_item_command (menu_id=129, menu_item=1, cheat=false) at interface.cpp:1395
#37 0x0025a58c in handle_interface_menu_screen_click (x=225, y=196, cheatkeys_down=false)
    at interface.cpp:2640
#38 0x002565cc in portable_process_screen_click (x=225, y=196, cheatkeys_down=false)
    at interface.cpp:1492
#39 0x0001fac0 in process_screen_click (event=...) at shell.cpp:913
#40 0x00021178 in process_event (event=...) at shell.cpp:1420
#41 0x0001f914 in main_event_loop () at shell.cpp:882
#42 0x0001df2c in main (argc=0, argv=0x7efff510) at shell.cpp:379

Callstack for crash on game exit:

*** Error in `/usr/local/bin/alephone': double free or corruption (out): 0x010068c0 ***

Program received signal SIGABRT, Aborted.
0x75943f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x75943f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x75945324 in __GI_abort () at abort.c:89
#2  0x7597f954 in __libc_message (do_abort=<optimized out>, 
    fmt=0x75a35bc0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x75985b80 in malloc_printerr (action=1, str=0x75a35d38 "double free or corruption (out)", 
    ptr=<optimized out>) at malloc.c:4996
#4  0x75986b24 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x00428130 in Image_Blitter::Unload (this=0x1006790) at Image_Blitter.cpp:102
#6  0x00428798 in Image_Blitter::~Image_Blitter (this=0x1006790, __in_chrg=<optimized out>)
    at Image_Blitter.cpp:206
#7  0x004287d8 in Image_Blitter::~Image_Blitter (this=0x1006790, __in_chrg=<optimized out>)
    at Image_Blitter.cpp:207
#8  0x001f0828 in Lua_Image_GC (L=0x9a00b8) at lua_hud_objects.cpp:387
#9  0x001d4404 in luaD_precall (L=0x9a00b8, func=0x12b1ac0, nresults=0) at ldo.c:318
#10 0x001d4898 in luaD_call (L=0x9a00b8, func=0x12b1ac0, nResults=0, allowyield=0) at ldo.c:394
#11 0x001d8744 in dothecall (L=0x9a00b8, ud=0x0) at lgc.c:798
#12 0x001d39a8 in luaD_rawrunprotected (L=0x9a00b8, f=0x1d8710 <dothecall>, ud=0x0) at ldo.c:131
#13 0x001d50e8 in luaD_pcall (L=0x9a00b8, func=0x1d8710 <dothecall>, u=0x0, old_top=16, ef=0)
    at ldo.c:595
#14 0x001d88c8 in GCTM (L=0x9a00b8, propagateerrors=0) at lgc.c:817
#15 0x001d8e6c in callallpendingfinalizers (L=0x9a00b8, propagateerrors=0) at lgc.c:971
#16 0x001d8eb8 in luaC_freeallobjects (L=0x9a00b8) at lgc.c:981
#17 0x001e2af4 in close_state (L=0x9a00b8) at lstate.c:224
#18 0x001e2fdc in lua_close (L=0x9a00b8) at lstate.c:319
#19 0x001084e8 in boost::detail::sp_counted_impl_pd<lua_State*, void (*)(lua_State*)>::dispose (
    this=0x78c110) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:153
#20 0x0007a9cc in boost::detail::sp_counted_base::release (this=0x78c110)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_sync.hpp:128
#21 0x0007aa98 in boost::detail::shared_count::~shared_count (this=0x76b9e8, 
    __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#22 0x000f68c4 in boost::shared_ptr<lua_State>::~shared_ptr (this=0x76b9e4, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/shared_ptr.hpp:328
#23 0x00244534 in LuaHUDState::~LuaHUDState (this=0x76b9e0, __in_chrg=<optimized out>)
    at lua_hud_script.cpp:99
#24 0x00244574 in LuaHUDState::~LuaHUDState (this=0x76b9e0, __in_chrg=<optimized out>)
    at lua_hud_script.cpp:100
#25 0x00244280 in CloseLuaHUDScript () at lua_hud_script.cpp:396
#26 0x002599ec in finish_game (return_to_main_menu=true) at interface.cpp:2275
#27 0x00254454 in set_game_state (new_state=10) at interface.cpp:423
#28 0x00256350 in do_menu_item_command (menu_id=128, menu_item=5, cheat=false) at interface.cpp:1380
#29 0x0001ffe8 in handle_game_key (event=...) at shell.cpp:1020
#30 0x00020dd8 in process_game_key (event=...) at shell.cpp:1290
#31 0x000211f8 in process_event (event=...) at shell.cpp:1437
#32 0x0001f914 in main_event_loop () at shell.cpp:882
#33 0x0001df2c in main (argc=0, argv=0x7efff510) at shell.cpp:379

Hopper262 commented 8 years ago

Good to hear the hint helped. I'll add support for that somewhere in the next official release, probably in the advanced graphics prefs.

The gray bars in the HUD are normal. The graphics come straight from the Xbox version, which used the same graphic plus a color tint to draw the bars. Software mode doesn't support color tinting, so that doesn't work. You're the first person to mention this since its release in 2011, which tells you how often software mode and enhanced HUD are used together.
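
To illustrate what a color tint does to a grey source image, here is a rough sketch that assumes a simple multiplicative tint. The names `RGB` and `tint` are hypothetical, and this is not Aleph One's actual renderer code.

```cpp
#include <cstdint>

// Hypothetical 8-bit-per-channel pixel type, for illustration only.
struct RGB { std::uint8_t r, g, b; };

// Multiplicative tint: scale each channel of the source pixel by the
// matching channel of the tint color (255 means "leave unchanged").
RGB tint(RGB pixel, RGB color)
{
    auto mul = [](std::uint8_t a, std::uint8_t b) {
        // Rounded integer version of a * (b / 255.0).
        return static_cast<std::uint8_t>((a * b + 127) / 255);
    };
    return { mul(pixel.r, color.r), mul(pixel.g, color.g), mul(pixel.b, color.b) };
}
```

A mid-grey pixel `{128, 128, 128}` tinted with pure red `{255, 0, 0}` becomes `{128, 0, 0}`. Without the tint step the same pixel is drawn as-is, which is why the bars stay grey.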

  1. Is software + enhanced HUD unstable on all platforms, or is it Pi-specific or endian-specific? I run this combination periodically and haven't crashed, but I might just be lucky.
  2. Does it crash on other HUD plugins? I have some really minimal ones somewhere, but I'll have to find them. There's this plugin which uses only images from the game data, in case external images are a contributing factor.

zerojay commented 8 years ago

Hi there. I just started work on integrating Aleph One into RetroPie, the retrogaming distribution for Raspberry Pi, and I've been seeing the exact same behavior as the OP describes in this thread. I end up getting "double free or corruption (out)" on the command line when a crash occurs. I have yet to try adding the line you mentioned. I too am using software mode by default.

SweetGale commented 8 years ago

Sorry for the lack of updates. My SD card got corrupted and the latest backup was two months old. I decided to get myself a Raspberry Pi 3 as a consolation gift and to try out Ubuntu Mate. One advantage is that I get to try out a newer version of Valgrind (3.11.0 vs 3.7.0). It only reports one illegal instruction this time, but that's still one too many. I did however manage to find some tips for how to get rid of the weird Raspberry Pi libraries:

valgrind unrecognizes memcmp instruction in raspberry Pi

I tried removing "/usr/lib/arm-linux-gnueabihf/libarmmem.so" from /etc/ld.so.preload. Aleph One now runs in both GDB (without needing the "handle SIGILL" command) and Valgrind. It does not crash though when running in Valgrind. Granted, it was so incredibly slow that I only started two games, but I was able to enter and exit both times without a crash. I'm still reading up on Valgrind and have only used it for finding leaks in the past. Any tips on what to look for and what flags to use are much appreciated. Be aware that the Pi 2 and 3 have 1 GB of RAM and it's easy to run out of memory.

Aleph One Valgrind.txt

I also tried building and running on an Ubuntu VM on my Intel iMac. No problems there. Judging by the SourceForge issue I linked previously it might be reproducible on PPC Linux. Unfortunately I don't have a suitable PPC machine available.

The Default HUD plugin crashes in pretty much the same spot as the Enhanced HUD. Switching to OpenGL mode doesn't help either. I haven't enabled the experimental hardware-accelerated OpenGL driver, but I only expect it to affect performance since the HUDs only crash during initialization and destruction.

orbea commented 5 years ago

I just want to point out the grey HUD bars were fixed. https://github.com/Aleph-One-Marathon/alephone/issues/118

Hopper262 commented 5 years ago

The Lua fixes in #83 may have addressed some or all of the issues raised here. Any updates regarding Raspberry Pi and current master would be appreciated.

SweetGale commented 5 years ago

Tested on a fresh Raspbian image on a Raspberry Pi 3 Model B. I can confirm that the Enhanced HUD no longer crashes.

It uses quite a bit of CPU though. I ran in low-res software mode at 1280×800 (and the software SDL renderer – thanks for adding that option!). I got around 11 fps on a default ("-O2") build with the Enhanced HUD and 30 fps (with the occasional dip) without.
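
Frame rates are easier to compare as per-frame times. The sketch below uses only the two figures above; the function names are illustrative, and it assumes the whole difference between the two runs is attributable to the Enhanced HUD.

```cpp
// Convert a frame rate to the time spent on each frame, in milliseconds.
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

// Extra per-frame cost implied by a drop from fps_without to fps_with.
double added_cost_ms(double fps_with, double fps_without)
{
    return frame_time_ms(fps_with) - frame_time_ms(fps_without);
}
```

At 11 fps a frame takes about 90.9 ms and at 30 fps about 33.3 ms, so `added_cost_ms(11.0, 30.0)` puts the Enhanced HUD at roughly 58 ms per frame.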

Hopper262 commented 5 years ago

Great to hear it no longer crashes, thanks for the report!

The CPU usage sounds about right. Software and accelerated rendering demand very different approaches to optimization, and the Enhanced HUD is optimized for hardware acceleration. The opaque boxy style of 90's HUDs wasn't just an aesthetic choice, it was a necessity! ;)