EchoLiao / hedgewars

Automatically exported from code.google.com/p/hedgewars
GNU General Public License v2.0

Gentoo hedgewars crash, probably related to -O3 in system libs, causes hwengine crash possibly in sound initialization #228

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Start or start replaying a game with in-game sound turned on.

What is the expected output? What do you see instead?

Using a recorded demo and hwengine (to get the most debugging info) I run

hwengine /usr/local/share/hedgewars/Data/ .hedgewars/Demos/2011-05-31_20-43.37.hwd --set-audio 100 0 1 --set-video 640 480 24

and I get the following output:

Hedgewars 0.9.15 engine (network protocol: 37)
Init SDL... ok
Init SDL_ttf... ok
Init SDL_image... ok
Loading /usr/local/share/hedgewars/Data//Graphics/hwengine.png [flags: 8] ok (32x32)
Loading progress sprite: Loading /usr/local/share/hedgewars/Data//Graphics/Progress.png [flags: 6] ok (324x972)
Number of game controllers: 0
Not using any game controller
Getting game config...
Init sound...An unhandled exception occurred at $B7484F5A :
EAccessViolation : Access violation
  $B7484F5A

Without sound I run

hwengine /usr/local/share/hedgewars/Data/ .hedgewars/Demos/2011-05-31_20-43.37.hwd --set-audio 100 0 0 --set-video 640 480 24

and I get (the game replay starts normally)

Hedgewars 0.9.15 engine (network protocol: 37)
Init SDL... ok
Init SDL_ttf... ok
Init SDL_image... ok
Loading /usr/local/share/hedgewars/Data//Graphics/hwengine.png [flags: 8] ok (32x32)
Loading progress sprite: Loading /usr/local/share/hedgewars/Data//Graphics/Progress.png [flags: 6] ok (324x972)
Number of game controllers: 0
Not using any game controller
Getting game config...
Reading objects info...

etc.

What version of the product are you using? On what operating system?

Hedgewars 0.9.15 on Gentoo Linux (not compiled using the Hedgewars ebuild but 
using the ordinary cmake/make scripts only).

Please provide any additional information below.

Additional system information (partly Gentoo specific):

System uname: Linux-2.6.36-gentoo-i686-Intel-R-_Pentium-R-_Dual_CPU_T2370_@_1.73GHz-with-gentoo-2.0.2
Timestamp of tree: Sun, 22 May 2011 07:15:01 +0000
app-shells/bash:          4.2_p10
dev-java/java-config:     2.1.11-r3
dev-lang/python:          2.7.1-r1, 3.2
dev-util/cmake:           2.8.4-r1
sys-apps/baselayout:      2.0.2
sys-apps/openrc:          0.8.2-r1
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.13, 2.68
sys-devel/automake:       1.10.3, 1.11.1-r1
sys-devel/binutils:       2.21
sys-devel/gcc:            4.5.2
sys-devel/gcc-config:     1.4.1-r1
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82
sys-kernel/linux-headers: 2.6.38 (virtual/os-headers)
sys-libs/glibc:           2.13-r2

These are the sound modules loaded:
$ lsmod | grep snd
snd_seq_oss            23626  0 
snd_seq_midi_event      4320  1 snd_seq_oss
snd_seq                39947  4 snd_seq_oss,snd_seq_midi_event
snd_seq_device          4149  2 snd_seq_oss,snd_seq
snd_pcm_oss            32477  0 
snd_mixer_oss          12525  1 snd_pcm_oss
snd_hda_codec_realtek   196303  1 
snd_hda_intel          17533  0 
snd_hda_codec          54466  2 snd_hda_codec_realtek,snd_hda_intel
snd_pcm                56347  3 snd_pcm_oss,snd_hda_intel,snd_hda_codec
snd_timer              14943  2 snd_seq,snd_pcm
snd                    39102  10 snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
soundcore               4079  1 snd
snd_page_alloc          5485  2 snd_hda_intel,snd_pcm

These are the versions of SDL and SDL-Mixer I'm using:
media-libs/libsdl-1.2.14-r6  USE="X alsa audio joystick opengl video xv -aalib -custom-cflags -dga -directfb -fbcon -ggi -libcaca -nas -oss (-ps3) -pulseaudio -static-libs -svga -tslib -xinerama"
media-libs/sdl-mixer-1.2.11-r1  USE="flac mad midi mikmod mp3 timidity vorbis wav -static-libs"

I get sound output from pretty much everything besides Hedgewars.

Original issue reported on code.google.com by psyill....@gmail.com on 31 May 2011 at 8:31

GoogleCodeExporter commented 9 years ago
    WriteToConsole('Init sound...');
    isSoundEnabled:= SDL_InitSubSystem(SDL_INIT_AUDIO) >= 0;

{$IFDEF IPHONEOS}
    channels:= 1;
{$ELSE}
    channels:= 2;
{$ENDIF}

    if isSoundEnabled then
        isSoundEnabled:= Mix_OpenAudio(44100, $8010, channels, 1024) = 0;

Based on log output, the error has to be in the lines above.

(8010 = AUDIO_S16LSB)
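
For readers unfamiliar with the constant, the value $8010 can be unpacked by hand. The sketch below assumes the usual SDL bit layout for audio format words (low byte = bits per sample, bit 12 = big-endian, bit 15 = signed). `decode_audio_format` is a hypothetical helper name, not part of SDL or Hedgewars.

```shell
# Decode an SDL-style audio format word into readable fields.
# Assumed layout: bits 0-7 = sample size in bits,
# bit 12 set = big-endian, bit 15 set = signed samples.
decode_audio_format() {
    fmt=$(( $1 ))
    bits=$(( $fmt & 0xFF ))
    if [ $(( $fmt & 0x8000 )) -ne 0 ]; then sign=signed; else sign=unsigned; fi
    if [ $(( $fmt & 0x1000 )) -ne 0 ]; then endian=big-endian; else endian=little-endian; fi
    echo "${bits}-bit $sign $endian"
}
```

Calling `decode_audio_format 0x8010` prints `16-bit signed little-endian`, which matches the AUDIO_S16LSB name given above.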

There's nothing obviously wrong with this code.
Perhaps you could use it to make a reduced testcase.

Test other SDL apps as well.  The common reasons for problems for Gentoo (and 
Arch) users are failure to include Ogg support (seems like you did that, but 
perhaps you need to rebuild SDL), and an inconsistent sound system + hardware 
that does not support hardware mixing, causing device locks.

strace might yield more info.
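
One way to capture such a trace, as a hedged sketch: `-f` makes strace follow threads and child processes, and `-o` writes the trace to a file. `trace_engine` and the `DRY_RUN` switch are hypothetical conveniences for previewing the command before running it.

```shell
# Run a command under strace, or just print the command line when DRY_RUN is set.
# Usage: trace_engine LOGFILE COMMAND [ARGS...]
trace_engine() {
    log=$1
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "strace -f -o $log $*"
    else
        strace -f -o "$log" "$@"
    fi
}
```

For example, `trace_engine hwengine.strace hwengine /usr/local/share/hedgewars/Data/ .hedgewars/Demos/2011-05-31_20-43.37.hwd --set-audio 100 0 1 --set-video 640 480 24` would leave the system calls made just before the crash at the end of hwengine.strace.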

Anyway, I'm going with this being a problem w/ your system, not Hedgewars, but 
I'll leave it open a little bit longer.

Original comment by kyberneticist@gmail.com on 1 Jun 2011 at 2:42

GoogleCodeExporter commented 9 years ago
Oh, and your list of apps left out probably the only important one - fpc 
version.
Although I don't think that's the problem in this case...

Original comment by kyberneticist@gmail.com on 1 Jun 2011 at 2:46

GoogleCodeExporter commented 9 years ago
Oh, and since you *are* a gentoo user, you could unpack the sources, go to the 
hedgewars dir, type fpc -g hwengine.pas,
then run:
gdb ./hwengine

and do the usual arguments stuff, to get a more useful backtrace - although 
you'll probably have to rebuild SDL w/ debug symbols, since that is probably 
where it is failing.
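
The gdb session described above can also be run non-interactively. This is a sketch, not the project's documented procedure: `write_bt_script` is a hypothetical helper that writes a small gdb command file, which `gdb -batch -x` can then execute.

```shell
# Write a gdb command file that runs the program and dumps all thread backtraces.
# Usage: write_bt_script FILE
write_bt_script() {
    cat > "$1" <<'EOF'
set pagination off
run
thread apply all bt
EOF
}
```

After `write_bt_script bt.gdb`, running `gdb -batch -x bt.gdb --args ./hwengine` followed by the usual engine arguments should print the backtrace as soon as the segfault is hit.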

Original comment by kyberneticist@gmail.com on 1 Jun 2011 at 8:14

GoogleCodeExporter commented 9 years ago
It's the call to Mix_OpenAudio that fails.
I suspect the hard-coded frequency even though the documentation for 
Mix_OpenAudio and SDL_OpenAudio says that the desired parameters should be 
automatically adjusted to fit the system, because I have had problems before 
with old OSS applications trying to use 44.1kHz on my sound card, which seems 
to deal much better with 48kHz.

I'll investigate this further, but I don't believe the problem lies entirely 
within my system configuration, as I'm able to get most other applications to 
work well with my sound card.

By the way, I have not been using the ebuild, as I mentioned in my first post. 
It seemed too unreliable for a bug report upstream...

Original comment by psyill....@gmail.com on 2 Jun 2011 at 7:41

GoogleCodeExporter commented 9 years ago
Where did I say you were using the ebuild?
I suggested it for SDL as a possibility.

As for 44.1, all the music is encoded at it, so it is the sensible choice for 
initialising SDL.  That will not likely change.
It should not cause any problem with any sound card.  If it is blowing up your 
driver, then, yes, that's probably a problem with your system.

Let me know once you get a better stack trace.

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 2:42

GoogleCodeExporter commented 9 years ago
Oh. I get it. "unpack the sources" :)

Yes, you said you'd used an ordinary build, but the complete lack of debug info 
made me think you'd gotten a build from someone else, since it seemed odd to 
list a detailed report w/o a useful stack trace :)

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 2:43

GoogleCodeExporter commented 9 years ago
Sorry, I didn't mean to offend you.

I'm not intending to propose a change of the frequency used either :) it was 
just a quick guess at what might be the problem.

To check that my SDL sound setup isn't fundamentally flawed, I created a small 
test (attached) which initializes the SDL mixer. The test ran without any 
trouble, indicating that the mixer initialization at least works sometimes. 
Output:
$ ./test 
Initializing SDL audio
SDL audio initialized
Initializing SDL mixer
SDL mixer initialized

Though I wouldn't rule out errors in my system, I still think a bug in 
Hedgewars probably exists. I'll get back to you as soon as I get a good stack 
trace on the segfault.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 4:17


GoogleCodeExporter commented 9 years ago
What version of fpc?
We've had mysterious fpc errors in the past for gentoo - esp due to gentoo 
using quite old versions of fpc, although I can't think of how this code could 
cause them.

On the other hand, your test program is basically identical, apart from using 
the type name - and that should darn well have not changed.

You don't have multiple versions of SDL installed, do you?

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 4:32

GoogleCodeExporter commented 9 years ago
To get a decent stack trace, I recompiled Lua, SDL-mixer, ALSA-lib and 
libvorbis with CFLAGS="-ggdb" - all of a sudden the game worked, with sound 
effects and everything! Thus I thought the problem had been caused by broken 
library dependencies, but just to be on the safe side, I recompiled those 
libraries once again, with normal CFLAGS. Guess what... I get the segfault back.

My normal flags are CFLAGS="-O3 -march=nocona -fomit-frame-pointer -pipe", 
which aren't too extreme. Since my test program ran without failure, although 
it did the supposedly problematic mixer initialization, I'm wondering if it's a 
race condition error which vanishes when some of the libraries used are 
"slower".

As you might guess, I'm having trouble getting a useful backtrace and I'm not 
too comfortable with (or used to) debugging multi-threaded issues in GDB, but 
I'll investigate this further and see if I can find something interesting.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 5:02

GoogleCodeExporter commented 9 years ago
My fpc version is 2.4.0 and there are no multiple SDL versions installed.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 5:06

GoogleCodeExporter commented 9 years ago
With my normal CFLAGS, this is what I get when I run the application with GDB:

Program received signal SIGSEGV, Segmentation fault.
0xb7ebf7e9 in ?? () from /usr/lib/liblua.so.5
(gdb) thread apply all bt

Thread 1 (Thread 0xb79c5a30 (LWP 20291)):
#0  0xb7ebf7e9 in ?? () from /usr/lib/liblua.so.5
#1  0x01000105 in ?? ()
#2  0x00000000 in ?? ()

Changing CFLAGS to -ggdb for Lua takes the execution a bit further:

Hedgewars 0.9.15 engine (network protocol: 37)
Init SDL... ok
Init SDL_ttf... ok
Loading /usr/local/share/hedgewars/Data//Graphics/hwengine.png [flags: 8] ok (32x32)
Loading progress sprite: Loading /usr/local/share/hedgewars/Data//Graphics/Progress.png [flags: 6] ok (324x972)
Number of game controllers: 0
Not using any game controller
Getting game config...
Init sound...ok
Init mixer...
Program received signal SIGSEGV, Segmentation fault.
0xb7ce6f58 in ?? () from /usr/lib/libasound.so.2
(gdb) thread apply all bt

Thread 1 (Thread 0xb79c9a30 (LWP 22103)):
#0  0xb7ce6f58 in ?? () from /usr/lib/libasound.so.2
Cannot access memory at address 0x3

Recompiling libasound with CFLAGS="-ggdb" gives

Loading /usr/local/share/hedgewars/Data//Sounds/beewater.ogg ok
Freeing progress surface... 
Loading /usr/local/share/hedgewars/Data//Sounds/voices/Default/Illgetyou.ogg 
Program received signal SIGSEGV, Segmentation fault.
0xb7b4a5e6 in vorbis_synthesis_blockin () from /usr/lib/libvorbis.so.0
(gdb) thread apply all bt

Thread 2 (Thread 0xb16e6b70 (LWP 4139)):
#0  0xb7fe1424 in __kernel_vsyscall ()
#1  0xb7e17afc in poll () from /lib/libc.so.6
#2  0xb7cd2b85 in snd1_pcm_wait_nocheck (pcm=0x877ba20, timeout=-1) at pcm.c:2367
#3  0xb7cd2a8c in snd_pcm_wait (pcm=0x877ba20, timeout=-1) at pcm.c:2338
#4  0xb7cd7d8e in snd1_pcm_write_areas (pcm=0x877ba20, areas=0xb16e61e0, offset=0, size=1024, func=0xb7ce6919 <snd_pcm_plugin_write_areas>) at pcm.c:6726
#5  0xb7ce6d19 in snd_pcm_plugin_writei (pcm=0x877ba20, buffer=0x877ff90, size=1024) at pcm_plugin.c:355
#6  0xb7ccff56 in _snd_pcm_writei (pcm=0x877b8c0, buffer=0x877ff90, size=1024) at pcm_local.h:521
#7  0xb7cd0ed0 in snd_pcm_writei (pcm=0x877b8c0, buffer=0x877ff90, size=1024) at pcm.c:1250
#8  0xb7f8e786 in ?? () from /usr/lib/libSDL-1.2.so.0
#9  0xb7f624db in ?? () from /usr/lib/libSDL-1.2.so.0
#10 0xb7f69ab6 in ?? () from /usr/lib/libSDL-1.2.so.0
#11 0xb7fa1249 in ?? () from /usr/lib/libSDL-1.2.so.0
#12 0xb7fb7b4d in start_thread () from /lib/libpthread.so.0
#13 0xb7e219ce in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb79bca30 (LWP 4136)):
#0  0xb7b4a5e6 in vorbis_synthesis_blockin () from /usr/lib/libvorbis.so.0
#1  0xb7ea73a0 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Just one thread in the application for the first two crashes - that rules out 
a race condition. That leaves the compiler flags: -O3 is considered safe, from 
what I know, but maybe that has changed with my GCC version. I'll recompile the 
libraries with -O2 instead of -O3 and see if that makes any difference.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 5:37

GoogleCodeExporter commented 9 years ago
The application is single threaded, apart from the AI thread.

The AI thread is quite simple, and does not even use SDL threading anymore.

The varying crash locations are mysterious, and I'm suspicious of the Lua crash.
It could be corruption, or just varying errors based on your varying builds.
Certainly there's no obvious way for it to be any place other than the lines I 
listed, since the very next line writes a confirmation to the log, which was not 
visible in your output.

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 7:12

GoogleCodeExporter commented 9 years ago
I've narrowed the problem down to the general optimization flag used for 
compiling the libraries used by the application.

Using
-O2
the application works, using
-O3
the application crashes.

The recompiled libraries are
libvorbis
alsa-lib
lua
sdl-mixer

The question is why this change in optimization of the library code only seems 
to affect Hedgewars (and no other application using the libraries). My next 
step is to construct a small test program in Pascal and see if that triggers 
the problem.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 8:48

GoogleCodeExporter commented 9 years ago
I don't get it. My Pascal test program works with libraries compiled with -O3 
but Hedgewars does not.

Code for test program:
======================
program test;

uses SDLh;

begin
  Write('Initializing SDL ');
  SDL_Init(SDL_INIT_AUDIO);
  WriteLn('ok');
  Write('Initializing mixer ');
  Mix_OpenAudio(44100, $8010, 2, $400);
  WriteLn('ok');
  Mix_CloseAudio();
  SDL_Quit();
end.
======================
Makefile:
======================
PC = fpc
PFLAGS =-k-z -knoexecstack -dSDL_IMAGE_NEWER -dSDL_MIXER_NEWER -O2 -Xs -Si -B -Cs2000000 -vewn
RM = rm -f

TARGETS = test

.PHONY: all clean

all: $(TARGETS)

%:%.pas
	$(PC) $(PFLAGS) $^

clean:
	$(RM) $(TARGETS)
======================

I copied SDLh.pas and options.inc from Hedgewars.
The program runs without errors.

Original comment by psyill....@gmail.com on 2 Jun 2011 at 9:31

GoogleCodeExporter commented 9 years ago
It doesn't matter what flags are used to compile sdl-mixer, it will work anyway.
If any one library of
alsa-lib
lua
libvorbis
is compiled with -O3 set, Hedgewars will crash.
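
If -O3 is to stay as the system-wide default, Portage's per-package environment mechanism can pin just these three libraries to -O2. The sketch below assumes that /etc/portage/package.env is a plain file (it may be a directory on some systems) and that the package atoms are media-libs/alsa-lib, dev-lang/lua and media-libs/libvorbis. `pin_o2` and the O2.conf file name are made up for illustration.

```shell
# Write a per-package CFLAGS override below the given root (default: /).
# Usage: pin_o2 [ROOT]
pin_o2() {
    root=${1:-}
    mkdir -p "$root/etc/portage/env"
    # Same flags as the reporter's defaults, with -O3 swapped for -O2.
    cat > "$root/etc/portage/env/O2.conf" <<'EOF'
CFLAGS="-O2 -march=nocona -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
EOF
    for pkg in media-libs/alsa-lib dev-lang/lua media-libs/libvorbis; do
        # Append the mapping only if it is not already present.
        grep -qxF "$pkg O2.conf" "$root/etc/portage/package.env" 2>/dev/null ||
            echo "$pkg O2.conf" >> "$root/etc/portage/package.env"
    done
}
```

After running `pin_o2` as root, the three packages still need to be rebuilt (for example with `emerge --oneshot`) before the override takes effect.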

Original comment by psyill....@gmail.com on 2 Jun 2011 at 9:52

GoogleCodeExporter commented 9 years ago
Given hedgewars doesn't use alsa at all, I think you've confirmed the problem 
is with your system.

Sounds like -O3 is buggy, to say the least, at least on your setup.

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 10:42

GoogleCodeExporter commented 9 years ago

Original comment by kyberneticist@gmail.com on 2 Jun 2011 at 10:46

GoogleCodeExporter commented 9 years ago
You're probably right, but it's nevertheless strange that the test program in 
comment #14 works and that Hedgewars is the only application on my system which 
can't handle the sound when the ordinary CFLAGS are set.

Original comment by psyill....@gmail.com on 4 Jun 2011 at 6:35

GoogleCodeExporter commented 9 years ago
Hello and thank you for reporting; do you think you could mirror this bug topic 
in the Gentoo Bugzilla to get an opinion from the library maintainers?

However, last time I checked, developers weren't too happy to deal with -O3. What 
happens with other optimization levels (namely -O1, -O0 and -Os)? That might 
reveal other bugs in the libraries.
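
A small helper can list the rebuilds needed to try each level. This sketch only prints the commands rather than running them, since Hedgewars has to be tested by hand after each rebuild. `list_rebuilds` is a hypothetical name, and the package atoms and the `-march=nocona -pipe` flags are assumptions carried over from the reporter's setup.

```shell
# Print one emerge command per optimization level to test.
list_rebuilds() {
    pkgs="media-libs/alsa-lib dev-lang/lua media-libs/libvorbis"
    for level in -O0 -O1 -Os -O2 -O3; do
        echo "CFLAGS=\"$level -march=nocona -pipe\" emerge --oneshot $pkgs"
    done
}
```

Running each printed command in turn and replaying the same demo after every rebuild would show which levels besides -O3 trigger the crash.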

Original comment by vittorio...@gmail.com on 8 Jun 2011 at 6:32

GoogleCodeExporter commented 9 years ago
Based on the lack of response, and the probable relation to some random 
optimisations in some unrelated lib, going to tag this invalid...
Might be more "won't fix" or more precisely "can't fix" but whatevs.

Original comment by kyberneticist@gmail.com on 2 Jul 2011 at 1:30