BindBC / bindbc-sdl

Static & dynamic D bindings to SDL and the SDL_* libraries, compatible with BetterC, @nogc, and nothrow.
Boost Software License 1.0
86 stars 24 forks source link

Calling `loadSDL` before `Mix_Init` with any argument causes segmentation fault on WSL #59

Closed avaxar closed 6 months ago

avaxar commented 7 months ago

Initializing SDL2_mixer using Mix_Init supplied with any argument (other than 0) causes a segfault with error code -11, when a loadSDL call is present before it. This bug is present in WSL Ubuntu, where I tested it, and it might be present on native Linux machines as well.

Upon debugging, the segfault is thrown in an SDL_snprintf call, which's invoked from load_music_type (an internal SDL2_mixer function), which's invoked from the user-called Mix_Init function. Could SDL2_mixer possibly be trying to call from a different instance of SDL2 than the one loaded by bindbc-sdl?

Minimal reproduceable example

import bindbc.sdl;

void main() {
    loadSDL(); // So long as it's here, it'll produce the bug.
    loadSDLMixer();
    Mix_Init(MIX_INIT_OGG); // `MIX_INIT_OGG` can be substituted with anything.
}
ichordev commented 7 months ago

I will try your example later to see what happens on my machine. I usually use static bindings so sometimes I miss these things. In the meantime, could you please provide a stack trace? (e.g. by using gdb) Even if I experience a segmentation fault there’s no guarantee it’s caused by the same thing, since it’s such a generic error.

avaxar commented 7 months ago
Thread 1 (Thread 0x7ffff7c7fdc0 (LWP 471) "testd"):
#0  0x0000555555612ec0 in SDL_snprintf ()
#1  0x00007ffff70414d7 in load_music_type () at /lib/x86_64-linux-gnu/libSDL2_mixer.so
#2  0x00007ffff70418fa in Mix_Init () at /lib/x86_64-linux-gnu/libSDL2_mixer.so
#3  0x00005555555b16a0 in Mix_Init (flags=16) at ../.dub/packages/bindbc-sdl/1.4.5/bindbc-sdl/source/sdl_mixer.d-mixin-191:270
#4  0x00005555555ae764 in _Dmain () at source/app.d:6
ichordev commented 6 months ago

This issue is very odd. It only occurs with SDL Mixer, not SDL TTF, SDL Net, or SDL Image. I'm able to reproduce it on a Debian machine, but it's a bit hard to tell if it's a problem with the bindings at all. As for your hypothesis that it's using the wrong SDL internally, loading different versions of SDL/SDL Mixer manually by supplying a string to loadSDL/loadSDLMixer didn't help in my case.

Mergull commented 6 months ago

I had very similar problem very recently, so I looked into bindbc code and found cause of this issue. Basically bindbc-common overwrites all symbols in library with proto functions which then call function pointer loaded from library. All symbols dynamically linked by libraries (e.g. SDL_Image loading SDL_SetError symbol) link to this proto functions. Problem happens when function is variadic in which case bindbc-common codegen instead of making proto function just overwrites function symbol with function pointer. It work's from D code, which knows that it's pointer. But other libraries links to pointer while expecting function there. This way every library which depends on SDL would crash trying to call wrong address.

Example of code generated by bindbc-common templates:

alias _pSDL_SetError = extern(C) void function(const(char)* fmt, ...
alias _pSDL_GetError = extern(C) const(char)* function();

extern(C) _pSDL_SetError SDL_SetError; // overwritten symbol with function pointer - crash when called from SDL_xxx
package _pSDL_GetError _SDL_GetError;
extern(C) const(char)* SDL_GetError(){ return _SDL_GetError(__traits(parameters)); // proto function works fine

I've tested my suspicions by changing mangleof to stringof in bindbc-common and removing extern(C) for variadic functions and everything works fine.

This problem might be Linux specific due to it's linking mechanism. On Linux all symbols are exported by default which is not the case e.g. for Windows. (I haven't worked on Windows for a very long time, so I'm not sure if it can happend on Windows in some circumstances. Theoretically it's possible, but it depends on how Windows resolves symbols exported multiple times from different libraries)

ichordev commented 6 months ago

Example of code generated by bindbc-common templates:

alias _pSDL_SetError = extern(C) void function(const(char)* fmt, ...
alias _pSDL_GetError = extern(C) const(char)* function();

extern(C) _pSDL_SetError SDL_SetError; // overwritten symbol with function pointer - crash when called from SDL_xxx
package _pSDL_GetError _SDL_GetError;
extern(C) const(char)* SDL_GetError(){ return _SDL_GetError(__traits(parameters)); // proto function works fine

Ah, thank you for finding this! It wouldn’t be the first time I’ve messed that up either.

I always forget to make sure that I’m only using extern(C) for the function pointer type (‘function uses C’s calling convention’), and never extern(C) for the function pointer variable (‘variable should be exported with C mangling, which obviously overwrites the one from the library were binding to’).

I've tested my suspicions by changing mangleof to stringof in bindbc-common and removing extern(C) for variadic functions and everything works fine.

I’m not sure exactly what you mean by using stringof instead of mangleof? Surely just removing extern(C) from the function pointer value should do the trick?

ichordev commented 6 months ago

This issue has been resolved with BindBC-Common v0.1.4

Mergull commented 6 months ago

I’m not sure exactly what you mean by using stringof instead of mangleof? Surely just removing extern(C) from the function pointer value should do the trick?

Code change

-       lib.bindSymbol(cast(void**)&` ~ iden ~ `, here.` ~ iden ~ `.mangleof);`;
+   lib.bindSymbol(cast(void**)&` ~ iden ~ `, here.` ~ iden ~ `.stringof);`;

From perspective of dynamic loading only function types/declarations need to be extern() and proto functions are not required at all. What surprised me is that BindBC now aims also extern C++ and objective-C. Probably for that reason there are proto functions at all. Symbol binding use ProtoFunc.mangleof to get symbol name. As I'm using bindbc to load only C libraries changing mangleof to stringof fixed issue. But other languages has different mangling. With just removed extern(C) code would stop working. Because now mangling for variadic functions will be D mangling, so wrong symbol name during symbol loading.

ichordev commented 6 months ago

proto functions are not required at all. Probably for that reason there are proto functions at all.

No. I have wrapper functions so that the syntax for function calls and references between static and dynamic builds is the same.

Because now mangling for variadic functions will be D mangling, so wrong symbol name during symbol loading.

The issue was resolved by removing only the extern from the function pointer variable, as I suspected would be the case. I don’t think you understood me.

Mergull commented 6 months ago

Interesting. Side effect of that is that external library which call SDL function would call function which then calls function pointer, so we've got additional indirection (two jumps/function calls instead of one. At least for linux where symbol from executable is taken).

Back to the topic. I've tested latest bindbc-common and as I suspected it doesn't work at all for me. Removing extern(C) fix linking problem, but introduce new issue with symbol names. Code which I added in previous response should explain that. bindSymbol load symbol by name. There is no proto function for variadic functions. You are getting symbol name by variadic_pointer.mangleof.

Example of error from loadSDL: Missing Symbol: _D3sdl5error12SDL_SetErrorPUNbNiPxaYv

If SDL_SetError is not extern(C) then SDL_SetError.mangleof return _D3sdl5error12SDL_SetErrorPUNbNiPxaYv (D mangling) instead of SDL_SetError (C mangling)

ichordev commented 6 months ago

Example of error from loadSDL: Missing Symbol: _D3sdl5error12SDL_SetErrorPUNbNiPxaYv

Ah yes... I didn't test for that properly of course. 🤦‍♀️ My apologies, that should also be fixed now in v0.1.5.