anholt / libepoxy

Epoxy is a library for handling OpenGL function pointer management for you

OpenGL Functions Unloaded/Crash #200

Open RobertBColton opened 5 years ago

RobertBColton commented 5 years ago

Hello. Our project, a game engine, is investigating switching from GLEW to libepoxy so that we can simplify context creation and support OpenGL ES. We use a bridging system so the user can select from the GUI whether to use SDL or raw native Win32/Xlib for the window. I've had good success switching our bridges over to libepoxy: https://github.com/enigma-dev/enigma-dev/pull/1602

However, I've run into an issue when bridging Win32 or SDL with an OpenGL 3.3 core context. I am compiling with 64-bit MSYS2 and get a crash in wglMakeCurrent(NULL, NULL); when I close the window (the process exits with a negative return code). The same setup with OpenGL 1 does not crash and exits with code 0. My context deletion code is the same for GL1 and GL3 because the bridges share source code.

In the case of SDL, all I do is call SDL_GL_DeleteContext(context);: https://github.com/enigma-dev/enigma-dev/blob/2cc99695e0929a2ca84becdac62acc36b18f6619/ENIGMAsystem/SHELL/Bridges/SDL-OpenGL/graphics_bridge.cpp#L47

In the case of Win32, I use the same context deletion as your tests: https://github.com/enigma-dev/enigma-dev/blob/2cc99695e0929a2ca84becdac62acc36b18f6619/ENIGMAsystem/SHELL/Bridges/Win32-OpenGL/graphics_bridge.cpp#L35

Yet, alas, I get this stack trace when closing the window if my context is a core 3.3 context:

#0  0x00007fff554e436f in msvcrt!memmove ()
   from C:\WINDOWS\System32\msvcrt.dll
#1  0x00000000004cb916 in epoxy_handle_external_wglMakeCurrent.part.1 ()
#2  0x00000000004cb94b in epoxy_wglMakeCurrent_wrapped ()
#3  0x00000000004caaf2 in enigma::DisableDrawing ()
    at Bridges/Win32-OpenGL/graphics_bridge.cpp:34

RobertBColton commented 5 years ago

Ok, so through more debugging I have narrowed this problem down to two issues.

Calling GL Functions in a Destructor That Runs at Exit Crashes

We have several classes in our engine that clean up their OpenGL objects in their destructors. We may change this, but regardless I want to document it for other libepoxy users. If you have an object whose destructor runs when the program exits (in our case a global, as the __tcf_1 frame in the trace below shows), libepoxy appears to have already unloaded the OpenGL function pointers by then. Calling an OpenGL function in such a destructor therefore crashes.

    ~SamplerState() {
      glDeleteSamplers(1, &sampler_index);
    }

Following a Stack Overflow post, I was able to get a full stack trace which helped me figure this out. https://stackoverflow.com/a/9810389

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) bt
#0  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) set $pc = *(void **)$rsp
(gdb) set $rsp = $rsp + 8
(gdb) bt
#0  enigma::SamplerState::~SamplerState (
    this=0x85b478 <enigma::samplerstates+56>, __in_chrg=<optimized out>)
    at Graphics_Systems/OpenGL3/GL3textures.cpp:302
#1  0x0000000000421fa0 in __tcf_1 ()
    at Graphics_Systems/OpenGL3/GL3textures.cpp:306
#2  0x00007fff554aa2bb in msvcrt!_initterm_e ()
   from C:\WINDOWS\System32\msvcrt.dll
#3  0x00000000004014b5 in __tmainCRTStartup ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:342
#4  0x000000000040150b in mainCRTStartup ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223
(gdb)
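
One way to sidestep this ordering problem is to release GL objects explicitly while the context is still current, so the destructors that run at exit have nothing left to do. The sketch below is an assumption about how that could look, not ENIGMA's actual code: `fake_glDeleteSamplers` stands in for the real `glDeleteSamplers` so the example builds without a GL context, and `release()`, `shutdown_graphics()`, and `demo_explicit_release()` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for glDeleteSamplers so the sketch builds without OpenGL
// (hypothetical; a real build would call the GL function here).
static int g_delete_calls = 0;
static void fake_glDeleteSamplers(int n, const unsigned* ids) {
  (void)ids;
  g_delete_calls += n;
}

struct SamplerState {
  unsigned sampler_index = 0;  // 0 means "no GL object owned"

  SamplerState() = default;
  SamplerState(const SamplerState&) = delete;             // avoid double deletes
  SamplerState& operator=(const SamplerState&) = delete;

  // Explicit cleanup, meant to run while the GL context is still current.
  void release() {
    if (sampler_index != 0) {
      fake_glDeleteSamplers(1, &sampler_index);
      sampler_index = 0;  // mark as released so later calls do nothing
    }
  }

  // Does no GL work at process exit if release() already ran.
  ~SamplerState() { release(); }
};

// A global array, similar in spirit to enigma::samplerstates in the trace.
static SamplerState samplerstates[8];

// Hypothetical shutdown hook: call it before wglDeleteContext or
// SDL_GL_DeleteContext instead of relying on static destructors.
void shutdown_graphics() {
  for (SamplerState& s : samplerstates) s.release();
}

// Usage: the first pass deletes the sampler, the second pass is a no-op.
void demo_explicit_release() {
  samplerstates[0].sampler_index = 5;
  shutdown_graphics();
  assert(g_delete_calls == 1);
  shutdown_graphics();
  assert(g_delete_calls == 1);
}
```

With this arrangement the destructors still run at exit, but they only see `sampler_index == 0` and never touch a GL entry point.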

Simply including <epoxy/wgl.h> crashes on close

I don't know whether this header allocates anything statically. Regardless, its presence in the source seems to be enough to cause a crash in wglMakeCurrent(NULL, NULL); when closing the window. The stack trace is the same as in my original post.

#0  0x00007fff554e436f in msvcrt!memmove ()
   from C:\WINDOWS\System32\msvcrt.dll
#1  0x00000000004cb916 in epoxy_handle_external_wglMakeCurrent.part.1 ()
#2  0x00000000004cb94b in epoxy_wglMakeCurrent_wrapped ()
#3  0x00000000004caaf2 in enigma::DisableDrawing ()
    at Bridges/Win32-OpenGL/graphics_bridge.cpp:34

RobertBColton commented 5 years ago

I want to add that it may be worth considering a custom exit handler in libepoxy so that OpenGL functions can be called safely in destructors. The following VirtualBox mailing list post discusses this: https://www.virtualbox.org/pipermail/vbox-dev/2016-November/014175.html

RobertBColton commented 5 years ago

@anholt, is the second (wgl.h) issue perhaps caused by a missing NULL check in epoxy_handle_external_wglMakeCurrent? The crash occurs in the block of code below, which runs just after the closing brace of my game's main while loop. It's the same thread where I created the context, so I don't understand why it should crash. https://github.com/anholt/libepoxy/blob/d536f78db81853b18ffc733af8a1474e9ca08950/src/dispatch_wgl.c#L76

void DisableDrawing(void*)
{
  wglMakeCurrent(NULL, NULL);
  wglDeleteContext(hRC);
  ReleaseDC(enigma::hWnd, enigma::window_hDC);
}

RobertBColton commented 5 years ago

@ebassi might you have an idea about this crash?

RobertBColton commented 5 years ago

I've now discovered that just the following two calls are enough to crash on the second line, with the same crash log.

  wglMakeCurrent(enigma::window_hDC, LegacyRC);
  wglMakeCurrent(NULL,NULL);

gjz010 commented 3 years ago

I ran into a similar problem when cross-compiling the libepoxy testcases statically(!) with mingw-w64, and was able to identify the cause and fix it well enough to get the testcases running in one special case.

The problem: libepoxy assumes shared-linking on Win32.

The problem can easily be traced to the generated gl_generated_dispatch.c, where the dispatch_table pointer in

void
gl_init_dispatch_table(void)
{
    struct dispatch_table *dispatch_table = get_dispatch_table();
    memcpy(dispatch_table, &resolver_table, sizeof(resolver_table));
}

is NULL, causing a segmentation fault. On Win32, get_dispatch_table() is simply a wrapper around TlsGetValue(gl_tls_index), which fetches the thread-local dispatch table from Thread Local Storage.

The TLS is only initialized in DllMain, located in dispatch_wgl.c:

BOOL WINAPI
DllMain(HINSTANCE dll, DWORD reason, LPVOID reserved)
{
    void *data;

    switch (reason) {
    case DLL_PROCESS_ATTACH:
        gl_tls_index = TlsAlloc();
        if (gl_tls_index == TLS_OUT_OF_INDEXES)
            return FALSE;
        wgl_tls_index = TlsAlloc();
        if (wgl_tls_index == TLS_OUT_OF_INDEXES)
            return FALSE;

        first_context_current = false;

        /* FALLTHROUGH */

    case DLL_THREAD_ATTACH:
        data = LocalAlloc(LPTR, gl_tls_size);
        TlsSetValue(gl_tls_index, data);

        data = LocalAlloc(LPTR, wgl_tls_size);
        TlsSetValue(wgl_tls_index, data);

        break;

    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        data = TlsGetValue(gl_tls_index);
        LocalFree(data);

        data = TlsGetValue(wgl_tls_index);
        LocalFree(data);

        if (reason == DLL_PROCESS_DETACH) {
            TlsFree(gl_tls_index);
            TlsFree(wgl_tls_index);
        }
        break;
    }

    return TRUE;
}

However, DllMain is apparently never invoked in a statically linked application, so neither TlsAlloc nor LocalAlloc is ever called, which leads to the crash.
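
A statically linked build could, in principle, compensate for the missing DllMain call by allocating the per-thread table lazily on first use. The sketch below illustrates that idea only; it is not a libepoxy patch and has not been tried against the real sources. To stay portable it replaces TlsGetValue/TlsSetValue with a `thread_local` pointer (`tls_slot`, a hypothetical name), and the `dispatch_table` layout, `resolve_a`, `resolve_b`, and `demo_lazy_table` are made-up stand-ins.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Made-up stand-in for the generated table of GL entry points.
struct dispatch_table {
  void (*entry_a)(void);
  void (*entry_b)(void);
};

static void resolve_a(void) {}
static void resolve_b(void) {}

// Plays the role of resolver_table: the initial resolver stubs.
static const dispatch_table resolver_table = {resolve_a, resolve_b};

// Stand-in for the Win32 TLS slot. It starts out null on every thread,
// which matches the state when no DllMain ever called TlsSetValue.
static thread_local dispatch_table* tls_slot = nullptr;

// Allocate the table the first time this thread asks for it, instead of
// relying on DLL_PROCESS_ATTACH / DLL_THREAD_ATTACH.
static dispatch_table* get_dispatch_table() {
  if (tls_slot == nullptr) {
    tls_slot = static_cast<dispatch_table*>(
        std::calloc(1, sizeof(dispatch_table)));
  }
  return tls_slot;  // still null only if the allocation itself failed
}

static bool gl_init_dispatch_table() {
  dispatch_table* table = get_dispatch_table();
  if (table == nullptr) return false;  // never memcpy into a null pointer
  std::memcpy(table, &resolver_table, sizeof(resolver_table));
  return true;
}

// Usage: initialization succeeds without any attach hook having run.
void demo_lazy_table() {
  bool ok = gl_init_dispatch_table();
  assert(ok);
  (void)ok;
  assert(get_dispatch_table()->entry_a == resolve_a);
}
```

Note that this sketch never frees the table when a thread exits. The DLL_THREAD_DETACH branch above is what does that in the shared build, so a real fix along these lines would still need its own per-thread cleanup path.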

A patch, but on POSIX thread model

Fortunately, what DllMain handles is simple: initializing TLS for every thread. I have not yet thought of a way to patch this for the Win32 thread model, but it is exactly what __thread is for under the POSIX thread model:

#ifdef WIN32_POSIX_PATCH
__thread struct dispatch_table gl_tls_table;
#else
uint32_t gl_tls_index;
uint32_t gl_tls_size = sizeof(struct dispatch_table);
#endif

static inline struct dispatch_table *
get_dispatch_table(void)
{
#ifdef WIN32_POSIX_PATCH
    return &gl_tls_table;
#else
    return TlsGetValue(gl_tls_index);
#endif
}

After applying this patch and defining the macro WIN32_POSIX_PATCH, I can now run the testcases.

This patch is enough for my use, since my project already uses the POSIX thread model. I hope to add more to it (for example, detecting the POSIX thread model or testing whether __thread is available) and open a pull request that solves the problem at least when the thread model is POSIX.
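
The availability test mentioned above could be a small preprocessor ladder that picks whichever thread-local keyword the compiler offers. This is only a sketch under assumptions: `EPOXY_TLS` and `EPOXY_HAVE_TLS_KEYWORD` are hypothetical names, not existing libepoxy macros, the `dispatch_table` struct is a placeholder, and the ladder does not by itself tell the POSIX and Win32 thread models apart.

```cpp
#include <cassert>

// Pick a thread-local storage keyword (hypothetical macro names).
#if defined(__cplusplus) && __cplusplus >= 201103L
#  define EPOXY_TLS thread_local         // C++11 and later
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define EPOXY_TLS _Thread_local        // C11 and later
#elif defined(__GNUC__)
#  define EPOXY_TLS __thread             // GCC/Clang extension
#elif defined(_MSC_VER)
#  define EPOXY_TLS __declspec(thread)   // MSVC extension
#endif

// Placeholder for the generated table of GL entry points.
struct dispatch_table {
  void (*entry)(void);
};

#ifdef EPOXY_TLS
#  define EPOXY_HAVE_TLS_KEYWORD 1
// One zero-initialized table per thread, with no DllMain or TlsAlloc.
static EPOXY_TLS struct dispatch_table gl_tls_table;

static inline struct dispatch_table* get_dispatch_table(void) {
  return &gl_tls_table;
}
#else
#  define EPOXY_HAVE_TLS_KEYWORD 0
// No keyword found: a real build would keep the TlsGetValue path here.
#endif

#if EPOXY_HAVE_TLS_KEYWORD
// Usage: the table exists as soon as the thread touches it.
void demo_tls_keyword() {
  struct dispatch_table* table = get_dispatch_table();
  assert(table != nullptr);
  assert(table->entry == nullptr);  // static storage is zero-initialized
}
#endif
```

Keeping the TlsGetValue path as the fallback means shared builds on compilers without a usable keyword would behave as they do today.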