Closed hobbes1069 closed 6 years ago
From that stack trace, it looks like it's crashing somewhere in a dlclose() call. It's probably not a bug in libglvnd itself (assuming you're using a relatively recent version). Ordinarily, I might suspect a bug in the vendor library, but it looks like it's not even getting that far.
It's not trying to call dlclose more than once, is it? I've seen some weird cases in Fedora and RHEL where DSO destructors get called multiple times or in unexpected orders, but I've never been able to figure out why.
I'm not able to reproduce this on my F27 machine (with the 0.9.23 build of hedgewars from updates-testing). Is this specific to a particular configuration? I have both Mesa and NVIDIA's drivers installed but the GPU is an Intel Haswell.
Gwen Ciesla (limb) reported it first and I've been able to reproduce it on two F26 boxes with Nvidia and my wife's Core i5-8250U w/ intel graphics.
Does this valgrind output help at all?
==21760== 80 bytes in 1 blocks are possibly lost in loss record 285 of 541
==21760== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
==21760== by 0x8DB1C66: RegisterStubCallbacks (GLdispatch.c:411)
==21760== by 0x8DB1E9A: __glDispatchInit (GLdispatch.c:187)
==21760== by 0x85F3E18: __glXInit (libglx.c:2030)
==21760== by 0x4010D92: call_init (dl-init.c:72)
==21760== by 0x4010D92: _dl_init (dl-init.c:120)
==21760== by 0x4015CE9: dl_open_worker (dl-open.c:564)
==21760== by 0x69B018E: _dl_catch_error (dl-error-skeleton.c:198)
==21760== by 0x40151F8: _dl_open (dl-open.c:649)
==21760== by 0x6657F25: dlopen_doit (dlopen.c:66)
==21760== by 0x69B018E: _dl_catch_error (dl-error-skeleton.c:198)
==21760== by 0x66586A4: _dlerror_run (dlerror.c:163)
==21760== by 0x6657FB0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
@hobbes1069 On the very top of that valgrind output, there must be some message about invalid read size
… That's the actual useful one to trace this down…
Here's the full output: https://pastebin.com/yc8EJdn1
There you can see the reason for the segfault:
==15181==
==15181== HEAP SUMMARY:
==15181== in use at exit: 2,364,852 bytes in 1,245 blocks
==15181== total heap usage: 76,248 allocs, 75,003 frees, 576,915,272 bytes allocated
==15181==
==15181== Searching for pointers to 1,245 not-freed blocks
==15181== Checked 44,733,872 bytes
==15181==
==15181== LEAK SUMMARY:
==15181== definitely lost: 8,353 bytes in 5 blocks
==15181== indirectly lost: 1,728 bytes in 38 blocks
==15181== possibly lost: 287,471 bytes in 2 blocks
==15181== still reachable: 2,067,300 bytes in 1,200 blocks
==15181== of which reachable via heuristic:
==15181== newarray : 16 bytes in 1 blocks
==15181== multipleinheritance: 104 bytes in 1 blocks
==15181== suppressed: 0 bytes in 0 blocks
==15181== Rerun with --leak-check=full to see details of leaked memory
==15181==
==15181== ERROR SUMMARY: 4 errors from 1 contexts (suppressed: 0 from 0)
==15181==
==15181== 4 errors in context 1 of 1:
==15181== Syscall param writev(vector[...]) points to uninitialised byte(s)
==15181== at 0x67438E0: __writev_nocancel (syscall-template.S:84)
==15181== by 0x9802F86: write_vec (xcb_conn.c:277)
==15181== by 0x9802F86: _xcb_conn_wait (xcb_conn.c:522)
==15181== by 0x9803384: _xcb_out_send (xcb_out.c:458)
==15181== by 0x980340C: xcb_writev (xcb_out.c:406)
==15181== by 0x9033205: _XSend (xcb_io.c:486)
==15181== by 0x903372B: _XReply (xcb_io.c:573)
==15181== by 0x901E1FE: XInternAtom (IntAtom.c:181)
==15181== by 0x5767F5A: SetWindowBordered (SDL_x11window.c:349)
==15181== by 0x5768EC2: X11_CreateWindow (SDL_x11window.c:521)
==15181== by 0x575716C: SDL_CreateWindow_REAL (SDL_video.c:1475)
==15181== by 0x575774A: ShouldUseTextureFramebuffer (SDL_video.c:220)
==15181== by 0x575774A: SDL_VideoInit_REAL (SDL_video.c:544)
==15181== by 0x56D1736: SDL_InitSubSystem_REAL (SDL.c:170)
==15181== Address 0x87f1723 is 35 bytes inside a block of size 16,384 alloc'd
==15181== at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
==15181== by 0x90232D5: XOpenDisplay (OpenDis.c:237)
==15181== by 0x5766B7D: X11_CreateDevice (SDL_x11video.c:194)
==15181== by 0x575762E: SDL_VideoInit_REAL (SDL_video.c:505)
==15181== by 0x56D1736: SDL_InitSubSystem_REAL (SDL.c:170)
==15181== by 0x40503C: ??? (in /usr/bin/hwengine)
==15181== by 0x875356F: ???
==15181== by 0x8752E2F: ???
==15181== by 0x875356F: ???
==15181== by 0x1FFEFFF90F: ???
==15181== by 0x1FFEFFF7CF: ???
==15181== by 0x1FF: ???
==15181== Uninitialised value was created by a stack allocation
==15181== at 0x902E960: XStoreColors (StColors.c:38)
==15181==
==15181== ERROR SUMMARY: 4 errors from 1 contexts (suppressed: 0 from 0)
The problem comes from the XServer…
I'm not a programmer, but I'm working with someone on the Fedora devel list and it has something to do with "not keeping %rsp 16-byte aligned"
Ok, looks like it's an FPC bug, but why does it seem to only happen on Fedora?
I don't know if this problem is directly related to libglvnd or not but I'm guessing someone here can read the stack trace and give me some hints. I'm not sure which distros have implemented libglvnd but the crash is only being reported on Fedora and Hedgewars upstream hasn't been able to figure out what's causing it.
The crash is with hwengine in 0.9.23 which uses SDL2, prior to that release SDL 1 was used.