erkkah / tigr

TIGR - the TIny GRaphics library for Windows, macOS, Linux, iOS and Android.

Mesa Memory Leak #62

Open AdreKiseque opened 5 months ago

AdreKiseque commented 5 months ago

TIGR seems to leak memory whenever it... runs. Clang Static Analyzer and htop show no issues, but Clang LeakSanitizer and Valgrind both freak out. LeakSanitizer output:

tigr-test/leak/ $ ./leak
==109252==WARNING: invalid path to external symbolizer!
==109252==WARNING: Failed to use and restart external symbolizer!

=================================================================
==109252==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 6720 byte(s) in 1 object(s) allocated from:
    #0 0x55ff8ecea592  (/workspaces/160550426/tigr-test/leak/leak+0x33592) (BuildId: b4db616332db8de29c4f8cac38b13bff386a95bb)
    #1 0x7d08ceabe7b3  (/lib/x86_64-linux-gnu/libGLX_mesa.so.0+0x317b3) (BuildId: 81f27aa4cfe213187c5a2dda6902a87fd132a76e)

SUMMARY: LeakSanitizer: 6720 byte(s) leaked in 1 allocation(s).

Valgrind's output is massive, but it points in the same direction. (Edit: I first thought it was printing errors for every loop of the window. It is not; it just prints one massive block of errors on exit.) Some sample lines:

==96672== 194,880 bytes in 840 blocks are still reachable in loss record 2,689 of 2,692
==96672==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==96672==    by 0x5586802: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==96672==    by 0x558A0F7: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==96672==    by 0x50DC79A: glXChooseFBConfig (in /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0)
==96672==    by 0x4AA7D8F: (below main) (libc_start_call_main.h:58)
==96672== 
==96672== 265,072 bytes in 1 blocks are indirectly lost in loss record 2,690 of 2,692
==96672==    at 0x484DE30: memalign (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==96672==    by 0x484DF92: posix_memalign (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==96672==    by 0x58428B9: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==96672==    by 0x5790660: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==96672==    by 0x5587C28: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==96672==    by 0x5586F28: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)

I'm running on Codespaces. I can't tell if the issue is with TIGR's usage of Mesa or with Mesa itself, nor can I tell if this is an actual problematic leak or a false positive, but it seems worth calling attention to. It happens with any usage of TIGR, down to drawing an empty window.

#include "../tigr/tigr.h"

int main(int argc, char *argv[])
{
    Tigr *screen = tigrWindow(320, 240, "Hello", 0);
    while (!tigrClosed(screen))
    {
        tigrUpdate(screen);
    }
    tigrFree(screen);
    return 0;
}

Hopefully this helps track something down.

AdreKiseque commented 5 months ago

Update: Compiling with gcc instead of Clang I was able to get some more detailed errors. LeakSanitizer:

==29525==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 6720 byte(s) in 1 object(s) allocated from:
    #0 0x7f86a9632302 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
    #1 0x7f86a87be7b3  (/lib/x86_64-linux-gnu/libGLX_mesa.so.0+0x317b3)

SUMMARY: LeakSanitizer: 6720 byte(s) leaked in 1 allocation(s).

Valgrind doesn't seem to be providing more detail necessarily, but I did catch something interesting:

==38490== 375,989 (376 direct, 375,613 indirect) bytes in 1 blocks are definitely lost in loss record 2,692 of 2,692
==38490==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==38490==    by 0x51D0B2D: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==38490==    by 0x51CFF28: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==38490==    by 0x51D33FB: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==38490==    by 0x51D3760: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==38490==    by 0x4D2697B: glXCreateContext (in /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0)
==38490==    by 0x1115A9: tigrWindow (tigr.c:4639)
==38490==    by 0x10BF19: main (leak.c:5)

This suggests the issue is indeed directly in the tigrWindow function.

AdreKiseque commented 5 months ago

Update: Further pursuing the leak, I followed the trail presented by Valgrind. It brought me to a snippet showing a pointer being overwritten without its memory being freed (tigr.c:4639):

    glc = glXCreateContext(dpy, vi, NULL, GL_TRUE);
    int contextAttributes[] = { GLX_CONTEXT_MAJOR_VERSION_ARB, 3, GLX_CONTEXT_MINOR_VERSION_ARB, 3, None };
    glc = glXCreateContextAttribsARB(dpy, fbConfig, NULL, GL_TRUE, contextAttributes);
    glXMakeCurrent(dpy, xwin, glc);

Not really sure what that first assignment is for, but as a little bandage I added a quick destroy of the first context before it gets overwritten.

    glc = glXCreateContext(dpy, vi, NULL, GL_TRUE);
    int contextAttributes[] = { GLX_CONTEXT_MAJOR_VERSION_ARB, 3, GLX_CONTEXT_MINOR_VERSION_ARB, 3, None };
    glXDestroyContext(dpy, glc);
    glc = glXCreateContextAttribsARB(dpy, fbConfig, NULL, GL_TRUE, contextAttributes);
    glXMakeCurrent(dpy, xwin, glc);
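The leak pattern and both candidate fixes can be reproduced without GLX. What follows is only a minimal sketch: `Ctx`, `ctx_create`, `ctx_destroy` and the fixed-size pool are hypothetical stand-ins for `GLXContext`, `glXCreateContext` and `glXDestroyContext`, and a live-object count plays the role of the leak checker. It says nothing about whether TIGR actually needs the first context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GLX context; not part of TIGR or GLX. */
typedef struct Ctx { int in_use; int version; } Ctx;

/* Fixed pool, so the sketch itself never leaks heap memory. */
static Ctx pool[8];
#define POOL_LEN (sizeof pool / sizeof pool[0])

static Ctx *ctx_create(int version) {
    for (size_t i = 0; i < POOL_LEN; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].version = version;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void ctx_destroy(Ctx *c) {
    assert(c == NULL || c->in_use); /* catch a double destroy */
    if (c) c->in_use = 0;
}

/* Number of contexts that were created but never destroyed. */
static int ctx_live_count(void) {
    int n = 0;
    for (size_t i = 0; i < POOL_LEN; i++) n += pool[i].in_use;
    return n;
}

static void ctx_reset(void) {
    for (size_t i = 0; i < POOL_LEN; i++) pool[i].in_use = 0;
}

/* Original pattern: the first handle is overwritten before it is destroyed. */
int leaked_by_overwrite(void) {
    ctx_reset();
    Ctx *glc = ctx_create(1);
    glc = ctx_create(3);   /* the first context is now unreachable */
    ctx_destroy(glc);      /* stands in for cleanup at window teardown */
    return ctx_live_count();
}

/* Bandage: destroy the first context before replacing the handle. */
int leaked_with_bandage(void) {
    ctx_reset();
    Ctx *glc = ctx_create(1);
    ctx_destroy(glc);
    glc = ctx_create(3);
    ctx_destroy(glc);
    return ctx_live_count();
}

/* Alternative: never create the first context at all. */
int leaked_with_single_context(void) {
    ctx_reset();
    Ctx *glc = ctx_create(3);
    ctx_destroy(glc);
    return ctx_live_count();
}
```

In this model only `leaked_by_overwrite` leaves a context alive at the end; both other variants end with a live count of zero.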

(A cleaner solution would probably be to just not create the first context at all, but I don't understand the code well enough to rule out that it does something in the one line between its creation and its replacement, so removing it entirely might break something.)

That silenced the last error, but presented a new one. Following it, I came to the initX11Stuff function (tigr.c:4476):

        GLXFBConfig* fbc = glXChooseFBConfig(dpy, DefaultScreen(dpy), attribList, &fbcCount);
        if (!fbc) {
            tigrError(0, "Failed to choose FB config");
        }
        fbConfig = fbc[0];

Seeing this fbc variable wasn't ever used after this, I freed it right there.

        GLXFBConfig* fbc = glXChooseFBConfig(dpy, DefaultScreen(dpy), attribList, &fbcCount);
        if (!fbc) {
            tigrError(0, "Failed to choose FB config");
        }
        fbConfig = fbc[0];
        XFree(fbc);

However, bafflingly, Valgrind still reports memory issues from this line.

==66884== 194,880 bytes in 840 blocks are still reachable in loss record 2,391 of 2,391
==66884==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==66884==    by 0x51CF802: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==66884==    by 0x51D12F1: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==66884==    by 0x51D6FCF: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==66884==    by 0x51D2745: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==66884==    by 0x51D30F7: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0)
==66884==    by 0x4D2779A: glXChooseFBConfig (in /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0)
==66884==    by 0x110D3C: initX11Stuff (tigr.c:4476)
==66884==    by 0x1111A7: tigrWindow (tigr.c:4576)
==66884==    by 0x10BF19: main (leak.c:5)
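One way a report like this can survive the `XFree` call, sketched under a loud assumption: suppose the library keeps its own table of configs and only hands the caller a freshly allocated array of handles into it. Every name here (`Config`, `lib_choose_configs`, `lib_shutdown`, `pick_first_config`) is hypothetical, plain `free` stands in for `XFree`, and none of this is Mesa's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque config and handle types; not the GLX definitions. */
typedef struct Config { int depth; } Config;
typedef Config *ConfigHandle;

/* Library-owned table: allocated on first use, never released by callers. */
static Config *lib_table;
static size_t lib_table_len;

/* Hypothetical analogue of glXChooseFBConfig: returns a caller-owned array
   of handles that point into the library-owned table. */
ConfigHandle *lib_choose_configs(int *count) {
    assert(count != NULL);
    *count = 0;
    if (!lib_table) {
        lib_table_len = 4;
        lib_table = calloc(lib_table_len, sizeof *lib_table);
        if (!lib_table) return NULL;
    }
    ConfigHandle *out = malloc(lib_table_len * sizeof *out);
    if (!out) return NULL;
    for (size_t i = 0; i < lib_table_len; i++) out[i] = &lib_table[i];
    *count = (int)lib_table_len;
    return out;
}

int lib_table_is_allocated(void) { return lib_table != NULL; }

/* Only the library can release its own table. */
void lib_shutdown(void) {
    free(lib_table);
    lib_table = NULL;
}

/* Caller side, mirroring the patched snippet: copy the first handle,
   then release the array that held it. */
ConfigHandle pick_first_config(void) {
    int count = 0;
    ConfigHandle *fbc = lib_choose_configs(&count);
    if (!fbc || count == 0) {
        free(fbc);
        return NULL;   /* do not index a missing or empty array */
    }
    ConfigHandle first = fbc[0];
    free(fbc);         /* plays the role of XFree(fbc) */
    return first;
}
```

Under that assumption the caller has released everything it owns, yet the table stays allocated until the library tears it down, which a leak checker would attribute to the same call site.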

Worth noting, there are no full leaks in the program by this point, only memory that is still reachable at exit but never freed. EDIT: I'm illiterate apparently, over 9KB are "possibly lost".

==66884== LEAK SUMMARY:
==66884==    definitely lost: 0 bytes in 0 blocks
==66884==    indirectly lost: 0 bytes in 0 blocks
==66884==      possibly lost: 9,234 bytes in 76 blocks
==66884==    still reachable: 811,055 bytes in 4,520 blocks
==66884==         suppressed: 0 bytes in 0 blocks
==66884== 
==66884== For lists of detected and suppressed errors, rerun with: -s
==66884== ERROR SUMMARY: 53 errors from 53 contexts (suppressed: 0 from 0)
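For reading that summary: Memcheck calls a block "still reachable" when a pointer to its start still exists at exit, and "possibly lost" when only a pointer into its middle can be found. A minimal sketch of both cases follows; the globals and function names are made up for illustration and do not come from TIGR or Mesa.

```c
#include <assert.h>
#include <stdlib.h>

enum { BLOCK_SIZE = 64, INTERIOR_OFFSET = 16 };

static char *g_start;     /* points at the start of a block */
static char *g_interior;  /* points into the middle of a block */

/* Never freed, but g_start still holds the block's start address at exit:
   Memcheck would classify this block as "still reachable". */
void leave_still_reachable(void) {
    if (!g_start) g_start = malloc(BLOCK_SIZE);
}

/* Only a pointer into the middle of the block survives. Memcheck cannot tell
   whether that is a real reference, so it reports "possibly lost". */
void leave_possibly_lost(void) {
    assert(INTERIOR_OFFSET < BLOCK_SIZE);
    if (!g_interior) {
        char *p = malloc(BLOCK_SIZE);
        if (p) g_interior = p + INTERIOR_OFFSET;
    }
}

int holds_start_pointer(void) { return g_start != NULL; }
int holds_interior_pointer(void) { return g_interior != NULL; }

/* Cleanup that an owner could run at shutdown to make both reports go away. */
void release_all(void) {
    free(g_start);
    g_start = NULL;
    if (g_interior) free(g_interior - INTERIOR_OFFSET);
    g_interior = NULL;
}
```

Calling the first two functions and exiting without `release_all` would be expected to produce one report of each kind; neither means the program lost track of the memory the way the overwritten context handle did.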

Also interesting to note: with all these changes, a LeakSanitizer build no longer reports any errors, but it does sometimes crash on launch with a segmentation fault. I'm not certain when the segfault started, whether it was something I did or whether it's always been there. I can say that after making the first change, the program would inconsistently report errors on termination (perhaps related to how long it ran?).

Overall, I think I've done everything I can. There are probably other instances of unfreed memory, but they're likely all either weird instances like this one or trivial instances like the first. I'm a little overwhelmed and not quite motivated to hunt down more trivial instances while these weird ones run amok (but I can say it looks like dpy (tigr.c:4454) isn't being freed).

AdreKiseque commented 5 months ago

So, it looks like this issue in particular is indeed happening at a lower level, in the glXChooseFBConfig function. There are some other reports of it online, though most of them are pretty old. That said, there definitely are some leaks caused by TIGR itself that should be resolved.