Starlink / ast

Starlink AST Library
GNU Lesser General Public License v3.0

possible memory leaks detected by valgrind #23

Closed jvo203 closed 3 months ago

jvo203 commented 3 months ago

Let's start with an empty "do-nothing" C program:

#include <stdlib.h>
#include <stdio.h>
#include <star/ast.h>

int main()
{
    return 0;
}

Compilation step: cc `pkg-config --cflags cfitsio` -I/home/chris/Downloads/wcssubs -o test_starlink test_starlink.c -L/usr/local/lib `ast_link`

Valgrind command: valgrind --leak-check=full --show-leak-kinds=all --suppressions=/usr/share/glib-2.0/valgrind/glib.supp ./test_starlink

A clean output from valgrind:

chris@capricorn:~/projects/FITSWEBQLSE/tests> valgrind --leak-check=full --show-leak-kinds=all --suppressions=/usr/share/glib-2.0/valgrind/glib.supp ./test_starlink
==13866== Memcheck, a memory error detector
==13866== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==13866== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==13866== Command: ./test_starlink
==13866==
==13866==
==13866== HEAP SUMMARY:
==13866==     in use at exit: 0 bytes in 0 blocks
==13866==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==13866==
==13866== All heap blocks were freed -- no leaks are possible
==13866==
==13866== For lists of detected and suppressed errors, rerun with: -s
==13866== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Next let's add astBegin and astEnd with nothing else:

int main()
{
    astBegin;
    astEnd;

    return 0;
}

There are now "still reachable" blocks:

chris@capricorn:~/projects/FITSWEBQLSE/tests> valgrind --leak-check=full --show-leak-kinds=all --suppressions=/usr/share/glib-2.0/valgrind/glib.supp ./test_starlink
==13948== Memcheck, a memory error detector
==13948== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==13948== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==13948== Command: ./test_starlink
==13948==
==13948==
==13948== HEAP SUMMARY:
==13948==     in use at exit: 139,768 bytes in 3 blocks
==13948==   total heap usage: 4 allocs, 1 frees, 139,796 bytes allocated
==13948==
==13948== 16 bytes in 1 blocks are still reachable in loss record 1 of 3
==13948==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13948==    by 0x493F623: astGlobalsInit_ (globals.c:234)
==13948==    by 0x4BB1D94: astBegin_ (object.c:6601)
==13948==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==13948==
==13948== 32 bytes in 1 blocks are possibly lost in loss record 2 of 3
==13948==    at 0x484CC4C: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13948==    by 0x4B87B94: astRealloc_ (memory.c:3744)
==13948==    by 0x4BB1D41: astBegin_ (object.c:6620)
==13948==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==13948==
==13948== 139,720 bytes in 1 blocks are still reachable in loss record 3 of 3
==13948==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13948==    by 0x493F27F: astGlobalsInit_ (globals.c:138)
==13948==    by 0x4BB1D94: astBegin_ (object.c:6601)
==13948==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==13948==
==13948== LEAK SUMMARY:
==13948==    definitely lost: 0 bytes in 0 blocks
==13948==    indirectly lost: 0 bytes in 0 blocks
==13948==      possibly lost: 32 bytes in 1 blocks
==13948==    still reachable: 139,736 bytes in 2 blocks
==13948==         suppressed: 0 bytes in 0 blocks
==13948==
==13948== For lists of detected and suppressed errors, rerun with: -s
==13948== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

It gets progressively worse upon actually using AST to do something:

#include <stdlib.h>
#include <stdio.h>
#include <star/ast.h>

void test_fk5()
{
    double ra1, dec1, ra2, dec2;

    astBegin;

    AstSkyFrame *icrs = astSkyFrame("System=ICRS"); //,Epoch=2000,equinox=J2000");
    AstSkyFrame *fk5 = astSkyFrame("System=FK5");   //,Epoch=2000,equinox=J2000");

    AstFrameSet *icrs2fk5 = astConvert(icrs, fk5, " ");
    AstFrameSet *fk52icrs = astConvert(fk5, icrs, " ");

    // ds9 ra,dec in FK5
    ra1 = 52.2656215 * AST__DD2R;
    dec1 = 31.2677022 * AST__DD2R;

    printf("Original ra,dec (ds9) exported as FK5: %f %f\n", ra1 * AST__DR2D, dec1 * AST__DR2D);
    astTran2(fk52icrs, 1, &ra1, &dec1, 1, &ra2, &dec2);
    printf("AST FK5 --> ICRS: %f %f\n", ra2 * AST__DR2D, dec2 * AST__DR2D);

    printf("====================================================\n");

    // ds9 ra,dec in ICRS
    ra1 = 52.2656094 * AST__DD2R;
    dec1 = 31.2677078 * AST__DD2R;

    printf("Original ra,dec (ds9) exported as ICRS: %f %f\n", ra1 * AST__DR2D, dec1 * AST__DR2D);
    astTran2(icrs2fk5, 1, &ra1, &dec1, 1, &ra2, &dec2);
    printf("AST ICRS --> FK5: %f %f\n", ra2 * AST__DR2D, dec2 * AST__DR2D);

    // Clean up
    // not really needed as astEnd should do it
    astAnnul(icrs2fk5);
    astAnnul(fk52icrs);

    astAnnul(icrs);
    astAnnul(fk5);

    astEnd;
}

int main()
{
    astBegin;
    test_fk5();
    astEnd;

    return 0;
}

Valgrind reports plenty of memory errors. The list is rather long. What do you think? OS is 64-bit openSUSE Tumbleweed with gcc (SUSE Linux) 13.3.0.

(... prior entries omitted for brevity)
==14031== 240 bytes in 1 blocks are possibly lost in loss record 72 of 77
==14031==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x4B876AD: astMalloc_ (memory.c:2927)
==14031==    by 0x4BAFE44: astInitObject_ (object.c:5617)
==14031==    by 0x48A09F1: astInitAxis_ (axis.c:3215)
==14031==    by 0x48A0B61: astAxis_ (axis.c:3037)
==14031==    by 0x492570B: astInitFrame_ (frame.c:14921)
==14031==    by 0x4CF07C9: astInitTimeFrame_ (timeframe.c:7053)
==14031==    by 0x4CF0927: astTimeFrame_ (timeframe.c:6942)
==14031==    by 0x4CA8555: CalcLAST (skyframe.c:1294)
==14031==    by 0x4CAA010: SetLast.part.0 (skyframe.c:9340)
==14031==    by 0x4CAA072: SetLast (skyframe.c:9334)
==14031==    by 0x4CAA072: GetLAST (skyframe.c:3837)
==14031==    by 0x4CB2D93: MakeSkyMapping (skyframe.c:6065)
==14031==    by 0x4CB2D93: SubFrame (skyframe.c:10062)
==14031==
==14031== 240 bytes in 1 blocks are possibly lost in loss record 73 of 77
==14031==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x4B876AD: astMalloc_ (memory.c:2927)
==14031==    by 0x4BAFE44: astInitObject_ (object.c:5617)
==14031==    by 0x48A09F1: astInitAxis_ (axis.c:3215)
==14031==    by 0x48A0B61: astAxis_ (axis.c:3037)
==14031==    by 0x492570B: astInitFrame_ (frame.c:14921)
==14031==    by 0x4CF07C9: astInitTimeFrame_ (timeframe.c:7053)
==14031==    by 0x4CF0927: astTimeFrame_ (timeframe.c:6942)
==14031==    by 0x4CA856D: CalcLAST (skyframe.c:1295)
==14031==    by 0x4CAA010: SetLast.part.0 (skyframe.c:9340)
==14031==    by 0x4CAA072: SetLast (skyframe.c:9334)
==14031==    by 0x4CAA072: GetLAST (skyframe.c:3837)
==14031==    by 0x4CB2D93: MakeSkyMapping (skyframe.c:6065)
==14031==    by 0x4CB2D93: SubFrame (skyframe.c:10062)
==14031==
==14031== 248 bytes in 1 blocks are possibly lost in loss record 74 of 77
==14031==    at 0x484CC4C: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x4B87B94: astRealloc_ (memory.c:3744)
==14031==    by 0x4BB341F: astMakeId_ (object.c:7992)
==14031==    by 0x40162D: test_fk5 (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==14031==    by 0x401AF8: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==14031==
==14031== 360 bytes in 1 blocks are possibly lost in loss record 75 of 77
==14031==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x4B876AD: astMalloc_ (memory.c:2927)
==14031==    by 0x4BAFE44: astInitObject_ (object.c:5617)
==14031==    by 0x4B456B9: astInitMapping_ (mapping.c:23894)
==14031==    by 0x49255F6: astInitFrame_ (frame.c:14879)
==14031==    by 0x4CF07C9: astInitTimeFrame_ (timeframe.c:7053)
==14031==    by 0x4CF0927: astTimeFrame_ (timeframe.c:6942)
==14031==    by 0x4CA8555: CalcLAST (skyframe.c:1294)
==14031==    by 0x4CAA010: SetLast.part.0 (skyframe.c:9340)
==14031==    by 0x4CAA072: SetLast (skyframe.c:9334)
==14031==    by 0x4CAA072: GetLAST (skyframe.c:3837)
==14031==    by 0x4CB2D93: MakeSkyMapping (skyframe.c:6065)
==14031==    by 0x4CB2D93: SubFrame (skyframe.c:10062)
==14031==    by 0x4929C15: Match (frame.c:7332)
==14031==
==14031== 360 bytes in 1 blocks are possibly lost in loss record 76 of 77
==14031==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x4B876AD: astMalloc_ (memory.c:2927)
==14031==    by 0x4BAFE44: astInitObject_ (object.c:5617)
==14031==    by 0x4B456B9: astInitMapping_ (mapping.c:23894)
==14031==    by 0x49255F6: astInitFrame_ (frame.c:14879)
==14031==    by 0x4CF07C9: astInitTimeFrame_ (timeframe.c:7053)
==14031==    by 0x4CF0927: astTimeFrame_ (timeframe.c:6942)
==14031==    by 0x4CA856D: CalcLAST (skyframe.c:1295)
==14031==    by 0x4CAA010: SetLast.part.0 (skyframe.c:9340)
==14031==    by 0x4CAA072: SetLast (skyframe.c:9334)
==14031==    by 0x4CAA072: GetLAST (skyframe.c:3837)
==14031==    by 0x4CB2D93: MakeSkyMapping (skyframe.c:6065)
==14031==    by 0x4CB2D93: SubFrame (skyframe.c:10062)
==14031==    by 0x4929C15: Match (frame.c:7332)
==14031==
==14031== 139,720 bytes in 1 blocks are still reachable in loss record 77 of 77
==14031==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==14031==    by 0x493F27F: astGlobalsInit_ (globals.c:138)
==14031==    by 0x4BB1D94: astBegin_ (object.c:6601)
==14031==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==14031==
==14031== LEAK SUMMARY:
==14031==    definitely lost: 0 bytes in 0 blocks
==14031==    indirectly lost: 0 bytes in 0 blocks
==14031==      possibly lost: 4,456 bytes in 75 blocks
==14031==    still reachable: 139,736 bytes in 2 blocks
==14031==         suppressed: 0 bytes in 0 blocks
==14031==
==14031== For lists of detected and suppressed errors, rerun with: -s
==14031== ERROR SUMMARY: 75 errors from 75 contexts (suppressed: 0 from 0)

timj commented 3 months ago

My recollection is that we spent a lot of time on valgrind in the past and many of these possible leaks are not really leaks but are caused by AST having an internal memory pool that valgrind can't track. We had to use an internal malloc implementation because of all the small structs that are continually created and freed by AST.

This is why we have an explicit option to turn off the memory allocator if you want to use memory debugging tools:

https://github.com/Starlink/ast/blob/master/configure.ac#L65-L72

jvo203 commented 3 months ago

Thank you for a prompt response. I'll try to use the --with-memdebug option and see what happens. Anyway, it very well could be that valgrind is not aware of the custom memory "hocus pocus" that AST does internally.

jvo203 commented 3 months ago

Hmm, I re-compiled the AST with ./configure --prefix=/usr/local --with-memdebug but valgrind still reports a problem...


int main()
{
    astBegin;
    astEnd;

    return 0;
}

chris@capricorn:~/projects/FITSWEBQLSE/tests> valgrind --leak-check=full --show-leak-kinds=all --suppressions=/usr/share/glib-2.0/valgrind/glib.supp ./test_starlink
==28384== Memcheck, a memory error detector
==28384== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==28384== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==28384== Command: ./test_starlink
==28384==
==28384==
==28384== HEAP SUMMARY:
==28384==     in use at exit: 139,840 bytes in 3 blocks
==28384==   total heap usage: 4 allocs, 1 frees, 139,940 bytes allocated
==28384==
==28384== 16 bytes in 1 blocks are still reachable in loss record 1 of 3
==28384==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28384==    by 0x493DD43: astGlobalsInit_ (globals.c:234)
==28384==    by 0x4BB16C4: astBegin_ (object.c:6601)
==28384==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==28384==
==28384== 104 bytes in 1 blocks are still reachable in loss record 2 of 3
==28384==    at 0x484CC4C: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28384==    by 0x4B876A1: astRealloc_ (memory.c:3744)
==28384==    by 0x4BB166D: astBegin_ (object.c:6620)
==28384==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==28384==
==28384== 139,720 bytes in 1 blocks are still reachable in loss record 3 of 3
==28384==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28384==    by 0x493D99F: astGlobalsInit_ (globals.c:138)
==28384==    by 0x4BB16C4: astBegin_ (object.c:6601)
==28384==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==28384==
==28384== LEAK SUMMARY:
==28384==    definitely lost: 0 bytes in 0 blocks
==28384==    indirectly lost: 0 bytes in 0 blocks
==28384==      possibly lost: 0 bytes in 0 blocks
==28384==    still reachable: 139,840 bytes in 3 blocks
==28384==         suppressed: 0 bytes in 0 blocks
==28384==
==28384== For lists of detected and suppressed errors, rerun with: -s
==28384== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Calling test_fk5() results in an increase in the reachable memory + the usual "errors"...

==28445== 432 bytes in 1 blocks are still reachable in loss record 75 of 77
==28445==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28445==    by 0x4B8705D: astMalloc_ (memory.c:2927)
==28445==    by 0x4BAF764: astInitObject_ (object.c:5617)
==28445==    by 0x4B43FE9: astInitMapping_ (mapping.c:23894)
==28445==    by 0x4923C56: astInitFrame_ (frame.c:14879)
==28445==    by 0x4CF1759: astInitTimeFrame_ (timeframe.c:7053)
==28445==    by 0x4CF18B7: astTimeFrame_ (timeframe.c:6942)
==28445==    by 0x4CA8FCD: CalcLAST (skyframe.c:1294)
==28445==    by 0x4CAAAC0: SetLast.part.0 (skyframe.c:9340)
==28445==    by 0x4CAAB22: SetLast (skyframe.c:9334)
==28445==    by 0x4CAAB22: GetLAST (skyframe.c:3837)
==28445==    by 0x4CB3893: MakeSkyMapping (skyframe.c:6065)
==28445==    by 0x4CB3893: SubFrame (skyframe.c:10062)
==28445==    by 0x4928275: Match (frame.c:7332)
==28445==
==28445== 432 bytes in 1 blocks are still reachable in loss record 76 of 77
==28445==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28445==    by 0x4B8705D: astMalloc_ (memory.c:2927)
==28445==    by 0x4BAF764: astInitObject_ (object.c:5617)
==28445==    by 0x4B43FE9: astInitMapping_ (mapping.c:23894)
==28445==    by 0x4923C56: astInitFrame_ (frame.c:14879)
==28445==    by 0x4CF1759: astInitTimeFrame_ (timeframe.c:7053)
==28445==    by 0x4CF18B7: astTimeFrame_ (timeframe.c:6942)
==28445==    by 0x4CA8FE5: CalcLAST (skyframe.c:1295)
==28445==    by 0x4CAAAC0: SetLast.part.0 (skyframe.c:9340)
==28445==    by 0x4CAAB22: SetLast (skyframe.c:9334)
==28445==    by 0x4CAAB22: GetLAST (skyframe.c:3837)
==28445==    by 0x4CB3893: MakeSkyMapping (skyframe.c:6065)
==28445==    by 0x4CB3893: SubFrame (skyframe.c:10062)
==28445==    by 0x4928275: Match (frame.c:7332)
==28445==
==28445== 139,720 bytes in 1 blocks are still reachable in loss record 77 of 77
==28445==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==28445==    by 0x493D99F: astGlobalsInit_ (globals.c:138)
==28445==    by 0x4BB16C4: astBegin_ (object.c:6601)
==28445==    by 0x401AEE: main (in /mnt/data/chris/projects/FITSWEBQLSE/tests/test_starlink)
==28445==
==28445== LEAK SUMMARY:
==28445==    definitely lost: 0 bytes in 0 blocks
==28445==    indirectly lost: 0 bytes in 0 blocks
==28445==      possibly lost: 0 bytes in 0 blocks
==28445==    still reachable: 149,624 bytes in 77 blocks
==28445==         suppressed: 0 bytes in 0 blocks
==28445==
==28445== For lists of detected and suppressed errors, rerun with: -s
==28445== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I really do hope these are "phantom" problems caused by custom memory management by AST ...

dsberry commented 3 months ago

The --with-memdebug option only enables the internal AST memory debugging tools - it doesn't affect errors reported by valgrind in any way. As Tim says, AST allocates various memory blocks which it never releases. But these allocations should only happen once per activation of an AST-based application. So as the application becomes big (in terms of AST usage) the memory leaks shown by valgrind should not increase significantly. Putting AST code inside a loop is a good way to test this.

The internal AST memory debugging tools can be used to track down memory leaks within AST if it does look like the memory used by the application is increasing in proportion to the number of AST calls. To use them, configure --with-memdebug, and then include "astFlushMemory(1)" at the end of the application code. This will cause AST to report the internal memory identifiers for any allocated memory blocks that have not been released (it does not report blocks that are deemed as permanent memory blocks that AST never releases). The astFlushMemory function is defined and documented in source file src/memory.c.

jvo203 commented 3 months ago

Thank you. Indeed I ran a simple loop the other day and there was no increase in "still reachable" memory irrespective of the number of iterations. Valgrind always reported still reachable: 139,736 bytes in 2 blocks, which must be some kind of a permanent memory pool pre-allocated by AST.