LangProc / langproc-cw

Compiler coursework repository for Instruction Architectures and Compilers module at Imperial College London

Memory leaks in compiler skeleton #21

Closed: simon-staal closed this issue 6 months ago

simon-staal commented 7 months ago

TLDR: (relevant for students doing their coursework)

Full story

For context, I've been working on making some improvements to the skeleton compiler we're providing, and I came across some peculiar behaviour while verifying that I'm not leaking any memory.

For starters, the -fsanitize=address -static-libasan flags that we're compiling with seem to be quite bad at detecting leaks. I don't think they provide any instrumentation for the generated parser and lexer files, which is where the majority of memory allocation happens (I confirmed this by intentionally adding memory leaks to the parser, which went completely undetected by the sanitizer). A better way to test is to use valgrind (with the aforementioned flags disabled, as they don't play nicely with it), which produces the following output when run on the current version of main:

root@3ac80c3969e9:/workspaces/langproc-cw# valgrind ./bin/c_compiler -S compiler_tests/_example/example.c -o /bin/output/test.s
// ...
==60672== HEAP SUMMARY:
==60672==     in use at exit: 16,930 bytes in 4 blocks
==60672==   total heap usage: 46 allocs, 42 frees, 102,014 bytes allocated
==60672== 
==60672== LEAK SUMMARY:
==60672==    definitely lost: 0 bytes in 0 blocks
==60672==    indirectly lost: 0 bytes in 0 blocks
==60672==      possibly lost: 0 bytes in 0 blocks
==60672==    still reachable: 16,930 bytes in 4 blocks
==60672==         suppressed: 0 bytes in 0 blocks
==60672== Rerun with --leak-check=full to see details of leaked memory
==60672== 
==60672== For lists of detected and suppressed errors, rerun with: -s
==60672== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Taking a closer look at what exactly is being leaked (using --leak-check=full --show-leak-kinds=all), we can see:

==60833== 8 bytes in 1 blocks are still reachable in loss record 1 of 4
==60833==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==60833==    by 0x1295D6: yyalloc(unsigned long) (lexer.yy.cpp:2346)
==60833==    by 0x128D89: yyensure_buffer_stack() (lexer.yy.cpp:2045)
==60833==    by 0x125F8D: yylex() (lexer.yy.cpp:838)
==60833==    by 0x1238ED: yyparse() (parser.tab.cpp:1108)
==60833==    by 0x124B78: ParseAST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (parser.y:204)
==60833==    by 0x121FD1: Parse(CommandLineArguments&) (compiler.cpp:10)
==60833==    by 0x122794: main (compiler.cpp:51)
==60833== 
==60833== 64 bytes in 1 blocks are still reachable in loss record 2 of 4
==60833==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==60833==    by 0x1295D6: yyalloc(unsigned long) (lexer.yy.cpp:2346)
==60833==    by 0x1285C8: yy_create_buffer(_IO_FILE*, int) (lexer.yy.cpp:1884)
==60833==    by 0x125FC9: yylex() (lexer.yy.cpp:840)
==60833==    by 0x1238ED: yyparse() (parser.tab.cpp:1108)
==60833==    by 0x124B78: ParseAST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (parser.y:204)
==60833==    by 0x121FD1: Parse(CommandLineArguments&) (compiler.cpp:10)
==60833==    by 0x122794: main (compiler.cpp:51)
==60833== 
==60833== 472 bytes in 1 blocks are still reachable in loss record 3 of 4
==60833==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==60833==    by 0x4B2A64D: __fopen_internal (iofopen.c:65)
==60833==    by 0x4B2A64D: fopen@@GLIBC_2.2.5 (iofopen.c:86)
==60833==    by 0x124AB0: ParseAST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (parser.y:198)
==60833==    by 0x121FD1: Parse(CommandLineArguments&) (compiler.cpp:10)
==60833==    by 0x122794: main (compiler.cpp:51)
==60833== 
==60833== 16,386 bytes in 1 blocks are still reachable in loss record 4 of 4
==60833==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==60833==    by 0x1295D6: yyalloc(unsigned long) (lexer.yy.cpp:2346)
==60833==    by 0x128624: yy_create_buffer(_IO_FILE*, int) (lexer.yy.cpp:1893)
==60833==    by 0x125FC9: yylex() (lexer.yy.cpp:840)
==60833==    by 0x1238ED: yyparse() (parser.tab.cpp:1108)
==60833==    by 0x124B78: ParseAST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (parser.y:204)
==60833==    by 0x121FD1: Parse(CommandLineArguments&) (compiler.cpp:10)
==60833==    by 0x122794: main (compiler.cpp:51)

Long story short, three of these are caused by the lexer buffers that flex allocates, and the other one is caused by never fclose()ing yyin. My PR (pending write access to the repo) will include fixes for both of these, but until then it's worth being aware of this.
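Until a fix lands, one way to make sure the input file is always closed is to tie the FILE* to a scope with std::unique_ptr and a custom deleter. The sketch below is a minimal illustration under stated assumptions, not the repository's code: `yyin` is defined locally as a stand-in for the flex-generated global, and `FakeParse()` and `ParseFile()` are hypothetical stand-ins for `yyparse()` and the skeleton's `ParseAST()`. The three flex records would additionally need the scanner's own teardown (flex-generated scanners provide `yylex_destroy()` for this); that call is only noted in a comment because it needs a generated lexer to run.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>

// ASSUMPTION: stand-in for the `yyin` global that a flex-generated lexer
// defines; in the real skeleton it comes from lexer.yy.cpp instead.
std::FILE* yyin = nullptr;

// Deleter that fclose()s the handle. std::unique_ptr only invokes it for a
// non-null pointer, so a failed fopen() needs no special handling.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// HYPOTHETICAL placeholder for yyparse(): just consumes yyin and returns
// the number of characters read.
static long FakeParse() {
    long count = 0;
    while (std::fgetc(yyin) != EOF) {
        ++count;
    }
    return count;
}

// HYPOTHETICAL variant of ParseAST(): opens the source file, points yyin at
// it, parses, and guarantees the FILE is closed on every return path.
long ParseFile(const std::string& path) {
    assert(yyin == nullptr && "a previous parse left yyin set");
    FilePtr file(std::fopen(path.c_str(), "r"));
    if (!file) {
        return -1;  // nothing was opened, so there is nothing to close
    }
    yyin = file.get();
    const long result = FakeParse();
    // With a real flex scanner, yylex_destroy() would be called here to free
    // the buffer stack and input buffer that yylex() allocated.
    yyin = nullptr;  // avoid leaving a dangling global behind
    return result;   // `file` goes out of scope here and is fclose()d
}
```

Because the deleter runs whenever `file` leaves scope, any early `return` added later (for example on a parse error) still closes the file, which is exactly the path that is easy to miss with a manual fclose().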

Jpnock commented 7 months ago

Thanks for the report. As for ASAN, I added it to be able to easily catch null pointer derefs, buffer overflows, etc., so having LSAN alongside it was just a freebie, even if it's not perfect.

Maybe see whether you can reproduce the leaks by switching from -O0 to -O2 and testing with ASAN again, although even if this works, it makes debugging a bit more painful, so I wouldn't recommend it.

simon-staal commented 7 months ago

Just checked, -O2 doesn't do anything in terms of catching the leaks.