CordyJ / OpenTxl

TXL programming language compiler/interpreter
Other
18 stars 1 forks source link

Some memory leaks #30

Closed mingodad closed 5 months ago

mingodad commented 5 months ago

Testing txl with a small C program and the C18 grammar using valgrind it shows some memory leaks:

==15071== Memcheck, a memory error detector
==15071== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==15071== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==15071== Command: txl test-ast.c c.txl
==15071== 
int a;
int b = 3;
typedef int myint;

myint doIt (myint a)
{
    return a;
}

==15071== 
==15071== HEAP SUMMARY:
==15071==     in use at exit: 9,347,112 bytes in 10 blocks
==15071==   total heap usage: 45 allocs, 35 frees, 195,890,061 bytes allocated
==15071== 
==15071== 4,096 bytes in 1 blocks are definitely lost in loss record 1 of 9
==15071==    at 0x4C3306D: malloc (vg_replace_malloc.c:431)
==15071==    by 0x14BB7C: TL_TLI_TLIOF (TLI.c:45)
==15071==    by 0x11196C: options (txl.c:1670)
==15071==    by 0x12567F: TProg (txl.c:12196)
==15071==    by 0x109503: main (main.c:53)
==15071== 
==15071== 4,096 bytes in 1 blocks are definitely lost in loss record 2 of 9
==15071==    at 0x4C3306D: malloc (vg_replace_malloc.c:431)
==15071==    by 0x14BB7C: TL_TLI_TLIOF (TLI.c:45)
==15071==    by 0x115FDC: scanner_openFile (txl.c:6899)
==15071==    by 0x123F65: scanner_tokenize (txl.c:9687)
==15071==    by 0x1269EB: TProg (txl.c:12487)
==15071==    by 0x109503: main (main.c:53)
==15071== 
==15071== 4,096 bytes in 1 blocks are definitely lost in loss record 3 of 9
==15071==    at 0x4C3306D: malloc (vg_replace_malloc.c:431)
==15071==    by 0x14BB7C: TL_TLI_TLIOF (TLI.c:45)
==15071==    by 0x115FDC: scanner_openFile (txl.c:6899)
==15071==    by 0x123F65: scanner_tokenize (txl.c:9687)
==15071==    by 0x12755A: TProg (txl.c:12617)
==15071==    by 0x109503: main (main.c:53)
==15071== 
==15071== 8,192 bytes in 2 blocks are definitely lost in loss record 5 of 9
==15071==    at 0x4C3306D: malloc (vg_replace_malloc.c:431)
==15071==    by 0x14BB7C: TL_TLI_TLIOF (TLI.c:45)
==15071==    by 0x119324: scanner_PushInclude (txl.c:6811)
==15071==    by 0x12499A: scanner_tokenize (txl.c:9919)
==15071==    by 0x1269EB: TProg (txl.c:12487)
==15071==    by 0x109503: main (main.c:53)
==15071== 
==15071== LEAK SUMMARY:
==15071==    definitely lost: 20,480 bytes in 5 blocks
==15071==    indirectly lost: 0 bytes in 0 blocks
==15071==      possibly lost: 0 bytes in 0 blocks
==15071==    still reachable: 9,326,632 bytes in 5 blocks
==15071==         suppressed: 0 bytes in 0 blocks
==15071== Reachable blocks (those to which a pointer was found) are not shown.
==15071== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15071== 
==15071== For lists of detected and suppressed errors, rerun with: -s
==15071== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
mingodad commented 5 months ago

Also while trying to parse sqlite3.c:

==22157== Memcheck, a memory error detector
==22157== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==22157== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==22157== Command: txl -p -v -s 1000 sqlite3.c c.txl
==22157== 
[c.txl] : TXL0911W - (Warning) Stack limit less than recommended for TXL size (probable cause: shell stack limit)
[c.txl] : TXL0913I - (Information) Recursion stack limit reduced from 51200000 to 7340032 to fit
==22157== Warning: set address range perms: large range [0xa280040, 0x1c09a344) (undefined)
==22157== Warning: set address range perms: large range [0x1c09b040, 0x4bb8b850) (undefined)
OpenTxl v11.3.2 (1.4.24) (c) 2024 James R. Cordy and others
Bootstrapping TXL ... 
  ... used 401 trees and 251 kids.
Scanning the TXL program c.txl
Parsing the TXL program
  ... used 10693 trees and 15975 kids.
Making the input language grammar tree
  ... used 1296 trees and 1518 kids.
Making the rule table
  ... used 2 trees and 0 kids.
Scanning the input file sqlite3.c
==22157== Source and destination overlap in strcpy(0x519ea040, 0x51aea043)
==22157==    at 0x4C394C8: strcpy (vg_replace_strmem.c:560)
==22157==    by 0x1178CD: strcpy (string_fortified.h:90)
==22157==    by 0x1178CD: scanner_skipSeparators (txl.c:7449)
==22157==    by 0x1241AD: scanner_tokenize (txl.c:9698)
==22157==    by 0x12755A: TProg (txl.c:12617)
==22157==    by 0x109503: main (main.c:53)
==22157== 
CordyJ commented 5 months ago

It's not clear what the analyzer thinks is a leak. OpenTxl does not use dynamic memory allocation, because it is too slow. Instead, it mass allocates all of the memory for its major data structures once at the beginning of each run, and purposely never deallocates them before exit, which might be something that valgrind would complain about.

However, this one looks like the Turing+'s kernel file table, which is done in a similar way. The Turing+ compiler treats the C programming language as machine code, not a programming language.

CordyJ commented 5 months ago

The strcpy() overlap is intentional and optimizes speed and memory in the scanner. Everything about TXL's memory management has been highly optimized for speed, and cheats are used.

mingodad commented 5 months ago

The memory leak is from here : https://github.com/CordyJ/OpenTxl/blob/c289dec190558bd085de3e6a7e94196d731f9d93/src/tpluslib/TLI.c#L45

It's never released after been added to TL_filenames : https://github.com/CordyJ/OpenTxl/blob/c289dec190558bd085de3e6a7e94196d731f9d93/src/tpluslib/TLI.c#L47

CordyJ commented 5 months ago

The memory leak is from here :

Great, thank you.

CordyJ commented 5 months ago

It's never released after been added to TL_filenames :

Wait, how can it be released if it's still in use?

mingodad commented 5 months ago

At the end of parsing ?

CordyJ commented 5 months ago

At the end of parsing ?

It's in the implementation of Turing+, not in the TXL source code, so possibly it's not an issue for this repo, rather the Turing+ one. In the Turing+ kernel the file tables are still active until main exit I believe. Certainly we shouldn't be introducing changes that cross the boundary between TXL's source code and the Turing+ compiler's source code.

mingodad commented 5 months ago

I'm just trying it here src/tpluslib/TL.h now:

void TL_finalize ()
{
    /* Close all output streams, except stdin, stdout and stderr */
    int i;
    for (i=3; i<25; i++) {
        if (TL_files[i] != NULL) fclose (TL_files[i]);
    if (TL_filenames[i] != NULL) free(TL_filenames[i]);  ///<<<< Free the memory here
    }

    /* Release all heap memory */
    for (i=0; i<25; i++) {
        if (TL_mallocs[i] != NULL) free (TL_mallocs[i]);
    }

    /* Revert signal handlers */
    TL_RevertSignalHandlers ();
}
mingodad commented 5 months ago

In the end the place to put the fix was in src/tpluslib/UNIX*/cinterface.h and setting TL_filenames[i] = NULL; instead of TL_filenames[i] = "";:

------------------------------ src/tpluslib/TL.h ------------------------------
index 998ed63..37166de 100644
@@ -15,7 +15,7 @@
 /* Revised to be both 32- and 64-bit compatible - JRC 11.8.15 */
 /* Removed obsolete SYS5 signal handling logic - JRC 11.8.15 */
 /* Added missing implicit includes - JRC 11.8.15 */
- 
+
 /* Fixed bug in file table overflow handling -- JRC 21.8.95 */
 /* Changed preprocessor directives to be legal in old non-ANSI C -- JRC 14.19.95 */
 /* Added setting of quitCode to conform to new signal handling -- JRC 4.1.97 */
@@ -63,7 +63,7 @@ void TL_RevertSignalHandlers ()
         /* revert to default signal handlers */
 #ifdef BSD
         /* Version using BSD signals */
-        
+
         if (signal(SIGINT, SIG_IGN) != SIG_IGN) {
             (void) signal(SIGINT, SIG_DFL);
         }
@@ -123,8 +123,8 @@ void TL_SignalHandler (signalNo)
         /* call the Turing handler */
         TL_currentHandlerArea = TL_handlerArea;
         TL_currentHandlerArea->quitCode = signalNo;
-        if (TL_currentHandlerArea != &defaultHandlerArea) 
-            TL_handlerArea = TL_currentHandlerArea->old_handlerArea; 
+        if (TL_currentHandlerArea != &defaultHandlerArea)
+            TL_handlerArea = TL_currentHandlerArea->old_handlerArea;
         longjmp (TL_currentHandlerArea->quit_env, signalNo);
 #endif
     }
@@ -169,7 +169,7 @@ char **argv;
     TL_filemode[TL_nextfile] = WRITE_MODE;
     for (i=TL_nextfile+1; i<25; i++) {
         TL_files[i] = NULL;
-        TL_filenames[i] = "";
+        TL_filenames[i] = NULL;
     }

     /* initialize handlers */
@@ -208,7 +208,7 @@ void TL_finalize ()
     for (i=0; i<25; i++) {
         if (TL_mallocs[i] != NULL) free (TL_mallocs[i]);
     }
-    
+
     /* Revert signal handlers */
     TL_RevertSignalHandlers ();
 }

------------------------ src/tpluslib/UNIX/cinterface.h ------------------------
index 706a484..6250f0b 100644
@@ -8,7 +8,7 @@
  * (These definitions may have to change from machine to machine.)
  *
  */
- 
+
 /* Updated library naming conventions to T+ 6.0 standard - JRC 11.7.18 */
 /* Revised exception handling to be consistent with T+ 6.0 standard - JRC 11.7.18 */

@@ -128,7 +128,8 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLICL(streamNo)  \
         fclose (TL_files [streamNo+2]), \
         TL_files [streamNo+2] = NULL, \
-        TL_filenames [streamNo+2] = ""
+        free(TL_filenames [streamNo+2]), \
+        TL_filenames [streamNo+2] = NULL
 #define TL_TLI_TLIGC(getWidth, getItem, getItemSize, streamNo) \
         *getItem = fgetc (TL_files[streamNo+2]); \
         if (*getItem == (unsigned char) EOF) *getItem = '\0'
@@ -149,7 +150,7 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLIPF(putPrec, putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%*.*f", putWidth, putPrec, putItem)
 #define TL_TLI_TLIPK(streamNo) \
-        fputc ('\n', TL_files [streamNo+2]) 
+        fputc ('\n', TL_files [streamNo+2])
 #define TL_TLI_TLIPS( putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%-*s", putWidth, putItem)
 #define TL_TLI_TLIRE(readItem, itemSize, status, streamNo) \

----------------------- src/tpluslib/UNIX32/cinterface.h -----------------------
index 706a484..6250f0b 100644
@@ -8,7 +8,7 @@
  * (These definitions may have to change from machine to machine.)
  *
  */
- 
+
 /* Updated library naming conventions to T+ 6.0 standard - JRC 11.7.18 */
 /* Revised exception handling to be consistent with T+ 6.0 standard - JRC 11.7.18 */

@@ -128,7 +128,8 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLICL(streamNo)  \
         fclose (TL_files [streamNo+2]), \
         TL_files [streamNo+2] = NULL, \
-        TL_filenames [streamNo+2] = ""
+        free(TL_filenames [streamNo+2]), \
+        TL_filenames [streamNo+2] = NULL
 #define TL_TLI_TLIGC(getWidth, getItem, getItemSize, streamNo) \
         *getItem = fgetc (TL_files[streamNo+2]); \
         if (*getItem == (unsigned char) EOF) *getItem = '\0'
@@ -149,7 +150,7 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLIPF(putPrec, putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%*.*f", putWidth, putPrec, putItem)
 #define TL_TLI_TLIPK(streamNo) \
-        fputc ('\n', TL_files [streamNo+2]) 
+        fputc ('\n', TL_files [streamNo+2])
 #define TL_TLI_TLIPS( putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%-*s", putWidth, putItem)
 #define TL_TLI_TLIRE(readItem, itemSize, status, streamNo) \

----------------------- src/tpluslib/UNIX64/cinterface.h -----------------------
index 706a484..6250f0b 100644
@@ -8,7 +8,7 @@
  * (These definitions may have to change from machine to machine.)
  *
  */
- 
+
 /* Updated library naming conventions to T+ 6.0 standard - JRC 11.7.18 */
 /* Revised exception handling to be consistent with T+ 6.0 standard - JRC 11.7.18 */

@@ -128,7 +128,8 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLICL(streamNo)  \
         fclose (TL_files [streamNo+2]), \
         TL_files [streamNo+2] = NULL, \
-        TL_filenames [streamNo+2] = ""
+        free(TL_filenames [streamNo+2]), \
+        TL_filenames [streamNo+2] = NULL
 #define TL_TLI_TLIGC(getWidth, getItem, getItemSize, streamNo) \
         *getItem = fgetc (TL_files[streamNo+2]); \
         if (*getItem == (unsigned char) EOF) *getItem = '\0'
@@ -149,7 +150,7 @@ extern int TL_TLI_lookahead;
 #define TL_TLI_TLIPF(putPrec, putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%*.*f", putWidth, putPrec, putItem)
 #define TL_TLI_TLIPK(streamNo) \
-        fputc ('\n', TL_files [streamNo+2]) 
+        fputc ('\n', TL_files [streamNo+2])
 #define TL_TLI_TLIPS( putWidth, putItem, streamNo) \
             fprintf (TL_files [streamNo+2], "%-*s", putWidth, putItem)
 #define TL_TLI_TLIRE(readItem, itemSize, status, streamNo) \
CordyJ commented 5 months ago

Thanks, that's a good fix.

Just a reminder than anything in the ./csrc subdirectory is not source code, it is generated code and not for human consumption.

There is a C source code version of OpenTxl (OpenTxl-C) that is clean C99 source code, but the owners have not yet open sourced it.

mingodad commented 5 months ago

There is another place that leaks memory:

void TL_finalize ()
{
    /* Close all output streams, except stdin, stdout and stderr */
    int i;
    for (i=3; i<25; i++) {
        if (TL_files[i] != NULL) fclose (TL_files[i]);
    }

    /* Release all heap memory */
    for (i=0; i<TL_nextmalloc; i++) {  ///////<<<<<<!!!!!! here in src/tpluslib/TL.h we have 25 but should be TL_nextmalloc
        if (TL_mallocs[i] != NULL) free (TL_mallocs[i]);
    }

    /* Revert signal handlers */
    TL_RevertSignalHandlers ();
}
CordyJ commented 5 months ago

Thanks. In finalize() nothing actually matters since we're shutting down, but it should get it right.

But why should it be TL_nextmalloc ? As far as I can see, 25 is correct, it's used consistently throughout the code.