Apparent memory issue detected from python's CFFI, traced to lept

chrisdonlan commented 2 years ago

Detecting this problem has been a bit of detective work. I started from within python3 and wrapped liblept for use with libtesseract in a python3 C-Foreign-Function-Interface.

Background context:

From within python3, I detected a perfect memory leak (100% of data or even more) pushed into the tesseract/lept interface failed to get freed regardless of code changes.

I isolated the python3 interface first to a bare minimum C-interface call, which involved libtesseract and liblept working together. The memleak was present, with plateaus. Either cffi, libtesseract or liblept had a leak, or I was misusing one of the two libraries in cffi such that there was a leak due to my own error.

In order to rule out libtesseract to at least some extent, I set up a C++ api example and tested it for leakage. There was no leakage even though liblept was used as part of this test.

With that done, I dialed in on liblept as the possible source of leakage. When isolating solely the liblept calls, I was able to reproduce a perfect memory leakage with no plateaus.

At this point, I figured that the error was most likely in cffi, since cffi is the most complex component (it bridges two languages), and I posted a bug on cffi, CFFI Issue #527.

Current Issue

Per the advice of one of the developers there, I set up a C code to duplicate the python3 minimal example of leakage.

Between these two codes, there is some flaw... and it is in either liblept, cffi, or (more unlikely, imo) in one of the supporting python3 libraries -- but this does not seem to be the case.

Starting from the Python3 side, I'll try to pinch the issue between working code. Most likely it falls to either lept or cffi, but it is still not clear to me where the bug is exactly. I'll develop my position while writing this issue -- hopefully I'll get some clear information from valgrind on the C-side.

Starting with the detection of the problem in `python3`:

import gc

import click
import cv2
import numpy
from cffi import FFI
from PIL import Image as ImageModule
from PIL.Image import Image

DEFAULT_DPI = 300

def read_img(img_file: str):
    img = cv2.imread(img_file)
    # favoring numpy's memory management over opencv-python
    img_arr = numpy.copy(img)
    del img
    return img_arr

def cffi_memtest_pix(img: numpy.ndarray, cycles: int = 1):
    """Memory test the image cffi transaction with a standalone cffi interface.

    Run this for an arbitrary number of cycles in a docker container while monitoring the image's memory to confirm
    leakage.

    Args:
        img: the image to test
        cycles: the number of times to load and delete images
    """

    hdr = """
    typedef struct Pix Pix;
    char * getLeptonicaVersion();
    Pix * pixRead(const char* filename);
    Pix * pixCreate(int cols, int rows, int channels);
    Pix * pixSetData(Pix* pix, unsigned int * buffer);
    int   pixSetResolution(Pix * pix, int xres, int yres);
    void pixDestroy(Pix ** pix);
    Pix * pixEndianByteSwap(Pix * pix);
    """
    global DEFAULT_DPI
    ffi = FFI(backend=None)
    ffi.cdef(hdr)
    lept = ffi.dlopen('lept')

    for i in range(cycles):
        pil: Image = ImageModule.fromarray(img).convert("RGBA")
        cols, rows = pil.size
        pil_img_buf = pil.tobytes("raw", "RGBA")

        pix = lept.pixCreate(cols, rows, 32)
        lept.pixSetData(pix, ffi.from_buffer("unsigned int[]", pil_img_buf))

        # disabled for minimal test
        # lept.pixSetResolution(pix, DEFAULT_DPI, DEFAULT_DPI)
        pix = lept.pixEndianByteSwap(pix)  # no-op on big-endian, fixes the problem on little-endian

        _pix = ffi.new("Pix **")
        _pix[0] = pix
        lept.pixDestroy(_pix)
        ffi.release(_pix)

        del pix
        del _pix
        gc.collect()

    ffi.dlclose(lept)

def py_memtest_pix(img: numpy.ndarray, cycles: int = 1):
    """Memory test the python component of the image cffi transaction with a standalone cffi interface.

    Run this for an arbitrary number of cycles in a docker container while monitoring the image's memory to confirm
    leakage.

    Args:
        img: the image to test
        cycles: the number of times to load and delete images
    """

    hdr = """
    typedef struct Pix Pix;
    char * getLeptonicaVersion();
    Pix * pixRead(const char* filename);
    Pix * pixCreate(int cols, int rows, int channels);
    Pix * pixSetData(Pix* pix, unsigned int * buffer);
    int   pixSetResolution(Pix * pix, int xres, int yres);
    void pixDestroy(Pix ** pix);
    Pix * pixEndianByteSwap(Pix * pix);
    """
    global DEFAULT_DPI
    ffi = FFI(backend=None)
    ffi.cdef(hdr)
    lept = ffi.dlopen('lept')

    for i in range(cycles):
        pil: Image = ImageModule.fromarray(img).convert("RGBA")
        pil_img_buf = pil.tobytes("raw", "RGBA")
        gc.collect()

    ffi.dlclose(lept)

@click.command()
@click.argument('img', type=str)
@click.argument('cycles', type=int)
def run(img: str, cycles: int):
    img_ndarr = read_img(img)
    cffi_memtest_pix(img_ndarr, cycles)

if __name__ == "__main__":
    run()

My python3 memory test procedure drops valgrind, since I found it to be problematic for benchmarking python3 memory usage. Instead, I ran python3 code inside of an python3 eval loop running in a docker container. I plotted the docker container's memory usage per second in GNU-plot. X-axis is time in seconds, Y-axis is memory usage in GB.

Python3-only memtest result (PIL + numpy/opencv)

Screen Shot 2022-01-31 at 10 55 41 AM

Python3 with CFFI(lept)

Screen Shot 2022-01-31 at 10 59 11 AM

C-side analysis (in progress, checking in this issue's content so far):

I'll be starting with this code and I will try to reproduce the leakage. If the leak is not in this library, I'll use this issue as a reference for my issue in cffi:

#include "leptonica/allheaders.h"

int main(){

    struct Pix * r_pix = pixRead("/.../my-large-image.png");

    for (int i = 0; i < 1000; i++){

        // Pix seems to delete the data associated with the original pix when it is set to a different pix
        // so this avoids segfault
        struct Pix * r_pix_cp = pixCopy(NULL, r_pix);
        l_uint32 * pix_data = pixExtractData(r_pix_cp);

        // analog for python3 conversion from PIL
        struct Pix * pix = pixCreate(r_pix->w, r_pix->h, r_pix->d);
        pixSetData(pix, pix_data);
        pixEndianByteSwap(pix);
        pixDestroy(&pix);

        // free the rest of the copied struct
        pixDestroy(&r_pix_cp);
    };

    // free the original pix
    pixDestroy(&r_pix);
}

DanBloomberg commented 2 years ago

Thank you for bringing this issue to my attention.

The leak is in the code, and it can be argued that it is my design that is at fault! It is unlikely that you are the first person to be tripped up by it.

You have undoubtedly looked at the documentation in pix1.c for pixExtractData() and pixSetData(). There is also a fairly long discussion near the top of pix1.c in the section "Direct manipulation of the pix data field."

The design choice I made, which in retrospect seems like a poor one, is to have pixSetData() reassign the pointer to the new heap data, without freeing any existing data. Instead, I just put a note that you should use pixFreeData() first.

In your code, pixCreate() allocates the raster data. pixSetData() then leaks it. You could instead have used pixCreateHeader(), which doesn't allocated raster data, followed by pixSetData().

So you have 2 ways to fix the code.

Meanwhile, I am going to change pixSetData() and pixExtractData() to take this burden off the programmer.

chrisdonlan commented 2 years ago

No problem. I made this project for debugging:

pix construct/destruct loop

And I'll see if I can implement your advice!

chrisdonlan commented 2 years ago

Ah, got it working with C.

So the original python api usage is available on the internet in several locations as a way to convert a PIL.Image.Image to Pix and use it in Pix.

I started there, and it leaks -- prompting this traceback and issue report.

My error free C-code for construction and deletion from a buffer is:

        struct Pix * pix = pixCreateHeader(r_pix->w, r_pix->h, r_pix->d);
        pixSetData(pix, pix_data);
        pixEndianByteSwap(pix);
        pixDestroy(&pix);

// OR

        struct Pix * pix = pixCreateHeader(r_pix->w, r_pix->h, r_pix->d);
        pixSetData(pix, pix_data);

        struct Pix * pix2 = pixEndianByteSwapNew(pix);
        pixDestroy(&pix);
        pixDestroy(&pix2);

In python, the only PIL.Image.Image -> Pix transition I've been able to get working is the latter method (in the code below, all python self referenced methods (self.pixDestroy, for example) are evaluated directly as the C function).

        pil: Image = ImageModule.fromarray(img).convert("RGBA")
        cols, rows = pil.size
        pil_img_buf = pil.tobytes("raw", "RGBA")
        pix = self.pixCreateHeader(cols, rows, 32)  # the handle '_' indicates ptr, cannot be manipulated directly
        self.pixSetData(pix, self.ffi.from_buffer("unsigned int[]", pil_img_buf))
        self.pixSetResolution(pix, dpi, dpi)
        pix2 = self.pixEndianByteSwapNew(pix)  # no-op on big-endian, fixes the problem on little-endian

There is one problem: I cannot call pixDestroy on the first pix, pix. I can only destroy pix2. A call to destroy pix (prior to the internal copy operation) yields the following signal:

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

I suspect this relates to the change of ownership of the python buffer used for the first pix, and whether it is copied or moved, and under these conditions there is always a memory leak since I can never free the first pix created at the python -> c interface.

Assuming python is retaining some ownership of the memory, do you know of any work-arounds or causes that might enable the image buffer to transition onto the Pix side more completely?

DanBloomberg commented 2 years ago

I'm sorry. I don't know how python handles memory in general, or with heap data allocated by C in particular.

It seems that the SIGABRT happens when trying to free memory at the pix->data address that you have separately allocated and then assigned to that location. But the python memory allocator may be wrapping that data in a "smart pointer", something that the free() in pixDestroy() wouldn't know about. The smart pointer may even be on the stack!

chrisdonlan commented 2 years ago

Thank you! It sounds like one of these integration knots that needs to be untangled. Either way, the leak is vastly reduced now.

chrisdonlan commented 2 years ago

To fix the last little bit -- I am not detecting any "leakage" from the python3 managed memory -- I was able to write a header-destructor a-la-cart after looking at the pixDestroy function:

        if header_only:
            # data is not freed in this function; only the data housing is freed.
            # This is useful if python has retained ownership or wrapped memory in a smart pointer. If the 'data' has
            # a smart pointer or similar mechanism attached to it, there will be a system error that cannot be caught.
            # Avoid this situation by deleting the header only.
            txt = self.pixGetText(pix)
            if txt != self.ffi.NULL:
                self.free(txt)
            self.pixDestroyColormap(pix)
            self.free(pix)

I guess this is almost the opposite of pixCreateHeader. Maybe it is only useful in this circumstance though...

DanBloomberg commented 2 years ago

Looks good. Do you know any large python projects that link libleptonica and convert PIL images to Pix in order to use leptonica operations from python?

Just FYI, I decided not to change the implementation of pixSetData(). I was convinced by the argument in pix1.c:

 *  Memory management of the (image) data field in the pix is
 *  handled differently from that in the colormap or text fields.
 *  For colormap and text, the functions pixSetColormap() and
 *  pixSetText() remove the existing heap data and insert the
 *  new data.  For the image data, pixSetData() just reassigns the
 *  data field; any existing data will be lost if there isn't
 *  another handle for it.
 *
 *  Why is pixSetData() limited in this way?  Because the image
 *  data can be very large, we need flexible ways to handle it,
 *  particularly when you want to re-use the data in a different
 *  context without making a copy.

But to help users avoid mistakes, I'll add a new function pixFreeAndSetData(), and use that everywhere in the code where the pixFreeData() is called before pixSetData().

DanBloomberg / leptonica