gen2brain / go-fitz

Golang wrapper for the MuPDF Fitz library
GNU Affero General Public License v3.0

Out of Memory issues with frequent requests #108

conor-nsurely closed 1 day ago

conor-nsurely commented 1 week ago

Hello,

I know there is already a related issue, but it is from 5 years ago, so perhaps something has changed.

```go
text := ""

doc, err := fitz.New(file)
if err != nil {
    return err
}
defer doc.Close()

// Extract pages as images and extract text
for n := 0; n < doc.NumPage(); n++ {
    img, err := doc.Image(n)
    if err != nil {
        return err
    }
    // ...
}
```

Memory usage grows with each request (or each PDF page, I suppose), which isn't an issue in itself. However, it doesn't seem to be released quickly enough, which leads to crashes when requests are frequent. Is there a way to force it to release resources? Is there anything else I need to add, or should `doc.Close()` handle all cleanup? I've tried removing everything except the code block above and it still occurs, so I don't think a different part of the application is causing it.

This leads to frequent out-of-memory crashes, even in a container with 8 GiB allocated.

Otherwise the library is working great :)

Thanks

gen2brain commented 5 days ago

Close() should close and drop the document, context, and stream, i.e. it should free all the memory. How large are the PDFs? My guess is that this is related to the context being created with a hard-coded FZ_STORE_UNLIMITED. You didn't show all your code, but it looks like you are doing more things, such as extracting text. I know, for example, that ImageMagick can use a LOT of memory when dealing with PDFs, which can be controlled with its -limit option, but here the value is hard-coded.

You can try changing that to FZ_STORE_DEFAULT (256 << 20), or just set some other value. I should probably allow an optional parameter and fall back to the default value; unlimited is not a good idea.
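
For reference, the shift expression above works out to 256 MiB. A minimal sketch of the arithmetic follows; the names `storeDefault` and `toMiB` are illustrative only and are not identifiers from go-fitz or MuPDF.

```go
package main

import "fmt"

// storeDefault mirrors the value quoted above (256 << 20).
// The name is illustrative, not a go-fitz or MuPDF identifier.
const storeDefault = 256 << 20 // 256 * 2^20 bytes

// toMiB converts a byte count to whole mebibytes.
func toMiB(n int) int {
	return n >> 20
}

func main() {
	// Prints: 268435456 bytes = 256 MiB
	fmt.Printf("%d bytes = %d MiB\n", storeDefault, toMiB(storeDefault))
}
```

The same pattern gives other limits, e.g. `64 << 20` for 64 MiB.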

conor-nsurely commented 3 days ago

@gen2brain Hey, Thanks for getting back to me.

The PDFs are about 2-3 MiB on average, with the largest being 7-8 MiB. The PDF pages are converted to images and then sent to the Google Vision API to perform OCR.

Once that is set, will it clean up old memory when it reaches the limit, or what happens?

So in order to change FZ_STORE_UNLIMITED, I would have to build the package myself, right?

gen2brain commented 2 days ago

I have added the MaxStore global variable, so you can set the size you need; the default is 256 << 20. You can check the header files for docs, i.e. https://github.com/gen2brain/go-fitz/blob/master/include/mupdf/fitz/context.h#L306.
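
On the question of what happens at the limit: going by the header comment, the store is expected to start evicting cached resources (such as fonts and images) once it grows past the maximum. The toy model below only illustrates that "evict once over the cap" idea. It is not MuPDF's store implementation, and `boundedStore`, `newBoundedStore`, and `put` are hypothetical names invented for this sketch.

```go
package main

import "fmt"

// boundedStore is a toy size-capped cache. It is NOT how MuPDF implements
// its store; it only models "evict oldest entries once over the limit".
type boundedStore struct {
	maxBytes int            // cap, analogous in spirit to a maximum store size
	used     int            // bytes currently held
	order    []string       // keys, oldest first
	sizes    map[string]int // size of each cached entry
}

func newBoundedStore(maxBytes int) *boundedStore {
	return &boundedStore{maxBytes: maxBytes, sizes: make(map[string]int)}
}

// put caches an entry, then evicts the oldest entries until usage fits
// under the cap. It returns the keys that were evicted. An entry larger
// than the cap ends up evicting everything, including itself.
func (s *boundedStore) put(key string, size int) []string {
	if _, ok := s.sizes[key]; ok {
		return nil // already cached; the toy model ignores repeats
	}
	s.order = append(s.order, key)
	s.sizes[key] = size
	s.used += size

	var evicted []string
	for s.used > s.maxBytes && len(s.order) > 0 {
		oldest := s.order[0]
		s.order = s.order[1:]
		s.used -= s.sizes[oldest]
		delete(s.sizes, oldest)
		evicted = append(evicted, oldest)
	}
	return evicted
}

func main() {
	s := newBoundedStore(10) // tiny cap so eviction is easy to see
	s.put("font", 4)
	s.put("image-1", 5)
	evicted := s.put("image-2", 6) // 15 bytes would exceed the cap of 10
	fmt.Println(evicted, s.used)   // prints: [font image-1] 6
}
```

With an unlimited store nothing in this model would ever be evicted, which matches the growth described in the original report.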

conor-nsurely commented 1 day ago

Hey @gen2brain, thanks for fixing this so quickly.

Do you know when this might be released?

gen2brain commented 1 day ago

When the new MuPDF libraries are built. That probably will not happen during the summer, so just use @latest for now.

conor-nsurely commented 1 day ago

Okay will do.

Thanks