Filimoa / open-parse

Improved file parsing for LLM’s
https://filimoa.github.io/open-parse/
MIT License
2.34k stars, 89 forks

Named temp directory never clears temp files #38

Closed by bradfox2 4 months ago

bradfox2 commented 4 months ago

Description

The conversion to PyMuPDF goes through named temp files that are never deleted, so a long-running job will eventually run the disk out of space.

see: ../open-parse/src/openparse/pdf.py:128
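
For illustration, here is a minimal standard-library sketch of the general pattern described above. It does not reproduce the code at the referenced line. The function names `leaky_write` and `tidy_write` and the `directory` parameter are hypothetical, and the use of `delete=False` is an assumption about how such a leak typically arises.

```python
import os
import tempfile


def leaky_write(data: bytes, directory: str) -> str:
    """Write data to a named temp file that nobody deletes (the pattern reported here)."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", dir=directory, delete=False) as tmp:
        tmp.write(data)
    return tmp.name  # the file stays on disk after this function returns


def tidy_write(data: bytes, directory: str) -> int:
    """Same write, but the file is removed once it has been used."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", dir=directory, delete=False) as tmp:
        tmp.write(data)
    try:
        # Stand-in for "open the file with a PDF library and do some work".
        return os.path.getsize(tmp.name)
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for _ in range(3):
            leaky_write(b"%PDF-1.4 placeholder", scratch)
        tidy_write(b"%PDF-1.4 placeholder", scratch)
        # Only the files written by leaky_write are still present.
        print(os.listdir(scratch))
```

In a job that converts many documents, every call shaped like `leaky_write` leaves one more file behind, which is how the disk fills up over time.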

Filimoa commented 4 months ago

Thanks for opening this. I'll look into it ASAP.

bradfox2 commented 4 months ago

I haven't checked thoroughly whether the temp files are used later for bbox display, but creating the PyMuPDF object from a bytestream works fine and keeps files off disk. Performance is better too. I can open a PR if you'd like.
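
A rough sketch of the bytestream approach mentioned above, not the project's actual fix. It assumes the source PDF object exposes a `write(stream)` method. The helper names `serialize_to_bytes` and `open_with_pymupdf` are hypothetical. `fitz.open(stream=..., filetype="pdf")` is PyMuPDF's documented way to open a document from memory.

```python
import io


def serialize_to_bytes(writer) -> bytes:
    """Serialize a PDF into memory instead of a temp file.

    `writer` is assumed to be any object with a `write(stream)` method.
    """
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def open_with_pymupdf(pdf_bytes: bytes):
    """Open an in-memory PDF with PyMuPDF, so nothing is written to disk."""
    # Imported here so the sketch can be loaded even where PyMuPDF is not installed.
    import fitz  # PyMuPDF

    return fitz.open(stream=pdf_bytes, filetype="pdf")
```

Usage would look roughly like `doc = open_with_pymupdf(serialize_to_bytes(writer))`. Since no file is created, there is nothing left to clean up afterwards.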

Filimoa commented 4 months ago

The tempfile was always hacky - thanks for pointing out you can create it from a bytestream. Should be fixed in v0.5.5.