OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

OOM in cropper #95

Open bertsky opened 2 years ago

bertsky commented 2 years ago

On a workspace with more than 500 pages, running the cropper fails with

OSError: [Errno 12] Cannot allocate memory

This happens after VSZ (virtual memory) exceeds 32 GB. In contrast, RSS (resident memory) is still as low as 200 MB.
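To check whether a run shows the same pattern (VSZ climbing while RSS stays small), you can sample both values while the cropper runs. Below is a minimal, Linux-only sketch that reads `/proc/<pid>/status`. `VmSize` there is the VSZ that `ps` reports and `VmRSS` is the RSS. The helper name `read_mem_kb` is hypothetical.

```python
def read_mem_kb(pid="self"):
    """Return (VSZ, RSS) in kB for a process, read from /proc/<pid>/status (Linux only)."""
    vsz = rss = None
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            # Relevant lines look like "VmSize:   123456 kB"
            if line.startswith("VmSize:"):
                vsz = int(line.split()[1])
            elif line.startswith("VmRSS:"):
                rss = int(line.split()[1])
    return vsz, rss


if __name__ == "__main__":
    # Without an argument this inspects the current Python process;
    # pass the cropper's PID (e.g. taken from `ps`) to watch it instead.
    vsz, rss = read_mem_kb()
    print(f"VSZ: {vsz / 1024**2:.2f} GiB, RSS: {rss / 1024:.1f} MiB")
```

Polling this every few seconds for the cropper's PID should show whether VSZ grows steadily per page while RSS stays flat.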

Could this be a leak in the LSD CPython module, @kba?

kba commented 2 years ago

Totally possible. I did not do any work on pylsd beyond getting it to work as a dependency and publishing to PyPI.

bertsky commented 2 years ago

The only workaround at the moment is to process smaller page ranges. But unless your page IDs are numerical, this is quite difficult with the OCR-D CLI. (The problem being that find_files does not support regex search for pageId …)
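One way to script the smaller-page-range workaround is to split an explicit list of page IDs into batches and start one processor run per batch. This avoids relying on numerical ranges. Each run is a fresh process, so whatever memory it accumulates is released when it exits. This is a sketch under assumptions: the executable name `ocrd-anybaseocr-crop`, the file group names, and passing comma-separated IDs via `-g` are not taken from this thread, so check them against the processor's `--help`. The helper names are hypothetical.

```python
import subprocess


def chunk(items, size):
    """Split a list into consecutive sublists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_commands(page_ids, batch_size=100,
                   executable="ocrd-anybaseocr-crop",  # assumed executable name
                   input_group="OCR-D-IMG",            # assumed input fileGrp
                   output_group="OCR-D-CROP"):         # assumed output fileGrp
    """Build one argv list per batch of page IDs."""
    commands = []
    for batch in chunk(page_ids, batch_size):
        # -g is assumed to accept a comma-separated list of page IDs
        commands.append([executable, "-I", input_group, "-O", output_group,
                         "-g", ",".join(batch)])
    return commands


def run_batches(commands):
    """Run each batch in its own process and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `build_commands(ids, batch_size=100)` with 250 IDs yields three commands, covering 100, 100 and 50 pages. Pass the result to `run_batches` inside the workspace directory.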

bertsky commented 2 years ago

For the missing regex search on pageId in find_files, see https://github.com/OCR-D/core/issues/855
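Until find_files supports this, a client-side stopgap is to read the page IDs yourself and filter them with a regex. This sketch assumes the usual OCR-D layout, where page IDs are the `ID` attributes of `mets:div[@TYPE="page"]` inside the PHYSICAL `mets:structMap`. The function names are hypothetical and not part of the OCR-D core API.

```python
import re
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def page_ids_from_mets(mets_xml):
    """Return physical page IDs, in document order, from a METS document string."""
    root = ET.fromstring(mets_xml)
    ids = []
    for smap in root.findall("mets:structMap[@TYPE='PHYSICAL']", METS_NS):
        # Page divs may be nested (e.g. under a physSequence div), so search descendants
        for div in smap.findall(".//mets:div[@TYPE='page']", METS_NS):
            ids.append(div.get("ID"))
    return ids


def filter_page_ids(page_ids, pattern):
    """Keep only the page IDs that fully match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [pid for pid in page_ids if regex.fullmatch(pid)]
```

Read `mets.xml` into a string, select a subset (e.g. `filter_page_ids(ids, r"p_a_\d+")`), and pass the result, comma-separated, to the processor's page selection option.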