DCGM / pero_ocr_web

BSD 3-Clause "New" or "Revised" License

Batch download of ALTO and TXT #71

Closed zabak closed 2 years ago

zabak commented 2 years ago

To finish what we started in #70, there should be a way to automatically download the finished OCR output (ALTO and TXT) for all pages. If the OCR is not available (for example, not finished yet), the call should return an error, or rather a status message such as "OCR is not finished yet". Alternatively, there could be a separate status API call and a separate download call.

michal-hradis commented 2 years ago

Implemented for API access in 912b25d43dde04065d5e1b27514739a85968a0f6. The download calls do not check the state of the document when exporting text and PAGE. ALTO returns an error code if OCR is not available. I don't think this check is needed, as we can't check for "manual correction DONE" anyway.
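
For anyone scripting against this, a rough client-side sketch of a batch download that tolerates the ALTO error case might look like the following. It is not taken from the commit above: the `fetch` callable, the format names, and the assumption that success is signalled by HTTP status 200 are all hypothetical, and the real endpoints and status codes should be checked against the API itself.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical fetcher: takes (page_id, fmt) and returns (http_status, body).
# In real use it would wrap an HTTP GET against the actual download endpoint.
Fetch = Callable[[str, str], Tuple[int, bytes]]


def download_batch(
    page_ids: Iterable[str],
    fetch: Fetch,
    formats: Tuple[str, ...] = ("alto", "txt"),
) -> Tuple[Dict[Tuple[str, str], bytes], List[Tuple[str, str, int]]]:
    """Fetch every format for every page, collecting failures instead of aborting."""
    results: Dict[Tuple[str, str], bytes] = {}
    not_ready: List[Tuple[str, str, int]] = []
    for page_id in page_ids:
        for fmt in formats:
            status, body = fetch(page_id, fmt)
            if status == 200:  # assumption: 200 means the export is available
                results[(page_id, fmt)] = body
            else:
                # Remember what failed so the caller can retry later.
                not_ready.append((page_id, fmt, status))
    return results, not_ready
```

Collecting the failed `(page_id, fmt, status)` triples rather than raising on the first error lets a script download everything that is ready and retry the rest once OCR has finished.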

michal-hradis commented 2 years ago

Bulk ALTO download for normal (non-API) users can be done with the script /user_scripts/download_document.py