Open christian-oreilly opened 7 years ago
OCRmyPDF should be upgraded. After the upgrade, the OCR text file can be obtained with:
ocrmypdf --sidecar output.txt input.pdf output.pdf
--sidecar is a feature of v5.0:
Add a new feature, --sidecar, which allows creating “sidecar” text files which contain the OCR results in plain text. These OCR text is more reliable than extracting text from PDFs. Closes #126.
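To check this from the REST server side, the command above could be wrapped roughly as follows. This is only a sketch: it assumes the server can shell out to an `ocrmypdf` executable (v5.0 or later) on the PATH, and the helper names `build_sidecar_command` and `ocr_to_text` are hypothetical, not part of OCRmyPDF or of the existing server code.

```python
import subprocess
from pathlib import Path


def build_sidecar_command(input_pdf, output_pdf, sidecar_txt):
    """Return the argv list for: ocrmypdf --sidecar <txt> <input> <output>."""
    return [
        "ocrmypdf",
        "--sidecar", str(sidecar_txt),
        str(input_pdf),
        str(output_pdf),
    ]


def ocr_to_text(input_pdf, output_pdf, sidecar_txt):
    """Run OCRmyPDF and return the sidecar text (hypothetical helper).

    Assumes ocrmypdf >= 5.0 is installed and on the PATH.
    """
    command = build_sidecar_command(input_pdf, output_pdf, sidecar_txt)
    # check=True raises CalledProcessError if ocrmypdf exits with a non-zero status.
    subprocess.run(command, check=True)
    return Path(sidecar_txt).read_text(encoding="utf-8")
```

Calling `ocr_to_text("input.pdf", "output.pdf", "output.txt")` would then return the OCR text that the server can send back, without having to extract it from the output PDF.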
We previously encountered issues when performing OCR on some documents (documented in https://github.com/jbarlow83/OCRmyPDF/issues/97). Since that issue has been closed by the OCRmyPDF developers, we need to revisit it and check that OCR now works as expected through the REST server.