BlueBrain / nat

Python module to use the annotations created with NeuroCurator, for example in a Jupyter notebook.

https://pypi.python.org/pypi/nat/

Further testing OCR #4

Open christian-oreilly opened 7 years ago

christian-oreilly commented 7 years ago

We previously encountered issues when performing OCR on some documents (documented in https://github.com/jbarlow83/OCRmyPDF/issues/97). Since the developers of OCRmyPDF have closed that issue, we need to retest and check that OCR performed by the REST server now works as expected.

pafonta commented 6 years ago

OCRmyPDF should be upgraded. After the upgrade, the text file can be obtained with:

ocrmypdf --sidecar output.txt input.pdf output.pdf

--sidecar is a feature of v5.0:

Add a new feature, --sidecar, which allows creating “sidecar” text files which contain the OCR results in plain text. This OCR text is more reliable than extracting text from PDFs. Closes #126.
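
For retesting from Python, the command above could be wrapped roughly as follows. This is only a sketch, not the REST server's actual code: the function names `build_sidecar_command` and `ocr_to_text` are hypothetical, and it assumes an `ocrmypdf` executable (v5.0 or later) is available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_sidecar_command(input_pdf, output_pdf, sidecar_txt):
    """Return the argument list for the ocrmypdf call quoted above (hypothetical helper)."""
    return ["ocrmypdf", "--sidecar", str(sidecar_txt), str(input_pdf), str(output_pdf)]


def ocr_to_text(input_pdf, output_pdf, sidecar_txt):
    """Run OCR and return the plain text written to the sidecar file (hypothetical helper)."""
    # Fail early with a clear message if OCRmyPDF is not installed.
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    cmd = build_sidecar_command(input_pdf, output_pdf, sidecar_txt)
    # check=True raises CalledProcessError if ocrmypdf exits with a non-zero status.
    subprocess.run(cmd, check=True)
    return Path(sidecar_txt).read_text(encoding="utf-8")
```

Calling `ocr_to_text("input.pdf", "output.pdf", "output.txt")` on one of the documents that failed earlier would then show whether the sidecar text is produced as expected.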