Fixing the OCR on server-side.

BlueBrain / nat

Python module to use the annotations created with NeuroCurator, for example in a Jupyter notebook.

https://pypi.python.org/pypi/nat/

Fixing the OCR on server-side. #19

Closed. christian-oreilly closed this 6 years ago.

christian-oreilly commented 6 years ago

For some reason, the behavior of ocrmypdf seems to have changed. Previously we expected it to produce the .txt file directly; now it generates a PDF with the OCR-ed text overlaid on it. This commit fixes the issue by overwriting the original scanned PDF with the text-overlaid PDF and then running the usual pdftotext on this new PDF.
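A minimal Python sketch of the workflow described above. It assumes the `ocrmypdf` and `pdftotext` (poppler) command-line tools are on the PATH. The function name `ocr_scan_to_text` and its `run` parameter, which is injected so the steps can be exercised without the tools installed, are hypothetical illustrations and not nat's actual code.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def ocr_scan_to_text(scan_pdf, run=subprocess.run):
    """Hypothetical helper (not nat's API): OCR a scanned PDF in place,
    then extract its text with pdftotext. Returns the .txt path."""
    scan_pdf = Path(scan_pdf)
    txt_path = scan_pdf.with_suffix(".txt")

    # Work in a temp dir next to the scan so os.replace stays on one filesystem;
    # the temp dir is cleaned up even if ocrmypdf fails.
    with tempfile.TemporaryDirectory(dir=scan_pdf.parent) as tmp:
        ocr_pdf = Path(tmp) / scan_pdf.name
        # ocrmypdf writes a new PDF with the OCR-ed text overlaid,
        # not a plain .txt file.
        run(["ocrmypdf", str(scan_pdf), str(ocr_pdf)], check=True)
        # Overwrite the original scan with the text-overlaid PDF.
        os.replace(ocr_pdf, scan_pdf)

    # Run the usual pdftotext step on the new PDF.
    run(["pdftotext", str(scan_pdf), str(txt_path)], check=True)
    return txt_path
```

For example, `ocr_scan_to_text("paper.pdf")` would leave an OCR-ed `paper.pdf` in place and write `paper.txt` beside it. Writing to a temporary file first means that a failed OCR run does not clobber the original scan.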