freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Fix OCR bug in Qubes Fedora 38 templates #741

Closed by apyrgio 7 months ago

apyrgio commented 7 months ago

Fix an OCR bug that affected Fedora 38 templates of Qubes OS. In that configuration, the PyMuPDF version accepts the Tesseract data directory only through the TESSDATA_PREFIX environment variable. Our mistake was setting this environment variable only in a dev script, instead of setting it for all configurations.

In this commit, we set an attribute in the fitz.fitz module, so that both dev scripts and end-user installations work. This is hacky, but it only targets an old PyMuPDF release, so we don't expect things to break in the long run.
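A minimal sketch of the idea, not the code from this commit. It assumes that the old PyMuPDF release keeps the data directory in a module-level attribute named TESSDATA_PREFIX inside fitz.fitz; the attribute name, the helper set_tessdata_prefix, and the /usr/share/tessdata path are all hypothetical. A stand-in object replaces the real package so the sketch runs without PyMuPDF installed.

```python
import types


def set_tessdata_prefix(fitz_module, tessdata_dir):
    """Point an old PyMuPDF build at the Tesseract data directory.

    Assumption: the release stores the directory in a module-level
    attribute called TESSDATA_PREFIX inside the fitz.fitz submodule.
    Returns True if the attribute was set, False if the submodule is absent.
    """
    inner = getattr(fitz_module, "fitz", None)
    if inner is None:
        # The module layout differs from the assumed one; change nothing.
        return False
    inner.TESSDATA_PREFIX = tessdata_dir
    return True


# Stand-in for the real package (hypothetical layout and path).
fake_fitz = types.SimpleNamespace(fitz=types.SimpleNamespace(TESSDATA_PREFIX=None))
applied = set_tessdata_prefix(fake_fitz, "/usr/share/tessdata")
print(applied, fake_fitz.fitz.TESSDATA_PREFIX)
```

With the real library, the call would look like set_tessdata_prefix(fitz, path) after import fitz. Setting the attribute at runtime, rather than relying on the environment variable, means the value no longer depends on how the process was launched.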

Fixes #737