freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Upgrading Dangerzone 0.5.0 to 0.6.0 on Fedora 38 may / will break the OCR component #737

Closed: deeplow closed this issue 7 months ago

deeplow commented 7 months ago

Upgrading Dangerzone 0.5.0 to 0.6.0 on Fedora 38 may / will break the OCR component. This can be easily fixed by appending the line

    export TESSDATA_PREFIX=/usr/share/tesseract/tessdata

to the file .bash_profile in the disposable template used for the Dangerzone dispVM.
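A minimal sketch of applying this workaround idempotently, so that running it twice does not duplicate the line. The helper name `ensure_line` is hypothetical and not part of Dangerzone; it only assumes the export line quoted above.

```python
from pathlib import Path

# The line suggested in this issue.
EXPORT_LINE = "export TESSDATA_PREFIX=/usr/share/tesseract/tessdata"


def ensure_line(profile: Path, line: str = EXPORT_LINE) -> bool:
    """Append `line` to `profile` unless an identical line is already present.

    Returns True if the file was modified, False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    text = profile.read_text() if profile.exists() else ""
    if line in (entry.strip() for entry in text.splitlines()):
        return False
    with profile.open("a") as handle:
        # Start on a fresh line in case the file does not end with a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

Inside the disposable template, this would be called as `ensure_line(Path.home() / ".bash_profile")`.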

Originally posted by @GWeck in https://github.com/freedomofpress/dangerzone/issues/704#issuecomment-1974771165

deeplow commented 7 months ago

Great catch! On Qubes, we had tested a dev build on Fedora 38 templates and a production build on Fedora 39 templates, and yet we missed it :grimacing:. There are two reasons:

  1. The PyMuPDF version on Fedora 39 is 1.23.3, which accepts the Tesseract data path as a separate argument. Our code checks the PyMuPDF version and passes the correct path:

    https://github.com/freedomofpress/dangerzone/blob/d35eb56b4b53644bff78d4c79cf0cb6d08fae9c9/dangerzone/conversion/common.py#L24-L28

    https://github.com/freedomofpress/dangerzone/blob/d35eb56b4b53644bff78d4c79cf0cb6d08fae9c9/dangerzone/conversion/pixels_to_pdf.py#L60-L72

  2. The dev script on Qubes has this line, which is why the issue did not manifest in our local tests:

    https://github.com/freedomofpress/dangerzone/blob/d35eb56b4b53644bff78d4c79cf0cb6d08fae9c9/dev_scripts/dangerzone#L7-L9
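The version-gated behavior described in the two points above can be sketched roughly as follows. This is not the code behind the links. The threshold version, the `tessdata` keyword name, and the helper names are assumptions made for illustration. The idea is that newer PyMuPDF versions get the path as an explicit argument, while older ones depend on `TESSDATA_PREFIX` already being set in the environment.

```python
import itertools

# Assumption: first PyMuPDF version taken to accept the data path as an argument.
TESSDATA_ARG_MIN_VERSION = (1, 22, 5)
# Path taken from the workaround reported in this issue.
TESSDATA_DIR = "/usr/share/tesseract/tessdata"


def parse_version(version: str) -> tuple:
    """Turn '1.23.3' into (1, 23, 3), keeping only each part's leading digits."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits or 0))
    return tuple(parts)


def ocr_kwargs(pymupdf_version: str, environ: dict) -> dict:
    """Return extra keyword arguments for a (hypothetical) OCR call.

    Newer versions receive the data path explicitly. Older versions rely on
    TESSDATA_PREFIX, so a missing variable is reported as an error here.
    """
    if parse_version(pymupdf_version) >= TESSDATA_ARG_MIN_VERSION:
        return {"tessdata": TESSDATA_DIR}  # keyword name is an assumption
    if not environ.get("TESSDATA_PREFIX"):
        raise RuntimeError("TESSDATA_PREFIX must be set for this PyMuPDF version")
    return {}
```

With `"1.23.3"` the function returns the explicit path regardless of the environment. With a hypothetical older version string such as `"1.21.1"` and no `TESSDATA_PREFIX`, it raises, which mirrors how the problem could stay hidden wherever the variable happened to be exported.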

Originally posted by @apyrgio in https://github.com/freedomofpress/dangerzone/issues/704#issuecomment-1976114322

apyrgio commented 7 months ago

This issue has been fixed in our repo, but we also need to ship a new Dangerzone version for affected users. What we plan to do shortly is:

  1. Use the latest commit in main (f75d471ec8ddb7d795c751a9dbf876ac7b175931 as of writing this).

  2. Bump the release number from 1 to 2 in install/linux/dangerzone.spec:

    https://github.com/freedomofpress/dangerzone/blob/f75d471ec8ddb7d795c751a9dbf876ac7b175931/install/linux/dangerzone.spec#L36

  3. Build a dangerzone-qubes-0.6.0-2.fc38.x86_64.rpm package. That is, build an RPM only for Fedora 38 and Qubes.

  4. Publish this package in our yum-tools-prod repo, so that users who have installed 0.6.0-1 will get updated to 0.6.0-2.

  5. Create a v0.6.0-2 tag in the Dangerzone repo, as we did for the Fedora 37 hotfix in v0.4.0.
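As a rough illustration of steps 2 and 3, the sketch below bumps the release number in spec text and composes the resulting package filename. The helper names are hypothetical, and the sample assumes the conventional `Release: N%{?dist}` form, since the actual contents of `dangerzone.spec` are not shown here.

```python
import re


def bump_release(spec_text: str) -> str:
    """Increment the integer at the start of the first `Release:` value."""
    pattern = re.compile(r"^(Release:\s*)(\d+)", flags=re.MULTILINE)

    def _increment(match: re.Match) -> str:
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = pattern.subn(_increment, spec_text, count=1)
    if count != 1:
        raise ValueError("no numeric Release: field found")
    return new_text


def rpm_filename(name: str, version: str, release: int, dist: str, arch: str) -> str:
    """Compose a name-version-release.dist.arch.rpm filename."""
    return f"{name}-{version}-{release}.{dist}.{arch}.rpm"


# Assumed sample spec fragment, not the real file.
SAMPLE_SPEC = "Name:           dangerzone\nVersion:        0.6.0\nRelease:        1%{?dist}\n"
```

Calling `bump_release(SAMPLE_SPEC)` changes only the release number, and `rpm_filename("dangerzone-qubes", "0.6.0", 2, "fc38", "x86_64")` gives the package name from step 3.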

apyrgio commented 7 months ago

@GWeck we have a 0.6.0-2 release out for Fedora 38, which fixes this specific problem. If you still have any issues with the new update, please let us know. Again, thanks a lot for the bug report :slightly_smiling_face: