freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Switch to the new PyMuPDF implementation #742

Closed apyrgio closed 3 months ago

apyrgio commented 4 months ago

Since version 1.23.9, upstream PyMuPDF has shipped a new fitz implementation. We were hesitant to use it for two reasons:

  1. It was too late in our release cycle to switch to a new PyMuPDF implementation, and we were concerned about last-minute bugs.
  2. PyMuPDF was writing some logs to stdout (see #700).

Now that Dangerzone 0.6.0 is out, we can switch to the new fitz implementation once we silence the problematic calls that write to stdout.
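As a rough illustration of what silencing such calls can look like, here is a minimal sketch that wraps a noisy call so anything it prints to stdout goes to stderr instead. This is not necessarily how this PR does it. `noisy_library_call` and `run_quietly` are hypothetical names, and the sketch assumes the messages go through Python's `sys.stdout`; output written directly to file descriptor 1 by native code would need an fd-level redirect instead.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def stdout_to_stderr():
    """Send anything written to sys.stdout to sys.stderr inside the block."""
    # sys.stderr is looked up when the block is entered, so the redirect
    # follows whatever stderr is active at that moment.
    with contextlib.redirect_stdout(sys.stderr):
        yield


def noisy_library_call():
    # Hypothetical stand-in for a library call that logs to stdout.
    print("library: some diagnostic message")
    return 42


def run_quietly(func, *args, **kwargs):
    """Call func with stdout redirected to stderr and return its result."""
    with stdout_to_stderr():
        return func(*args, **kwargs)


if __name__ == "__main__":
    # The diagnostic line ends up on stderr; stdout stays clean.
    value = run_quietly(noisy_library_call)
    sys.stderr.write(f"result: {value}\n")
```

Wrapping individual calls keeps the redirect narrow, so legitimate writes to stdout elsewhere in the program are unaffected.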

While experimenting with the new PyMuPDF implementation, we needed some helpers that were not available in Dangerzone (debug logs from the second container, a faster way to build images), so this PR adds them as well.

Fixes #700