Closed deeplow closed 4 months ago
This is probably a release blocker since it will interfere with the GUI as well:
There's something that does not make sense to me: how come you're hitting this issue, given that the offending function is in the fitz_new
module, and not in the regular fitz
module that we copy into the container?
Can you give a bit more info about the environment you are seeing this at?
Also, I've opened an issue in the PyMuPDF repo: https://github.com/pymupdf/PyMuPDF/issues/3135
There's something that does not make sense to me: how come you're hitting this issue, given that the offending function is in the fitz_new module, and not in the regular fitz module that we copy into the container?
I was manually looking for the __init__.py
so I probably find the wrong one. But now looking at the src_classic
, __init__.py
is much smaller. So I don't think it's that init they're talking about.
I am running this on Fedora 38 on Qubes. I'll try another environment to see if there are any differences.
And thanks for filing the upstream issue!
Curious that this happened, especially in a point release. Here they have explained that:
The best way to protect against this happening is to try out the rebased implementation in stage 1 by using import fitz_new as fitz, and report any problems so they can be fixed.
But if there isn't even a major version change how are users supposed know to check that there is a migration in progress? When users notice things is when things break, which is what happened here...
Per discussion with @apyrgio we'll be trying to stick with fitz_old
at least for this release due to this very issue.
The following may be a working solution:
diff --git a/Dockerfile b/Dockerfile
index 4b794b2d..b7e53091 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -61,7 +61,8 @@ RUN apk --no-cache -U upgrade && \
py3-magic \
font-noto-cjk
-COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz/ /usr/lib/python3.11/site-packages/fitz
+COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz /usr/lib/python3.11/site-packages/fitz
+COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz_old /usr/lib/python3.11/site-packages/fitz_old
COPY --from=tessdata-dl /usr/share/tessdata/ /usr/share/tessdata
COPY --from=h2orestart-dl /libreoffice_ext/ /libreoffice_ext
diff --git a/dangerzone/conversion/doc_to_pixels.py b/dangerzone/conversion/doc_to_pixels.py
index 5f1abf3b..2bce0cf2 100644
--- a/dangerzone/conversion/doc_to_pixels.py
+++ b/dangerzone/conversion/doc_to_pixels.py
@@ -7,7 +7,10 @@ import shutil
import sys
from typing import Dict, List, Optional, TextIO
-import fitz
+try:
+ import fitz_old as fitz
+except:
+ import fitz
import magic
from . import errors
diff --git a/dangerzone/conversion/pixels_to_pdf.py b/dangerzone/conversion/pixels_to_pdf.py
index 0243858d..e134680f 100644
--- a/dangerzone/conversion/pixels_to_pdf.py
+++ b/dangerzone/conversion/pixels_to_pdf.py
@@ -25,7 +25,10 @@ class PixelsToPDF(DangerzoneConverter):
tempdir = "/safezone"
# XXX lazy loading of fitz module to avoid import issues on non-Qubes systems
- import fitz
+ try:
+ import fitz_old as fitz
+ except:
+ import fitz
num_pages = len(glob.glob(f"{tempdir}/pixels/page-*.rgb"))
total_size = 0.0
Note: do need to keep both fitz_old
and fitz
in the container image because if we rename the fitz_old
module as fitz
in the container image, this line fails since fitz_old
is underfined. Even with a symlink fitz_old -> fitz
doesn't help here.
@apyrgio what do you think? This solution works both in containers as well as in Qubes and it only adds 22MB to the container image (when compressed).
There was another suggestion, of pinning PyMuPDF to 1.23.8 (now currently at 1.23.21), at least for this release. I looked into the PyMuPDF source, and I don't see any significant code changes in the src_classic/
directory of PyMuPDF since 1.23.8, so we should be good:
While converting a document on the page streaming PR I have just noticed the following:
Here's the line in question. It turns out that PyMuPDF's logging simply prints the error. Fortunately I haven't seen this in the pixels_to_PDF, but if it does happen it'll interfere with out page streaming since it assumes all stdout is pixel data.