Closed TechReverie closed 3 weeks ago
Hey, thanks for opening a bug report, that certainly seems suspicious. We've already seen this, though I believe not to the extent you're reporting now, and we have an issue for it here: https://github.com/freedomofpress/dangerzone/issues/239
If some of the PDFs that lead to this size increase are shareable, would it be possible to send them to us at alexis@freedom.press? (Or attach them here if you feel like it.)
Hi, thanks for getting back to me so quickly.
I'm not sure I can share the exact files I tried this with, due to copyright, but they can be downloaded straight from the publisher here: https://magpi.raspberrypi.com/issues, if that helps.
The attached image is a directory listing showing the before/after sizes of converting issues 136, 137, and 146.
I've had a further play with some of my old archived instruction manuals, which show differing results, so I wonder if some particular PDF format may be tripping up the software? It's beyond my skills to know the difference between these, so I've attached the originals for your perusal if that helps.
Note from the directory image that the freenas guide inflated massively, whereas the HD20 and P9657AA manuals shrank as expected/hoped.
P9657AA-Manual-EN-v1.0-090406.pdf HD20-M-en-GB.pdf
I cannot upload the 'safe' version of the converted freenas guide as it's over the upload file size limit.
If you need the converted versions of the other two, I can upload those, and if I can assist further do let me know.
Thank you.
Thanks for the link to the documents! I did a quick check and I can reproduce the size inflation you're noticing. However, I'm afraid it's kind of an expected side-effect of the way Dangerzone converts documents. The original file size does not affect the final file size, but the number of pages does.
You see, Dangerzone first renders each document page to pixels (RGB at 150 DPI), and then it reconstructs the document from said pixels. We did some measurements in https://github.com/freedomofpress/dangerzone/issues/526, and for typical A4 documents, each page should take about 6.22 MiB at 150 DPI. Let's see how this applies to your documents:
| Document | Pages | Expected size (MiB) | Final size (MiB) |
|---|---|---|---|
| freenas9.2.1_guide.pdf | 280 | 1,741.6 | 89 |
| MagPI 146 | 133 | 827.26 | 128 |
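
For anyone who wants to check the arithmetic, here is a rough sketch of how the expected sizes can be estimated. It is not Dangerzone code: the function names are hypothetical, and it assumes an A4 page of 210 x 297 mm rendered as uncompressed RGB at one byte per channel.

```python
# Rough sketch of the size arithmetic; names and constants are assumptions,
# not taken from the Dangerzone code base.
MM_PER_INCH = 25.4
BYTES_PER_PIXEL = 3  # uncompressed RGB, one byte per channel
MIB = 1024 * 1024


def page_size_mib(width_mm=210.0, height_mm=297.0, dpi=150):
    """Uncompressed size in MiB of one page rendered to RGB pixels."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * BYTES_PER_PIXEL / MIB


def expected_size_mib(pages, dpi=150):
    """Uncompressed size in MiB of a document with the given page count."""
    return pages * page_size_mib(dpi=dpi)


def compression_ratio(final_mib, pages, dpi=150):
    """Final file size as a fraction of the uncompressed pixel data."""
    return final_mib / expected_size_mib(pages, dpi)


if __name__ == "__main__":
    documents = [("freenas9.2.1_guide.pdf", 280, 89), ("MagPI 146", 133, 128)]
    for name, pages, final_mib in documents:
        expected = expected_size_mib(pages)
        ratio = compression_ratio(final_mib, pages)
        print(f"{name}: expected {expected:.1f} MiB, final {final_mib} MiB ({ratio:.1%})")
```

Under these assumptions an A4 page comes out at roughly 6.22 MiB, in line with the figure quoted above. The totals differ slightly from the table because the table multiplies by the rounded per-page value.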
And here's where the compression comes into play. The table above tells us the following:
All in all, I think that Dangerzone can't do much better here, given the constraint that it has to convert pages to pixels. If your archiving method is doing something similar though, and you get better results, we'd like to know more.
In the meantime, I'll close this issue, but feel free to drop a comment.
What happened?
When I run a PDF file through Dangerzone the output file is huge compared to the original, for example 4 MB --> 20 MB, 50 MB --> 282 MB, 85 MB --> 287 MB.
I was under the impression that as part of the conversion the files were compressed. Did I get that wrong?
Linux distribution
Linux Mint 21.3 or Fedora 40 - both produce the same inflated results, and both inflate the files to the same file size
Dangerzone version
0.7.1
Podman info
No response
Document conversion logs
No response
Additional info
No response