Open Rspigler opened 4 years ago
dangerzone-converter uses OCR provided by Tesseract, which won't support every Unicode code point out of the box.
Porting this to qubes-app-linux-pdf-converter means an extra dependency, which by itself could entail other security issues.
Perhaps we could extract PDF text and its box coordinates in a safe format. Switching to GraphicsMagick (#5009) sounds desirable, though.
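To make the "text and box coordinates in a safe format" idea a bit more concrete, here is a minimal sketch of what the receiving (trusted) side might do. Everything in it is an assumption for illustration and not part of either project: the line format `x0 y0 x1 y1 hex(utf-8 text)`, the name `parse_word_line`, and the size limits are all hypothetical. The point is only that the format is simple enough to validate strictly before the text is used.

```python
import re
import unicodedata
from typing import NamedTuple

# Hypothetical wire format, one word per line: "x0 y0 x1 y1 <hex of UTF-8 text>".
# Only ASCII digits and lowercase hex are accepted; field lengths are capped.
LINE_RE = re.compile(
    r"([0-9]{1,6}) ([0-9]{1,6}) ([0-9]{1,6}) ([0-9]{1,6}) ((?:[0-9a-f]{2}){1,256})"
)


class Word(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


def parse_word_line(line: str, page_width: int, page_height: int) -> Word:
    """Parse one untrusted line; raise ValueError on anything unexpected."""
    match = LINE_RE.fullmatch(line)
    if match is None:
        raise ValueError("line does not match the expected format")

    x0, y0, x1, y1 = (int(match.group(i)) for i in range(1, 5))
    # The box must be non-empty and lie inside the page bitmap.
    if not (0 <= x0 < x1 <= page_width and 0 <= y0 < y1 <= page_height):
        raise ValueError("box lies outside the page")

    try:
        text = bytes.fromhex(match.group(5)).decode("utf-8")  # strict decoding
    except UnicodeDecodeError as exc:
        raise ValueError("text is not valid UTF-8") from exc

    # Reject control characters (category Cc); other categories are left alone
    # in this sketch so that non-Latin scripts are not broken.
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        raise ValueError("text contains control characters")

    return Word(x0, y0, x1, y1, text)
```

For example, `parse_word_line("10 20 110 40 48656c6c6f", 600, 800)` yields a `Word` with the text `Hello`, while a line whose box exceeds the page or whose payload decodes to a control character is rejected. Whether even this much parsing is acceptable on the trusted side is still the security question raised in this thread.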
Related to this: https://github.com/QubesOS/qubes-app-linux-pdf-converter/pull/9 (office documents plus GraphicsMagick, but no searchable text). It is currently having issues with recent LibreOffice versions, though.
I hadn't noticed your work. (Thank you!) I am going to edit this now so that it is just an issue for searchable text.
Unless that is inherent in the design of only passing the bitmap back to the AppVM, and then this should be closed.
Hm, this is probably a security question for @marmarek.
Micah Lee has released Dangerzone, for the purpose of giving non-Qubes users access to the security benefit of Convert To Trusted PDF (although with Linux containers).
And some years later, it's now getting back to its roots, with active work to support Qubes as a platform. I made a post about it on the forum.
Micah Lee has released Dangerzone, for the purpose of giving non-Qubes users access to the security benefit of Convert To Trusted PDF (although with Linux containers). However, it has the added benefit that the input file can be any office document and the output still has searchable text. I think this would be a good enhancement.
Here is the code:
https://github.com/firstlookmedia/dangerzone-converter