Enhance `Convert to Trusted PDF` to output w/ searchable text

QubesOS / qubes-issues

The Qubes OS Project issue tracker

https://www.qubes-os.org/doc/issue-tracking/


Enhance `Convert to Trusted PDF` to output w/ searchable text #6181

Open Rspigler opened 4 years ago

Rspigler commented 4 years ago

Micah Lee has released Dangerzone, for the purpose of giving non-Qubes users access to the security benefit of Convert To Trusted PDF (although with Linux containers).

However, it has the added benefit that the input file can be any office document and the output still has searchable text. I think this would be a good enhancement.

Here is the code:

https://github.com/firstlookmedia/dangerzone-converter

iamahuman commented 4 years ago

dangerzone-converter uses OCR provided by Tesseract, which doesn't support every Unicode code point out of the box.

Porting this to qubes-app-linux-pdf-converter means an extra dependency, which by itself could entail other security issues, more or less.

Perhaps we could extract PDF text and its box coordinates in a safe format. Switching to GraphicsMagick (#5009) sounds desirable, though.
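To make the "safe format" idea concrete, here is a minimal sketch of how the trusted side might validate text boxes received from the untrusted converter. Everything in it is an assumption for illustration: the record layout (one `x y w h text` record per line), the limits, and the names `TextBox` and `parse_text_boxes` are hypothetical and not part of qubes-app-linux-pdf-converter.

```python
import unicodedata
from dataclasses import dataclass

MAX_BOXES = 10_000    # hypothetical cap on records per page
MAX_TEXT_LEN = 256    # hypothetical cap on characters per box
MAX_DIGITS = 6        # hypothetical cap on digits per coordinate


@dataclass(frozen=True)
class TextBox:
    x: int
    y: int
    width: int
    height: int
    text: str


def parse_text_boxes(raw: bytes, page_width: int, page_height: int) -> list:
    """Validate untrusted records of the assumed form b'x y w h text', one per line."""
    lines = raw.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # tolerate a single trailing newline (or empty input)
    if len(lines) > MAX_BOXES:
        raise ValueError("too many text boxes")

    boxes = []
    for line in lines:
        fields = line.split(b" ", 4)
        if len(fields) != 5:
            raise ValueError("malformed record")
        *coords, text_bytes = fields
        # bytes.isdigit() accepts only non-empty runs of ASCII digits,
        # so signs, whitespace, and non-ASCII digits are all rejected.
        if not all(c.isdigit() and len(c) <= MAX_DIGITS for c in coords):
            raise ValueError("bad coordinate")
        x, y, w, h = (int(c) for c in coords)
        if w == 0 or h == 0 or x + w > page_width or y + h > page_height:
            raise ValueError("box outside page")

        # Strict decoding: invalid UTF-8 raises UnicodeDecodeError (a ValueError).
        text = text_bytes.decode("utf-8")
        if not 0 < len(text) <= MAX_TEXT_LEN:
            raise ValueError("bad text length")
        for ch in text:
            cat = unicodedata.category(ch)
            # Reject control, format, surrogate, private-use, and unassigned
            # characters, plus line and paragraph separators.
            if cat.startswith("C") or cat in ("Zl", "Zp"):
                raise ValueError("disallowed character")

        boxes.append(TextBox(x, y, w, h, text))
    return boxes
```

A check like this only constrains the shape of the data. Whether passing any text back to the AppVM is acceptable at all is a separate security question.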

neowutran commented 4 years ago

Related: https://github.com/QubesOS/qubes-app-linux-pdf-converter/pull/9 (office document support plus GraphicsMagick, but no searchable text). It is currently having issues with recent LibreOffice versions, though.

Rspigler commented 4 years ago

I hadn't noticed your work. (Thank you!) I am going to edit this issue now so that it is only about searchable text.

Rspigler commented 4 years ago

Unless that is inherent in the design of only passing the bitmap back to the AppVM, and then this should be closed.

andrewdavidwong commented 4 years ago

> Unless that is inherent in the design of only passing the bitmap back to the AppVM, and then this should be closed.

Hm, this is probably a security question for @marmarek.

deeplow commented 1 year ago

> Micah Lee has released Dangerzone, for the purpose of giving non-Qubes users access to the security benefit of Convert To Trusted PDF (although with Linux containers).

And some years later, it's now getting back to its roots, with active work to support Qubes as a platform. I made a post on the forum about it.