freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0
3.49k stars, 163 forks

Progress information for doc to PDF conversion #432

Open deeplow opened 1 year ago

deeplow commented 1 year ago

We currently have no way to show progress information while converting a LibreOffice document to a PDF.

However, we may be able to use LibreOffice's page-range option to do the conversion in batches of N pages.
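To make the batching idea concrete, here is a minimal sketch of how the page ranges and the matching progress values could be computed. It assumes the total page count is known before conversion starts, which may not hold in practice. The `page_batches` helper and the `first-last` range format are illustrative and not part of Dangerzone, and the sketch does not call LibreOffice at all.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[str, float]]:
    """Yield (page_range, percent_done) for each batch of pages.

    Hypothetical helper: page_range is a 1-based inclusive "first-last"
    string, and percent_done is the progress once that batch is converted.
    """
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    for first in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size - 1, total_pages)
        yield f"{first}-{last}", 100.0 * last / total_pages


if __name__ == "__main__":
    for page_range, percent in page_batches(total_pages=25, batch_size=10):
        # A real implementation would convert this range here, then report progress.
        print(f"pages {page_range}: {percent:.0f}% done")
```

Each yielded range would correspond to one conversion call, so the batch size trades progress granularity against the number of calls.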

deeplow commented 1 year ago

@apyrgio stated in a verbal discussion that this will probably introduce overhead, since we would be starting LibreOffice multiple times.

A solution to that could be unoserver, a tool made to solve this exact problem (it's available as a Python package).

apyrgio commented 1 year ago

Nice find, and it makes sense to keep it in mind for #424. For the Qubes integration though, would it make sense to follow an iterative process, where we first tally the time LibreOffice takes to finish the conversion into the time it takes to separate the pages? Not accurate, I know, but it's a first step towards a working progress counter.

deeplow commented 1 year ago

> For the Qubes integration though, would it make sense to follow an iterative process, where we first tally the time LibreOffice takes to finish the conversion into the time it takes to separate the pages? Not accurate, I know, but it's a first step towards a working progress counter.

Yes, we can start with what we already have (a fixed % for this step of the process) and later change the protocol to accommodate more granular progress information.
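As a rough illustration of the two phases, the sketch below maps the progress of one step onto a fixed slice of the overall progress bar. The function name and the 0-10% slice for the document-to-PDF step are assumptions made for the example, not values taken from Dangerzone.

```python
def overall_progress(stage_start: float, stage_end: float, fraction_done: float) -> float:
    """Map a stage's own progress (0.0 to 1.0) onto its slice of the overall bar.

    Hypothetical helper: stage_start and stage_end are overall percentages
    reserved for this stage.
    """
    # Clamp so that a misreported fraction cannot leave the stage's slice.
    fraction_done = min(max(fraction_done, 0.0), 1.0)
    return stage_start + (stage_end - stage_start) * fraction_done


if __name__ == "__main__":
    # Assumed slice: the doc-to-PDF step owns 0% to 10% of the bar.
    # Fixed-% behaviour: only the end of the step is reported.
    print(overall_progress(0.0, 10.0, 1.0))
    # More granular behaviour: report after 3 of 12 pages are converted.
    print(overall_progress(0.0, 10.0, 3 / 12))
```

With this shape, moving from the fixed percentage to granular reporting only changes how often `fraction_done` is supplied, not how the overall value is computed.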