freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Handle cases when LibreOffice hangs #878

Open apyrgio opened 1 month ago

apyrgio commented 1 month ago

When running Dangerzone against our large test set, we found that some files (e.g., fdo78883.docx and ofz21168-1.doc) make LibreOffice 7.6 hang.

We opened a bug report for these files, but until the underlying issue is solved, we need a way to detect such hangs and stop the conversion.
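
For illustration only, here is a minimal sketch of the simplest way to detect a hang and stop the work, assuming the conversion runs as a child process. The function name `run_with_deadline`, the fixed deadline, and the stand-in commands are hypothetical and are not taken from Dangerzone's code.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd as a child process.

    Returns the exit code, or None if the process was still running after
    deadline_s seconds and had to be killed (treated here as a hang).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # stop the stuck process
        proc.wait()   # reap it so no zombie is left behind
        return None


if __name__ == "__main__":
    # Stand-in for a hanging converter: a child that sleeps far past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    if run_with_deadline(hung, deadline_s=1) is None:
        print("conversion appears to hang; stopped it")
```

A single fixed deadline like this is easy to implement, but the right value depends on the document, so it is only a starting point for discussion.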

apyrgio commented 1 month ago

Re-introducing timeouts for the whole document is a solution I'd personally like to avoid. They have bitten us a lot in the past (#749), they are arbitrary (documents with many pages need very long timeouts), and we recently decided to ditch them altogether (#687).

What makes more sense to me is the following:

Some extra benefits of this approach:

eloquence commented 1 month ago

(Leaving unmilestoned for now given lower potential impact)