freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0
3.36k stars 153 forks source link

Unhandled exceptions occurring inside function passed to `executor.map` are swallowed #776

Open naglis opened 2 months ago

naglis commented 2 months ago

Currently, if an unhandled exception occurs inside convert_doc when running dangerzone-cli, the exception is swallowed (traceback is not logged, the exit code is not changed).

This manifests in tests with dummy conversion - e.g. if shutil.copy raises an exception (e.g. src and dst being the same path), the CLIResult is still reported as a success due to the swallowed exception. Here is a demo test case.

IIUC during actual (not dummy) execution it would hide potential exceptions occurring here or here.

I believe this is because the generator returned by executor.map is not consumed - from ThreadPoolExecutor docs:

If a fn call raises an exception, then that exception will be raised when its value is retrieved from the iterator.

Here is a simplified form showing the issue:

>>> from concurrent.futures import ThreadPoolExecutor
>>>
>>> def do_work(i):
...     print(f"do_work({i})")
...     1 / 0
...
>>> with ThreadPoolExecutor(max_workers=4) as executor:
...     _ = executor.map(do_work, range(4))
...
do_work(0)
do_work(1)
do_work(2)
do_work(3)
>>>
apyrgio commented 2 months ago

Thanks a lot for bringing this issue to our attention, and for writing a test case.

I agree that the generator result should be consumed in principle, but we catch and log all exceptions (as you've already pointed out) here. I have to dig deeper, but my bet would be that there's a different underlying cause :-/.

naglis commented 2 months ago

but we catch and log all exceptions (as you've already pointed out) here

FWIW, I've ran into this in CLI tests when dummy conversion is used. The Dummy isolation provider overrides the convert method and does not have any exception handling.

I do not have a test case when using e.g. the container isolation provider (which IIUC uses the convert method defined on the base isolation provider), but the two sections (here and here (come to think of it, it also applies to the except errors.ConverterProcException and except errors.ConversionException blocks as well)) I've mentioned could hide exceptions if an exception were to occur inside them, since those sections are not inside a try/except (this part is inside the except Exception block, but any potential exception coming from inside the except block is not handled (there is no exception handling in convert_doc) and would be swallowed by exception.map).

To simulate a swallowed exception when using the container provider, we would have to raise it explicitly. E.g. try adding 1 / 0 here (before the try block). Running dev_scripts/dangerzone-cli /tmp/sample.pdf then results in:

╭──────────────────────────╮
│           ▄██▄           │
│          ██████          │
│         ███▀▀▀██         │
│        ███   ████        │
│       ███   ██████       │
│      ███   ▀▀▀▀████      │
│     ███████  ▄██████     │
│    ███████ ▄█████████    │
│   ████████████████████   │
│    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀    │
│                          │
│    Dangerzone v0.6.0     │
│ https://dangerzone.rocks │
╰──────────────────────────╯
[INFO ] Assigning ID '0sHZEe' to doc '/tmp/sample.pdf'

Converting document to safe PDF
[DEBUG] Marking doc 0sHZEe as 'converting'
apyrgio commented 2 months ago

Got it. So we practically need a try: ... except Exception: log.exception at the top level, that would log any stray exceptions from the exception blocks you pointed out.

naglis commented 2 months ago

Indeed. IIUC this wrapping handler should also mark the document as failed upon exception (if it was not done already) to correctly report it at the end.

amnak613 commented 1 month ago

I'm interested in contributing to this

apyrgio commented 1 month ago

Cool! Hopefully this issue provides enough context. In a nutshell, we want to:

  1. Catch and report all stray conversion exceptions from IsolationProvider.convert.
  2. Mark the related document as failed.

Happy to answer any question you may have :slightly_smiling_face:

amnak613 commented 1 month ago

Setting up the environment might take a while because I'm busy right now