freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0
3.58k stars 168 forks source link

Exception during conversion can make the process hang #791

Closed apyrgio closed 4 months ago

apyrgio commented 5 months ago

While performing our QA tests for 0.6.1 on Windows, we tested some success and error cases, to see how our termination logic (see #749 and #772) works for them. Below are the logs for each run:

Successful conversion
``` $ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bmp.bmp Converting document to safe PDF [DEBUG] Marking doc 1648eq as 'converting' [INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-1648eq dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels [INFO ] [doc 1648eq] 0% Converting page 1/1 to pixels [INFO ] [doc 1648eq] 49% Converted document to pixels [INFO ] Conversion output (doc to pixels) ----- DOC TO PIXELS LOG START ----- Converting page 1/1 to pixels Converted document to pixels ----- DOC TO PIXELS LOG END ----- [WARNING] Failed to kill container dangerzone-doc-to-pixels-1648eq: Command '['C:\\Program Files\\Docker\\Docker\\resources\\bin\\docker.EXE', 'kill', 'dangerzone-doc-to-pixels-1648eq']' returned non-zero exit status 1. [WARNING] Stdout from the kill command: b'' [WARNING] Stderr from the kill command: b'Error response from daemon: cannot kill container: dangerzone-doc-to-pixels-1648eq: No such container: dangerzone-doc-to-pixels-1648eq\n' [INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-pixels-to-pdf-1648eq -v 'C:\Users\apyrg\AppData\Local\Temp\tmpc_8gg2ys:/safezone:Z' -e OCR=0 -e OCR_LANGUAGE=None dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.pixels_to_pdf [INFO ] [doc 1648eq] 50% Converting page 1/1 from pixels to PDF [INFO ] [doc 1648eq] 100% Safe PDF created [INFO ] Conversion output: (pixels to PDF) ----- PIXELS TO PDF LOG START ----- ----- PIXELS TO PDF LOG END ----- [DEBUG] Marking doc 1648eq as 'safe' ```
Failed conversion by the conversion process
``` $ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bad_pdf.pdf Converting document to safe PDF [DEBUG] Marking doc Bd5ntr as 'converting' [INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-Bd5ntr dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels Exception: The document format is not supported [ERROR] [doc Bd5ntr] 0% The document format is not supported [DEBUG] Marking doc Bd5ntr as 'failed' Failed to convert document(s) ```
Failed conversion by the host
``` $ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bad_max_width.pdf Converting document to safe PDF [DEBUG] Marking doc ervQpR as 'converting' [INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-ervQpR dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels [INFO ] [doc ervQpR] 0% Converting page 1/1 to pixels [WARNING] Failed to kill container dangerzone-doc-to-pixels-ervQpR: Command '['C:\\Program Files\\Docker\\Docker\\resources\\bin\\docker.EXE', 'kill', 'dangerzone-doc-to-pixels-ervQpR']' returned non-zero exit status 1. [WARNING] Stdout from the kill command: b'' [WARNING] Stderr from the kill command: b'Error response from daemon: cannot kill container: dangerzone-doc-to-pixels-ervQpR: container 07a0b4524bc8f43f75ea1a3fca278fcc13598bff910f09216a3dffc176c24ddd is not running\n' [WARNING] Conversion process did not terminate gracefully after 15 seconds. Killing it forcefully... [ERROR] [doc ervQpR] 0% A page exceeded the maximum width. [DEBUG] Marking doc ervQpR as 'failed' Failed to convert document(s) ```

Two main things are of note here:

  1. In some error cases, our termination logic reports that the container does not exist, whereas the conversion process (docker run ...) still exists.
  2. Specifically for the failed conversion case, when that happens in the host, the conversion process seems to be stuck indefinitely, and we have to kill it.
apyrgio commented 5 months ago

Our original assumption was that if the container has exited, the conversion process should immediately exit as well, maybe with some small delay for some cleanups. So, we originally attributed this to a race condition.

We have noticed this happening in case of successful conversions, where indeed it's due to a race condition. If we wait a tiny bit before checking the conversion process' status, we will see that it has stopped.

Things are very different in the case of failed conversions due to exceptions in the host. In that case, what happens is:

  1. The process in the container calculates the number of pages and width/height of each page.
  2. The host process receives these calculations from stdout, and decides that a unit is out of bounds. In the case of the sample_bad_max_width.pdf document, it's the width.
  3. The host process raises an exception, and from its perspective, the conversion is over. At no point have we terminated the process running in the container!
  4. In the meantime, the container will continue working on the document, and write pixels to stdout.
  5. At some point, the container will exit or we will kill it with docker kill .... This doesn't matter much, but what matters here is that Docker is very careful with the container's stdout. It does not want to lose it, and thus the docker run ... process will block until we have read every byte from the container.
  6. Deadlock! Our termination logic does not read data from standard streams, nor does it terminate the docker process, so both process will remain stuck.
    • Thankfully, we force kill the conversion process after 15 seconds, so we do end up making progress.
apyrgio commented 5 months ago

Now that we know what's going on behind the scenes, let's answer the above questions:

  1. In some error cases, our termination logic reports that the container does not exist, whereas the conversion process (docker run ...) still exists.

That's because the conversion process may still linger, until the parent has read all the data that the container has written.

  1. Specifically for the failed conversion case, when that happens in the host, the conversion process seems to be stuck indefinitely, and we have to kill it.

The conversion process remains stuck, because the container has written lots of data to the stdout, and the host process has not read them.

apyrgio commented 5 months ago

How can we improve here? We can do the following:

  1. Unconditionally kill the container once the conversion is over, and don't check if kill succeeded, since it's racy.
    • Log a warning if the container still exists after kill, since this is not racy, and not expected.
  2. Gracefully terminate the conversion process after we terminate the container, since we at that point, we don't want to read its data.