freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Update container image independently #698

Open apyrgio opened 5 months ago

Background

As of version 0.5.1, Dangerzone ships a container image built into the application. Running the doc_to_pixels operation (also known as the untrusted sandbox) in an isolated container environment is what gives Dangerzone its security properties. However, this container image currently also plays a role in the pixels_to_pdf operation (which we can call the trusted sandbox), because the tools this step relies on are hard to ship cross-platform. Tools like GraphicsMagick, Tesseract OCR, and Ghostscript would be hard to package on Windows and macOS.

The key difference is that in the untrusted sandbox, we assume that the document can be malicious and attempt to do harm there, yet still be contained. In the trusted sandbox, Dangerzone only parses pixel data, which is assumed to be trusted because of its very simple representation.

Pros & Cons

Bundling the Dangerzone container image with the application has some benefits:

  1. Users are always certain that the downloaded container image has been provided by the Dangerzone team. This is especially important since the container image is used in both stages of the document sanitization:
    • In the first stage, where the document is converted to pixels in an untrusted sandbox, and
    • in the second stage, where the pixels are reconstructed into a PDF in a trusted sandbox.
  2. Airgapped users can easily update their installation by installing just a Dangerzone package.

However, it also has some big drawbacks:

  1. The resulting Dangerzone package is quite large (currently over 600 MiB). This makes it impractical to include in the Debian repositories (see this Debian bug report).
  2. The container image does not receive timely updates (e.g., weekly). Instead, every image update requires releasing packages for all the supported platforms, which we currently do roughly every two months.

Suggested Change

Things will change once we implement on-host pixels-to-PDF conversion (https://github.com/freedomofpress/dangerzone/issues/625). The trusted sandbox requirement will then cease to exist, and only the untrusted sandbox will remain. As a result, the contents of the container image will no longer be of interest to potential attackers, since they will not be used for any trusted operation.

This opens the door for providing timely updates to our users. As an example, we can:

  1. Build container images as part of our CI jobs, and push them to a container image repository.
  2. Let the Dangerzone application download the container image from that repository.
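
As a rough illustration of step 2, the application-side logic could look like the minimal Python sketch below. It is only a sketch under stated assumptions, not Dangerzone's actual code: the repository name `registry.example.org/dangerzone/dangerzone` and the helpers `image_ref`, `pull_command` and `update_image` are all hypothetical. How the application learns the wanted digest is left out of scope. The only external command used is the standard `pull` subcommand of Podman or Docker with a digest-pinned reference.

```python
import re
import subprocess
from typing import Callable, List, Optional

# Hypothetical repository name; the issue does not name a registry.
IMAGE_REPO = "registry.example.org/dangerzone/dangerzone"

# A sha256 image digest is "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def image_ref(repo: str, digest: str) -> str:
    """Return a digest-pinned image reference of the form repo@sha256:<hex>."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repo}@{digest}"


def pull_command(runtime: str, ref: str) -> List[str]:
    """Build the argv for pulling an image with 'podman' or 'docker'."""
    return [runtime, "pull", ref]


def update_image(
    runtime: str,
    repo: str,
    local_digest: Optional[str],
    wanted_digest: str,
    run: Callable = subprocess.run,
) -> bool:
    """Pull the wanted image if it differs from the local one.

    Returns True if a pull was performed, False if already up to date.
    The 'run' parameter exists so the command runner can be replaced in tests.
    """
    if local_digest == wanted_digest:
        return False
    run(pull_command(runtime, image_ref(repo, wanted_digest)), check=True)
    return True
```

Pinning the pull to a digest rather than a mutable tag means the application fetches exactly the image it was told about. That does not by itself establish who built the image, so the source of the wanted digest would still need to be trusted.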