coder / envbuilder

Build development environments from a Dockerfile on Docker, Kubernetes, and OpenShift. Enable developers to modify their development environment quickly.
Apache License 2.0

PoC: Build a tool to validate file hashes against layer caches #186

Closed mafredri closed 2 months ago

mafredri commented 3 months ago

This issue tracks the implementation of a PoC to validate the path forward for #128.

To support #185, we must be able to figure out whether a cached layer image is still valid, given the state of the files relevant to building the container (think contents of the Dockerfile, devcontainer.json, and hashes of any files pulled in via COPY directives).

In this PoC, we will implement this logic (files -> layer) as a subcommand of envbuilder. This will require parsing the Dockerfile to determine whether a change modifies the outcome, and which layers remain intact.

mafredri commented 2 months ago

The conclusion from this PoC is that:

  1. It is possible to repeatedly produce the same final image hash without actually extracting cache layers and executing commands
  2. With a few changes, Kaniko can report the hash for directives in the Dockerfile
    • Example: A directive like COPY ./file /file involves two hashes: the hash of the directive itself (COPY ./file /file) and the hash of the contents of ./file. The hash we're referring to is a combination of these two.
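
A minimal sketch of the combined hash from the COPY example above, assuming SHA-256 and a simple concatenation scheme. The function name `directive_hash` and the way the two inputs are mixed are hypothetical and chosen for illustration; this is not how Kaniko computes its cache keys.

```python
import hashlib
from typing import Sequence


def directive_hash(directive: str, file_contents: Sequence[bytes]) -> str:
    """Combine the directive text with the digests of the files it pulls in.

    Hypothetical scheme: hash the directive text, then mix in the SHA-256
    digest of each file, separated by a NUL byte.
    """
    h = hashlib.sha256()
    h.update(directive.encode("utf-8"))
    for data in file_contents:
        h.update(b"\0")
        h.update(hashlib.sha256(data).digest())
    return h.hexdigest()


# Same directive and same file contents give the same hash.
before = directive_hash("COPY ./file /file", [b"hello\n"])
same = directive_hash("COPY ./file /file", [b"hello\n"])

# Changing either the file contents or the directive changes the hash.
edited_file = directive_hash("COPY ./file /file", [b"hello world\n"])
edited_directive = directive_hash("COPY ./file /dest", [b"hello\n"])
```

Under this scheme, a cached layer for the directive would be considered valid only while both the directive text and the copied file stay unchanged.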

The first solution is fairly straightforward and has been demonstrated in #213.

The second solution can be more flexible (it doesn't require reproducible builds), but we would need to develop a way to map hash + hash + hash -> final image. This would mean tagging the image with a custom tag (perhaps a combination of all build layer hashes).
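
One way the hash + hash + hash -> final image mapping could look is sketched below, assuming the per-directive hashes are already available as hex strings. The function name `cache_tag` and the `cache-` prefix are hypothetical; the only external constraint relied on is that image tags are limited to 128 characters drawn from letters, digits, `_`, `.`, and `-`.

```python
import hashlib
from typing import Sequence


def cache_tag(layer_hashes: Sequence[str], prefix: str = "cache-") -> str:
    """Derive a single image tag from an ordered list of layer hashes.

    Hypothetical scheme: feed each layer hash into SHA-256 in build order,
    newline-separated, and use the hex digest as the tag suffix.
    """
    h = hashlib.sha256()
    for layer_hash in layer_hashes:
        h.update(layer_hash.encode("ascii"))
        h.update(b"\n")
    return prefix + h.hexdigest()


# Placeholder layer hashes, for illustration only.
tag = cache_tag(["aaa", "bbb", "ccc"])
reordered = cache_tag(["bbb", "aaa", "ccc"])
```

The digest is order-sensitive on purpose, since the same directives in a different order produce a different image. Looking up the final image would then amount to checking whether an image with the computed tag exists in the cache repository.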