coder / envbuilder

Build development environments from a Dockerfile on Docker, Kubernetes, and OpenShift. Enable developers to modify their development environment quickly.
Apache License 2.0

PoC: Modify envbuilder to support build resumption from any layer #185

Closed mafredri closed 2 months ago

mafredri commented 3 months ago

This issue tracks the implementation of a PoC to validate the path forward for #128.

To better utilize the envbuilder cache, we want to extend envbuilder with support for build resumption from any previous layer so that the container runtime layer caching can be utilized to avoid file extraction overhead.

Put simply, if we currently do:

docker run -it --rm ghcr.io/coder/envbuilder:0.2.9
# build layer 1
# push ghcr.io/myorg/envbuilder-cache:aaa
# build layer 2
# push ghcr.io/myorg/envbuilder-cache:bbb
# build layer 3
# push ghcr.io/myorg/envbuilder-cache:ccc
# start

We want to be able to do this instead:

docker run -it --rm ghcr.io/myorg/envbuilder-cache:bbb
# resuming build from layer 2
# build new layer 3
# push ghcr.io/myorg/envbuilder-cache:ddd
# start

Note: In isolation, this may not improve performance. For a speedup, the container runtime must already have extracted some or all of the cached layers that we're resuming from.
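The tag sequence above (aaa, bbb, ccc, then ddd) can be illustrated with chained cache keys: if each layer's key hashes the previous key together with the build command, changing the third command leaves the first two keys intact, so a build could resume from the second layer. The following is a minimal sketch under that assumption. The functions `layerKeys` and `resumeIndex` are hypothetical names for illustration and are not envbuilder or Kaniko APIs, and the real cache key derivation may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// layerKeys returns one cache key per build command (hypothetical helper).
// Each key hashes the previous key together with the command, so changing
// command N only invalidates the keys for layer N and later.
func layerKeys(base string, commands []string) []string {
	keys := make([]string, 0, len(commands))
	prev := base
	for _, cmd := range commands {
		sum := sha256.Sum256([]byte(prev + "\n" + cmd))
		prev = hex.EncodeToString(sum[:])
		keys = append(keys, prev)
	}
	return keys
}

// resumeIndex returns how many leading layers of the new build share a key
// with the old build, i.e. the layer to resume from (hypothetical helper).
func resumeIndex(oldKeys, newKeys []string) int {
	n := 0
	for n < len(oldKeys) && n < len(newKeys) && oldKeys[n] == newKeys[n] {
		n++
	}
	return n
}

func main() {
	base := "ubuntu:22.04" // assumed base image, for illustration only
	before := layerKeys(base, []string{
		"RUN apt-get update",
		"RUN apt-get install -y git",
		"RUN make deps",
	})
	after := layerKeys(base, []string{
		"RUN apt-get update",
		"RUN apt-get install -y git",
		"RUN make all", // only the third command changed
	})
	n := resumeIndex(before, after)
	fmt.Printf("resume after layer %d, rebuild %d layer(s)\n", n, len(after)-n)
}
```

In this sketch only the third key changes, which mirrors pushing a new tag (ddd) while reusing the image at bbb.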

mafredri commented 3 months ago

Turns out this was not entirely trivial as cache layers uploaded by Kaniko can't be run as images. This does make sense as it's pretty much the same with Docker.

For now, we'll just enable uploading of the final image as per #197. This should allow us to start from the complete image once #186 is implemented, given that it's been built.

In future, it may be possible to allow bootstrapping from any layer by modifying Kaniko slightly to create/upload intermediate images to the registry. A quick hack that enabled layers to be run with docker run (this broke the final image, mind you) looked like this:

diff --git pkg/executor/build.go pkg/executor/build.go
index 8c1f353f..d2b6063d 100644
--- pkg/executor/build.go
+++ pkg/executor/build.go
@@ -416,6 +417,9 @@ func (s *stageBuilder) build() error {
                        if err := s.saveLayerToImage(layer, command.String()); err != nil {
                                return errors.Wrap(err, "failed to save layer")
                        }
+                       if err := s.opts.DoPush(s.image, s.opts); err != nil {
+                               return errors.Wrap(err, "failed to push layer")
+                       }
                } else {
                        tarPath, err := s.takeSnapshot(files, command.ShouldDetectDeletedFiles())
                        if err != nil {
@@ -441,6 +445,9 @@ func (s *stageBuilder) build() error {
                        if err := s.saveSnapshotToImage(command.String(), tarPath); err != nil {
                                return errors.Wrap(err, "failed to save snapshot to image")
                        }
+                       if err := s.opts.DoPush(s.image, s.opts); err != nil {
+                               return errors.Wrap(err, "failed to push layer")
+                       }
                }
        }

mafredri commented 2 months ago

The conclusion from this PoC is that:

  1. Resumption from the build layer cache is not possible by default. (A cached layer lacks data, such as the architecture, that Docker needs to run it.)
  2. It is possible to modify Kaniko to turn a cached build layer into a complete image, but more investigation is needed into how to do it properly without affecting the final image.
  3. When we enable pushing (#197), the complete image can be started, but as-is it can't be used to resume envbuilder startup.
    • Can be fixed by 1) sanitizing the Docker image (USER root, etc.) and 2) bundling the envbuilder binary.
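
To make conclusions 1 and 3 concrete, here is a minimal sketch of a check that inspects an image config JSON for the fields a runtime needs (architecture, OS) and flags a non-root USER. It assumes the standard OCI image config field names (`architecture`, `os`, `config.User`). The function `resumeProblems` is a hypothetical helper and not part of envbuilder or Kaniko, and it does not check whether the envbuilder binary is bundled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// imageConfig holds only the image config fields this sketch inspects.
type imageConfig struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Config       struct {
		User string `json:"User"`
	} `json:"config"`
}

// resumeProblems lists reasons why an image with the given config JSON
// could not be used as-is to resume a build (hypothetical helper).
func resumeProblems(raw []byte) ([]string, error) {
	var cfg imageConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse image config: %w", err)
	}
	var problems []string
	if cfg.Architecture == "" {
		problems = append(problems, "missing architecture")
	}
	if cfg.OS == "" {
		problems = append(problems, "missing os")
	}
	// USER may be "name" or "name:group"; only the user part matters here.
	user, _, _ := strings.Cut(cfg.Config.User, ":")
	if user != "" && user != "root" && user != "0" {
		problems = append(problems, "USER is "+user+", not root")
	}
	return problems, nil
}

func main() {
	// Assumed example config: no architecture or os, and a non-root user.
	raw := []byte(`{"config":{"User":"coder"}}`)
	problems, err := resumeProblems(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range problems {
		fmt.Println("cannot resume:", p)
	}
}
```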