devcontainers / spec

Development Containers: Use a container as a full-featured development environment.
https://containers.dev
Creative Commons Attribution 4.0 International

Integrate with cache mount mechanism to improve devcontainer build performance #345

Open shikanime opened 9 months ago

shikanime commented 9 months ago

Description

This proposal aims to integrate cache mount mechanisms to improve the performance of devcontainer builds. Rebuilding devcontainers frequently is common practice, whether from switching between projects, upgrading tool versions, or editing the devcontainer specification. To address this, BuildKit introduced the RUN --mount feature as a replacement for workarounds such as apk add --no-cache or rm -rf /var/cache/apt/archives /var/lib/apt/lists/*, and the devcontainer build script already uses it to mount feature scripts. Exposing an API that lets features leverage cache mounts would make it possible to cache directories like /var/cache/apt/archives.
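For reference, a minimal sketch of how a BuildKit cache mount replaces the apk add --no-cache pattern (on Alpine, the /etc/apk/cache symlink is what makes apk keep downloaded packages):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine
# Persist downloaded .apk files across builds in a cache mount instead of
# discarding them with --no-cache; the symlink enables apk's package cache.
RUN --mount=type=cache,target=/var/cache/apk \
    ln -s /var/cache/apk /etc/apk/cache && \
    apk add git
```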

Motivation

Building containers can be resource-intensive, both in compute and network terms. A notable example is installing home-manager in a container, where a significant amount of developer-experience tooling is shared, such as oh-my-zsh configurations, custom shells, and versioning tools; together these can increase the container size by gigabytes. The only known workaround is to move some steps into hooks, as demonstrated in my script and in Ken Muse's article. This approach offloads the build task to hooks and lets them use mounts.

Proposed Solution

To address the aforementioned concerns, I propose introducing a new configuration option in the specification that allows declaring one or more cache-type mounts, such as:

{
  "build": {
    "mounts": [
      {"type": "cache", "id": "apt-cache", "target": "/var/cache/apt/archives" }
    ]
  }
}
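For illustration only (this is not part of the current spec), an implementation might translate each entry into a BuildKit mount flag on the RUN instructions of the Dockerfile it generates; the feature install path below is made up:

```dockerfile
# Hypothetical generated instruction: the CLI injects the configured cache
# mount into the RUN line that executes a feature's install script.
RUN --mount=type=cache,id=apt-cache,target=/var/cache/apt/archives \
    /tmp/build-features/my-feature/install.sh
```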

Implementation Challenges

While this proposal addresses caching for devcontainer builds, it doesn't cover user-relative cache directories such as $HOME/.cache/pip under a user's home path. It primarily solves global caches, such as /var/cache.
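That said, BuildKit cache mounts do accept uid/gid/mode options, so a feature that knows the target user ahead of time could sketch a user-owned cache. The user name and uid below are assumptions; the target must still be an absolute path resolved at build time:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12
# Assumed non-root user "vscode" with uid/gid 1000; the cache mount is
# created with matching ownership so the user can reuse it at runtime.
RUN --mount=type=cache,target=/home/vscode/.cache/pip,uid=1000,gid=1000 \
    pip install --cache-dir=/home/vscode/.cache/pip requests
```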

Furthermore, the distinction between runtime and build-time caching should be carefully considered. Installing dependencies during the install.sh phase allows for immediate access to those dependencies for dependent features, while utilizing hooks enables caching to be shared with the user's runtime environment.

chrmarti commented 8 months ago

Thanks for bringing this up; this sounds like a good idea!

I guess this would be an addition to devcontainer-feature.json only and not devcontainer.json, as the latter can use the Dockerfile to do build-time mounts.

On build- vs run-time caching: not keeping the caches in the image makes a lot of sense in general, since that keeps the image small for transferring to/from a registry. If the user wants the cache inside the dev container, maybe it could be bind/volume mounted or copied when creating the container.

I see the other type would be "bind"; are there specific usages for that?

shikanime commented 8 months ago

In the build context, bind is primarily used for short-lived build inputs such as the source directory, requirements.txt, go.mod/go.sum, or Cargo.toml/Cargo.lock. These files typically don't need to persist beyond the RUN operation, which makes bind a suitable option for handling them.

While integrating bind into the devcontainer environment is not strictly necessary, since the current implementation already mounts the feature directory during the build, it could provide flexibility for use cases such as mounting a configuration file to /etc/ssl/openssl.cnf to configure OpenSSL. That use case remains relatively niche, though. On the other hand, ssh and secret mounts are essential for accessing private source repositories.
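As a sketch of what those mount types look like at the Dockerfile level (the base image and file names are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM rust:1
WORKDIR /src
# Bind-mount short-lived inputs instead of COPYing them into a layer, and
# forward the host's SSH agent for fetching private git dependencies.
RUN --mount=type=bind,source=Cargo.toml,target=Cargo.toml \
    --mount=type=bind,source=Cargo.lock,target=Cargo.lock \
    --mount=type=ssh \
    cargo fetch
```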

Anyway, implementing it as a generic mounts specification, like the devcontainer spec already has for runtime mounts, would unlock all of these features. I haven't dug into the VS Code extension's source to see how the devcontainer build artifacts are generated, though.

chrmarti commented 8 months ago

This appears to be a BuildKit feature. Docker Desktop comes with BuildKit preinstalled. The only issue might be Linux installs, where BuildKit comes in a separate package (e.g., docker-buildx-plugin on Debian/Ubuntu) and might not be installed by default. We could make it mandatory, though; that would also allow us to remove some compatibility code that deals with installs missing BuildKit. Podman uses Buildah, which seems to support cache mounts.

chrmarti commented 8 months ago

According to the docs, APT requires sharing=locked and a config change to make this work: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#example-cache-apt-packages

The example from the above link:

# syntax=docker/dockerfile:1
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  apt update && apt-get --no-install-recommends install -y gcc

chrmarti commented 8 months ago

A few more things to think about:

We could try to address these in each feature using the cache mounts:

The advantage of having each participating feature deal with these would be the simplicity of the proposal.

chrmarti commented 8 months ago

Another option to speed up APT specifically is to configure a caching proxy like https://wiki.debian.org/AptCacherNg. This might be possible in a way that is transparent to features and without amending the spec.
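A minimal sketch of that approach, assuming an apt-cacher-ng instance reachable from the build at http://apt-cacher:3142 (the hostname and port are assumptions):

```dockerfile
FROM ubuntu
# Point APT at the shared caching proxy for the duration of the build, then
# drop the setting so the image doesn't depend on the proxy at runtime.
RUN echo 'Acquire::http::Proxy "http://apt-cacher:3142";' \
      > /etc/apt/apt.conf.d/01proxy && \
    apt-get update && \
    apt-get install -y --no-install-recommends gcc && \
    rm /etc/apt/apt.conf.d/01proxy
```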

shikanime commented 8 months ago

This is something I was also thinking about. Similarly, in the Nix ecosystem we have Cachix, which is a remote build cache, and Bazel has its own as well. At some point this spec could evolve into something akin to a Pod with sidecar containers, or use layering of the Compose spec, if we want to fully realize this vision. We could take inspiration from how the WSL team handles shared subsystems, such as running wslg on the side, but that might be broader than the scope of this discussion.

schlich commented 6 months ago

this is all really great stuff. would you mind elaborating on this part more, @shikanime?

While this proposal addresses caching for devcontainer builds, it doesn't cover user-relative cache directories such as $HOME/.cache/pip under a user's home path. It primarily solves global caches, such as /var/cache.

are absolute paths a firm requirement by docker or something?

shikanime commented 6 months ago

Docker itself does require an absolute path on the target side. The devcontainer spec allows users to be switched within the container, but the timing and mechanism remain unclear to me: it's uncertain when, where, and how this user creation and switching occurs, though I suspect it happens during the runtime lifecycle of the container, i.e. after the docker build steps. Also, I believe a few places in the spec allow the use of certain variables, but the implementation details aren't really clear to me. I think there's a Dockerfile templated behind the scenes, so maybe relative paths are not an issue?

schlich commented 6 months ago

Stray thoughts, may or may not be relevant or helpful.

Following up on your Nix-adjacent line of thought, I know Arch Linux's big thing is rolling upgrades. I wonder if there are lessons to apply here to the apt-get update problem.

I think I have a decent idea of the lifecycle timelines. Nothing we can't brute-force with some logging, anyway.

I wonder how we might utilize the XDG Base Directory specs?

ahjulstad commented 4 months ago

I don't know if it is relevant, but I am using an external volume mount in a Docker Compose file to share the ~/.julia folder between my devcontainers. Since first startup in Julia takes time (due to on-demand native code compilation), this has tremendous benefits.

Now if there were a way to make this work when the external mount is not present (like in a GitHub Codespace)...

Perhaps ugly, but useful for me.

services:
  devcontainer:
    image: mcr.microsoft.com/devcontainers/base:bullseye
    volumes:
      - ../..:/workspaces:cached
      - dotjulia:/home/vscode/.julia
    command: sleep infinity

volumes:
  dotjulia:
    external: true 
    name: dotjulia

{
    "dockerComposeFile": "docker-compose.yml",
    "service": "devcontainer",
    "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
    "name": "Debian",
    "features": {
        "ghcr.io/julialang/devcontainer-features/julia:1": {
            "channel": "release"
        }
    }
}
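One way to make this degrade gracefully when the volume doesn't pre-exist (a sketch; I haven't verified it in Codespaces) is to let Compose own the volume instead of marking it external. Docker then creates the named volume on first run, empty but functional:

```yaml
volumes:
  dotjulia:
    name: dotjulia   # keep the stable name so other projects can share it
    # no `external: true`: Compose creates the volume if it is missing
```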