LukeMathWalker / cargo-chef

A cargo-subcommand to speed up Rust Docker builds using Docker layer caching.
Apache License 2.0

Add crate index caching guidance #274

Closed alnoki closed 1 month ago

alnoki commented 1 month ago

Crate index caching

This PR adds documentation on caching a local crate index for workspaces that have large git dependencies. I devised this method after noticing that the cargo chef cook step cloned a git dependency that the target binary does not use (namely, aptos-core) during a cargo-chef build, since compilation requires a complete local crate index.

The documentation in the PR goes into detail about the mechanisms at play, and below I'm including an example for additional illustrative purposes.

Example

Layout

Consider the following workspace:

├── Cargo.toml
├── Dockerfile
├── my_package
│   ├── Cargo.toml
│   └── my_bin.rs
└── another_package
    ├── Cargo.toml
    └── another_bin.rs

The top-level Cargo.toml file defines a workspace with two member packages:

[workspace]
members = [
  "my_package",
  "another_package"
]
resolver = "2"

[workspace.package]
edition = "2021"
rust-version = "1.79.0"

The Dockerfile is identical to the template proposed in this PR:

FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
ARG BIN
COPY . .
# Prepare recipe one directory up to simplify local crate index caching.
RUN cargo chef prepare --bin "$BIN" --recipe-path ../recipe.json
# Delete everything not required to build complete local crate index, to avoid
# invalidating local crate index cache on code changes or recipe updates.
RUN find -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete && \
    find -type d -empty -delete

# Invoke a dry run lockfile update against the manifest skeleton, thereby
# caching a complete local crate index.
FROM chef AS indexer
COPY --from=planner /app .
RUN cargo update --dry-run

FROM chef AS builder
ARG BIN PACKAGE
COPY --from=planner /recipe.json recipe.json
# Copy cached crate index.
COPY --from=indexer $CARGO_HOME $CARGO_HOME
# Build in locked mode to prevent local crate index cache invalidation, thereby
# downloading only the necessary dependencies for the binary.
RUN cargo chef cook --bin "$BIN" --locked --package "$PACKAGE" --release
COPY . .
# Build offline solely from cached crate index and downloaded dependencies.
RUN cargo build --bin "$BIN" --frozen --package "$PACKAGE" --release
# Rename executable for ease of copying.
RUN mv "/app/target/release/$BIN" /app/executable

FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/executable /usr/local/bin
ENTRYPOINT ["/usr/local/bin/executable"]
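
To see what the pruning step in the planner stage leaves behind, the same two find commands can be run against a throwaway copy of the example layout. This is only a sketch: the scratch directory and the empty placeholder files are assumptions made for illustration, and an explicit "." start path is added so the commands also work with find implementations that require one.

```shell
#!/bin/sh
# Sketch: run the planner stage's pruning commands on a scratch copy of the
# example workspace and list the files that survive.
set -eu

demo_dir="$(mktemp -d)"   # hypothetical scratch directory
cd "$demo_dir"

# Recreate the example layout with empty placeholder files.
mkdir my_package another_package
touch Cargo.toml Cargo.lock Dockerfile
touch my_package/Cargo.toml my_package/my_bin.rs
touch another_package/Cargo.toml another_package/another_bin.rs

# Same commands as in the Dockerfile, with an explicit "." start path.
find . -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete
find . -type d -empty -delete

# Only manifests and the lockfile should remain (the "manifest skeleton").
skeleton="$(find . -type f | LC_ALL=C sort)"
echo "$skeleton"
```

The Dockerfile and both .rs files are deleted, so editing them cannot change what the indexer stage copies from the planner.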

The Cargo.toml for my_package has no special dependencies:

[[bin]]
name = "my-bin"
path = "my_bin.rs"

[package]
edition = "2021"
name = "my_package"
version = "1.0.0"

And my_bin.rs prints a simple "Hello, world!" message:

fn main() {
    println!("Hello, world!")
}

However, the Cargo.toml for another_package has a git dependency on aptos-core (note that per aptos-core #8984 there is no plan to support package management on crates.io):

[[bin]]
name = "another-bin"
path = "another_bin.rs"

[dependencies.move-core-types]
git = "https://github.com/aptos-labs/aptos-core"
tag = "aptos-node-v1.15.2"

[package]
edition = "2021"
name = "another_package"
version = "1.0.0"

Note that another_bin.rs prints a modified "Hello, world!" message, which includes a random account address generated via the move-core-types dependency:

use move_core_types::account_address::AccountAddress;

fn main() {
    println!("Hello, {}!", AccountAddress::random());
}

Cache hit dynamics

To follow along, replicate the above workspace. Then generate a lockfile:

cargo check

To build and run my-bin via cargo-chef:

docker build \
    --build-arg="BIN=my-bin" \
    --build-arg="PACKAGE=my_package" \
    --tag my-bin \
    .
docker run my-bin
Hello, world!

Note that this downloads the entire aptos-core repository during the --dry-run step, since a local crate index is required for the eventual cargo chef cook operation:

 => [indexer 2/2] RUN cargo update --dry-run

However, if my_bin.rs is then modified to print Hello, chef! instead, re-building the image does not download the repository again, since the crate index for the aptos-core git dependency is already cached.

To run another-bin:

docker build \
    --build-arg="BIN=another-bin" \
    --build-arg="PACKAGE=another_package" \
    --tag another-bin \
    .
docker run another-bin
Hello, 0xa53c237d4f6fd71c6355254a36ecaa8fed0269430669131d21a27c732d66b18e!

Here, the local image cache reuses the output of the --dry-run crate index generation step, since the Cargo.toml manifest skeleton is common across both builds in the workspace.

Moreover, updating another_bin.rs to print Goodbye, ... results in another cache hit since there are no new dependencies.
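
The cache hits above depend on the manifest skeleton staying byte-for-byte identical across source edits. The sketch below approximates that with a plain checksum over every Cargo.toml and Cargo.lock file. The skeleton_digest helper and the cut-down single-package workspace are hypothetical, and Docker computes its own cache keys, so treat this as an illustration of the idea rather than a reproduction of Docker's behavior.

```shell
#!/bin/sh
# Sketch: show that a source-only edit leaves the manifest skeleton unchanged,
# while a manifest edit changes it.
set -eu

# Hypothetical helper: one digest over all manifests and lockfiles under ".".
skeleton_digest() {
    find . -type f \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) \
        -exec cksum {} + | LC_ALL=C sort -k 3 | cksum
}

work_dir="$(mktemp -d)"   # hypothetical scratch directory
cd "$work_dir"
mkdir my_package
printf '[workspace]\nmembers = ["my_package"]\n' > Cargo.toml
printf '[package]\nname = "my_package"\nversion = "1.0.0"\n' > my_package/Cargo.toml
printf 'fn main() { println!("Hello, world!") }\n' > my_package/my_bin.rs

before="$(skeleton_digest)"

# Source-only edit: the skeleton digest should stay the same.
printf 'fn main() { println!("Hello, chef!") }\n' > my_package/my_bin.rs
after_source_edit="$(skeleton_digest)"

# Manifest edit: the digest changes, so the crate index step would rerun.
printf '\n[dependencies]\n' >> my_package/Cargo.toml
after_manifest_edit="$(skeleton_digest)"

[ "$before" = "$after_source_edit" ] && echo "source edit: skeleton unchanged"
[ "$before" != "$after_manifest_edit" ] && echo "manifest edit: skeleton changed"
```

In this approximation, only changes to a manifest or the lockfile would be expected to invalidate the indexer stage.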

Cache miss dynamics

The local crate index caching step can be disabled by commenting out the following line in the Dockerfile:

COPY --from=indexer $CARGO_HOME $CARGO_HOME

In this case, the cargo chef cook command has no access to a local crate index cache, and it will need to regenerate it whenever a recipe changes. Notably, this involves re-downloading aptos-core even for changes to my_package that have nothing to do with the dependency.

alnoki commented 1 month ago

I am closing this because I realized that the operations stipulated therein are effectively already taken care of by cargo chef cook.