LukeMathWalker / cargo-chef

A cargo-subcommand to speed up Rust Docker builds using Docker layer caching.
Apache License 2.0
1.81k stars 116 forks source link

cargo chef prepare failed using cargo-chef:0.1.56 #205

Closed sazzer closed 1 year ago

sazzer commented 1 year ago

When using lukemathwalker/cargo-chef:0.1.56-rust-slim-bullseye (or, more specifically, lukemathwalker/cargo-chef:latest-rust-slim-bullseye), I'm now getting the following error:

 > [planner 2/2] RUN cargo chef prepare --recipe-path recipe.json:
#17 0.181 Error: Failed to compute recipe
#17 0.181
#17 0.181 Caused by:
#17 0.181     0: Cannot extract Cargo metadata
#17 0.181     1: `cargo metadata` exited with an error: error: failed to parse manifest at `/nexalon/Cargo.toml`
#17 0.181
#17 0.181        Caused by:
#17 0.181          can't find library `nexalon_lib`, rename file to `src/lib.rs` or specify lib.path
#17 0.181

If I change back to using lukemathwalker/cargo-chef:0.1.55-rust-slim-bullseye then this all works fine.

My Dockerfile is as follows:

FROM lukemathwalker/cargo-chef:0.1.56-rust-slim-bullseye AS chef
WORKDIR /nexalon
RUN rustc --version

FROM chef AS modules
COPY . .
RUN find . -name src | xargs rm -rf

FROM chef AS planner
COPY --from=modules /nexalon .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
RUN apt update && apt install --yes ca-certificates openssl libssl-dev pkg-config

COPY --from=planner /nexalon/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json

RUN find . -name src | xargs rm -rf
COPY . .

RUN cargo build --release

FROM debian:bullseye-slim AS runtime
WORKDIR /nexalon
RUN apt update
RUN apt install --yes ca-certificates

COPY --from=builder /nexalon/target/release/nexalon .
CMD "/nexalon/nexalon"

ENV RUST_LOG=info

And my Cargo.toml file is:

[package]
name = "nexalon"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "nexalon"

[lib]
name = "nexalon_lib"

[dependencies]
async-trait = "0.1.68"
axum = "0.6.15"
axum-macros = "0.3.7"
config = "0.13.3"
dotenv = "0.15.0"
env_logger = "0.10.0"
http = "0.2.9"
http-body = "0.4.5"
http_halforms = { version = "0.1.0", features = ["axum"] }
hyper = { version = "0.14.26", features = ["full"] }
problemdetails = { version = "0.2.1", features = ["axum"] }
serde = { version = "1.0.160", features = ["derive"] }
serde_json = { version = "1.0.96", features = ["preserve_order"] }
thiserror = "1.0.40"
time = { version = "0.3.20", features = ["formatting", "parsing", "macros", "serde-well-known"] }
tokio = { version = "1.27.0", features = ["full"] }
tower = { version = "0.4.13", features = ["full"] }
tower-http = { version = "0.4.0", features = ["full"] }
tracing = { version = "0.1.37", features = ["log", "async-await"] }
tracing-futures = { version = "0.2.5", features = ["tokio"] }
tracing-log = "0.1.3"
tracing-subscriber = { version = "0.3.16", features = ["time", "json", "env-filter"] }

[dev-dependencies]
assert2 = "0.3.10"
insta = { version = "1.29.0", features = ["json"] }
test-case = "3.1.0"
test-log = { version = "0.2.11", features = ["trace"] }

[profile.dev]
split-debuginfo = "unpacked"

Cheers

LukeMathWalker commented 1 year ago

Thanks for the report!

What happens if you run cargo metadata from the root folder of your project (from your machine, without touching Docker)?

~This is a regression from #204~ This was triggered by #204, but curious to understand better what is happening cc @Kobzol

LukeMathWalker commented 1 year ago

This might explain the issue:

FROM chef AS modules
COPY . .
RUN find . -name src | xargs rm -rf

What is the purpose of doing this deletion? Can you try removing this intermediate layer?

jyn514 commented 1 year ago

Here is a more realistic place where running cargo metadata might fail: I had a Dockerfile that looked something like this:

RUN cargo install cargo-chef

FROM chef as prepare
COPY . /usr/app
RUN cargo chef prepare --recipe-path recipe.json

FROM chef as builder
RUN git config --global credential.helper store
COPY .git-credentials /root/.git-credentials
COPY --from=prepare /usr/app/recipe.json .
RUN cargo chef cook --workspace --recipe-path recipe.json

In 1.55 this worked fine. In 1.56 it fails because cargo needs the credentials to access a private registry.

Fixing this was easy (just copy the credentials in earlier in the original image), but diagnosing the error was very difficult. Maybe 0.1.56 should be yanked and republished as 0.2 instead?

sazzer commented 1 year ago

I've got a pattern where I've got a main crate in the root directory, and then supporting crates in the crates directory. Cargo Chef needs to have the Cargo.toml and Cargo.lock files from all of these crates added, but not the source files. So rather than having to change the Dockerfile whenever a new supporting crate is added - which I invariably forget, break the build, and then fix - or else copying everything, meaning that a source change invalidates the Cargo Chef cache layers, I put that together so that it would create an intermediate layer containing only the Cargo.toml and Cargo.lock files from all of the crates, and then can copy only those across to the next layer.

Changing the Dockerfile to copy only the Cargo.toml and Cargo.lock files without the use of the intermediate layer doesn't make any difference. However, changing it to copy all of the source files as well does fix things. Curiously, this also doesn't cause it to need to rebuild every layer when a source file changes, which means that either it was never needed in the first place (wouldn't surprise me!) or else something has been fixed since I first started doing that...

So this Dockerfile seems to now work fine:

FROM lukemathwalker/cargo-chef:latest-rust-slim-bullseye AS chef
WORKDIR /nexalon
RUN rustc --version

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
RUN apt update && apt install --yes ca-certificates openssl libssl-dev pkg-config

COPY --from=planner /nexalon/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json

RUN find . -name src | xargs rm -rf
COPY . .

RUN cargo build --release

FROM debian:bullseye-slim AS runtime
WORKDIR /nexalon
RUN apt update
RUN apt install --yes ca-certificates

COPY --from=builder /nexalon/target/release/nexalon .
CMD "/nexalon/nexalon"

ENV RUST_LOG=info

Cheers

mirek26 commented 1 year ago

πŸ‘‹πŸ½ 1.56 seems to have broken our CI jobs for the same reason - we only copy over Cargo.toml/Cargo.lock, which was sufficient before and it's no longer enough. The benefit of this is that we don't need to re-run prepare if dependencies didn't change, which guarantees that we don't have to re-run cook.

If prepare is guaranteed to produce the same recipe with the same deps, I guess this doesn't give us much. But I had some previous issues with everything being rebuilt more often than expected, and restricting the copy to Cargo.toml/Cargo.lock seems to have helped

Kobzol commented 1 year ago

cargo chef should serve as a solution to avoid the "Copy only Cargo.xxx" pattern. Now that it parses the project using cargo metadata, it will require the whole project to be present (as was intended originally).

If it rebuilds too often, that might be a bug in cargo chef. Hopefully with using cargo metadata we will be now also be able to reduce these kinds of unnecessary rebuilds.

LukeMathWalker commented 1 year ago

If prepare is guaranteed to produce the same recipe with the same deps, I guess this doesn't give us much. But I had some previous issues with everything being rebuilt more often than expected, and restricting the copy to Cargo.toml/Cargo.lock seems to have helped

@mirek26: This is precisely what prepare is supposed to doβ€”you throw the entire project at it and prepare will extract the relevant info that will remain unchanged as long as your dependencies remain unchanged. If you notice any scenario where this doesn't happen, report it as a bug and I'll be more than happy to look at it.

However, changing it to copy all of the source files as well does fix things. Curiously, this also doesn't cause it to need to rebuild every layer when a source file changes, which means that either it was never needed in the first place (wouldn't surprise me!) or else something has been fixed since I first started doing that...

@sazzer: it's possible that we had some bugs in previous versions that were causing unnecessary rebuilds, but "copy the whole project" has always been the recommended pattern (e.g. see the examples in the README). If you run into any of those unnecessary rebuilds, please file a bug report!

LukeMathWalker commented 1 year ago

Fixing this was easy (just copy the credentials in earlier in the original image), but diagnosing the error was very difficult. Maybe 0.1.56 should be yanked and republished as 0.2 instead?

@jyn514: I'm inclined not to yankβ€”this update is surfacing a lot of Hyrum law pain that went unnoticed in previous versions because we were not talking full advantage of the assumptions that were stated in the README (i.e. prepare takes the whole project as input). I'll keep this issue open as a reference point in case other folks bump into issues and we'll work through what comes πŸ‘πŸ» Unless an actual defect pops through, I'm not going to yank.

arusahni commented 1 year ago

With the most recent release (0.1.56) my builds have started failing due to cargo-chef not being able to find a vendored dependency.

I have the following worktree:

β”œ mycrate/
β”‚ β”” Cargo.toml
β”œ vendor/
β”‚ β”” third_party_project/
β”‚   β”œ src/
β”‚   β”‚ β”” vendored_crate/
β”‚   β”‚   β”” Cargo.toml
β”‚   β”” Cargo.toml
β”” Cargo.toml

./Cargo.toml

[workspace]
members = [
    "mycrate",
]
exclude = ["vendor"]

./mycrate/Cargo.toml

[package]
name = "mycrate"
version = "0.1.0"
edition = "2021"

[dependencies]
vendored_crate = { path = "../vendor/third_party_project/src/vendored_crate" }

./vendor/third_party_project/Cargo.toml

[workspace]
members = [
    "src/vendored_crate",
]

resolver = "2"

./vendor/third_party_project/src/vendored_crate/Cargo.toml

[package]
name = "vendored_crate"
version = "0.0.0"
edition.workspace = true
rust-version.workspace = true

Previously, RUN cargo chef cook --recipe-path recipe.json would cook the recipe. However, now it fails with

#0 4.587 error: failed to get `vendored_crate` as a dependency of package `mycrate v0.0.1 (/code/mycrate)`
#0 4.587
#0 4.587 Caused by:
#0 4.587   failed to load source for dependency `vendored_crate`
#0 4.587
#0 4.587 Caused by:
#0 4.587   Unable to update /code/vendor/third_party_project/src/vendored_crate
#0 4.587
#0 4.587 Caused by:
#0 4.587   failed to read `/code/vendor/third_party_project/src/vendored_crate/Cargo.toml`
#0 4.587
#0 4.587 Caused by:
#0 4.587   No such file or directory (os error 2)
#0 4.589 thread 'main' panicked at 'Exited with status code: 101', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-chef-0.1.56/src/recipe.rs:189:27
LukeMathWalker commented 1 year ago

Can you share your Dockerfile @arusahni?

Zizico2 commented 1 year ago

I got cargo chef failed to load manifest ... with this Dockerfile:

FROM rust:latest AS chef
USER root
RUN cargo install cargo-chef --locked --version 0.1.56
RUN apt-get update
RUN apt-get install protobuf-compiler libprotobuf-dev -y
RUN rustup update stable
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json --bin thisdb-crud

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
ARG build_profile=dev
RUN --mount=type=ssh cargo chef cook --profile=${build_profile} --recipe-path recipe.json --bin thisdb-crud
COPY . .
RUN cargo build --profile=${build_profile} --bin thisdb-crud

FROM debian:stable AS runtime
RUN apt-get update
ARG output_directory=debug
COPY --from=builder /app/target/${output_directory}/thisdb-crud /usr/local/bin/
CMD ["/usr/local/bin/thisdb-crud"]

The manifest it fails to find is for a dependency in the same workspace.

Reverting to 0.1.54 fixes it

LukeMathWalker commented 1 year ago

OK, that's a regression @Zizico2, thanks for reporting it! I have a fix ready in #207, just waiting for CI to do its thing and then I'll push it out.

EDIT: the fix has been released as 0.1.57.

arusahni commented 1 year ago

@LukeMathWalker here is my Dockerfile

FROM rust:1.68.2-slim-buster as builder

ENV CARGO_INCREMENTAL=0
ARG CARGO_RELEASE=--release
ARG CARGO_FEATURES=--no-default-features

RUN cargo install cargo-chef

FROM builder AS rust-chef-planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM builder AS rust-builder
COPY --from=rust-chef-planner /workdir/recipe.json recipe.json
RUN cargo chef cook $CARGO_FEATURES $CARGO_RELEASE --recipe-path recipe.json
COPY . .
RUN cargo build $CARGO_FEATURES $CARGO_RELEASE

FROM debian:bullseye-20230411-slim as base
COPY --from=builder /usr/local/bin/ /usr/local/bin/

FROM base AS daemons
COPY --from=rust-builder /workdir/target/*/myapp /usr/local/bin/
LukeMathWalker commented 1 year ago

To fix it you need to copy in your vendor folder:

# [...]
FROM builder AS rust-builder
COPY --from=rust-chef-planner /workdir/recipe.json recipe.json
# The new line!
COPY vendor/ vendor/
RUN cargo chef cook $CARGO_FEATURES $CARGO_RELEASE --recipe-path recipe.json

That should not be an issue for caching since the vendored folder will be unchanged if your dependencies are unchanged @arusahni I'm actually surprised that it was working before.

arusahni commented 1 year ago

@LukeMathWalker that worked :+1: . Thanks!

neoeinstein commented 1 year ago

We have a private Git dependency in our tree, and the most recent versions are ending up with errors because cargo metadata can't seem to access the repository. All cargo commands succeed, except for the cargo metadata command run from within cargo chef, which is now failing. We use net.git-fetch-with-cli = true in our base image configuration to fetch these dependencies.

Downgrading back to 0.1.52 makes things work properly again, so that's what we're doing for now.

Error Log and Dockerfile extract

``` > [planner 2/2] RUN cargo chef prepare --recipe-path recipe.json: #15 1.366 Error: Failed to compute recipe #15 1.366 #15 1.366 Caused by: #15 1.366 0: Cannot extract Cargo metadata #15 1.366 1: `cargo metadata` exited with an error: Updating crates.io index #15 1.366 Updating git repository `https://github.com/company/repo` #15 1.366 Warning: Permanently added the ECDSA host key for IP address '140.82.113.4' to the list of known hosts. #15 1.366 git@github.com: Permission denied (publickey). #15 1.366 fatal: Could not read from remote repository. #15 1.366 #15 1.366 Please make sure you have the correct access rights #15 1.366 and the repository exists. #15 1.366 error: failed to get `crate` as a dependency of package `application v0.1.0 (/app)` #15 1.366 #15 1.366 Caused by: #15 1.366 failed to load source for dependency `crate` #15 1.366 #15 1.366 Caused by: #15 1.366 Unable to update https://github.com/company/repo#27c082a4 #15 1.366 #15 1.366 Caused by: #15 1.366 failed to clone into: /usr/local/cargo/git/db/repo-0a285323a4bfb8d5 #15 1.366 #15 1.366 Caused by: #15 1.366 process didn't exit successfully: `git fetch --force --update-head-ok 'https://github.com/company/repo' '+HEAD:refs/remotes/origin/HEAD'` (exit status: 128) #15 1.366 ------ executor failed running [/bin/sh -c cargo chef prepare --recipe-path recipe.json]: exit code: 1 ``` ```Dockerfile # syntax=docker/dockerfile:1 FROM 1234556234.dkr.ecr.us-east-1.amazonaws.com/rust-base:1 as chef FROM chef as planner COPY . . RUN cargo chef prepare --recipe-path recipe.json # <-- Error here FROM chef as builder COPY --from=planner /app/recipe.json recipe.json RUN --mount=type=ssh cargo chef cook --recipe-path recipe.json COPY . . RUN --mount=type=ssh cargo build --locked ```

After reviewing, it appears that we probably can also fix this for ourselves by mounting the SSH key for the cargo chef prepare step.

LukeMathWalker commented 1 year ago

Mounting the key for cargo chef prepare should definitely fix the issue.

We could also run cargo metadata --no-deps instead of cargo metadata (as suggested in #211), which should be enough for what we are currently doing. I don't know if that blocks some of the work you were doing on target detection @Kobzol? I'd be tempted to say "no", but asking to make sure.

LukeMathWalker commented 1 year ago

I've merged #211 after reviewing that it won't cause any issues for the time beingβ€”this should allow your Dockerfiles to build unchanged @neoeinstein.

It's going out as 0.1.59, the new batch of Docker images is currently building.

LukeMathWalker commented 1 year ago

It's been a month and no new issues have been reported, so I'll go ahead and close this. Thanks for the help folks!