Open sjud opened 3 years ago
So, I noticed that it kept getting stuck in the cargo chef stages, and when I deleted those it built. So right now my Dockerfile is:

FROM rust:1.50 AS builder
WORKDIR app
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod

FROM debian:buster-slim AS runtime
WORKDIR app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]
And it also built in about the same amount of time (~20 minutes) that it took me to hit the error I was having. I'm not sure exactly what cargo chef was doing, so I can't comment further, but I will leave this open in case it's of interest.
My wild guess is that one of the COPY directives is causing RAM usage to go above the (very constrained) capacity of Digital Ocean's builder.
This is so annoying, sorry you had to troubleshoot it!
I had successfully deployed zero2prod on DO using cargo-chef on an earlier chapter, but then ran into this issue when I came back to it and caught up. Because I'm not the only one who ran into it, and sjud is on the earlier chapter, I suspect something changed on DO's side. There are a number of issues on the image builder DO uses, like this one, that imply a version bump made it have trouble with multi-stage builds containing a lot of files (like cargo's target or npm's node_modules).
I want to write down my experience deploying to the Digital Ocean App Platform following #4-deploy-to-digitalocean-apps-platform, hoping this will save somebody a whole day.
The post talks about optimizing the build image size, and the Dockerfile has the following structure:
FROM lukemathwalker/cargo-chef as planner
...
FROM lukemathwalker/cargo-chef as cacher
...
FROM rust:1.50 AS builder
...
FROM debian:buster-slim AS runtime
The lukemathwalker/cargo-chef image is based on the rust image. Specifying rust:1.50 as the base in the builder stage does not ensure that the builder stage will leverage the cache from the cacher stage, because the version of the rust image that lukemathwalker/cargo-chef is based on may not be the same as rust:1.50. And indeed it turned out to recompile the dependencies in the builder stage on my local machine. What's worse is that cargo generates artifacts for the whole project again, making the builder image size explode to about 7 GiB on my machine!
As a solution, I explicitly specify the rust image version and install cargo-chef:
############### Planner stage ###############
FROM rust:1.49 AS planner
WORKDIR /app
RUN cargo install cargo-chef
# Copy all files from our working environment
COPY . .
# Compute a lock-like file for our project
RUN cargo chef prepare --recipe-path recipe.json
############### Cacher stage ###############
FROM rust:1.49 AS cacher
WORKDIR /app
RUN cargo install cargo-chef
COPY --from=planner /app/recipe.json recipe.json
# Build our project dependencies, not our application
RUN cargo chef cook --release --recipe-path recipe.json
############### Builder stage ###############
# We use the latest Rust stable release as base image
FROM rust:1.49 AS builder
WORKDIR /app
# Copy over the cached dependencies
COPY --from=cacher /app/target target
COPY --from=cacher $CARGO_HOME $CARGO_HOME
...
this does solve the problem and reduces the image build time.
With the modified Dockerfile, I headed off to deploy to Digital Ocean, and it failed:
Build Error: Out of Memory
Your build job failed because it was out of memory.
Error code: BuildJobOutOfMemory
This error message is not very helpful and is actually misleading. The support team told me that the resources for builds are 8 GiB of combined RAM and disk space; in this case it is more about the disk space than the RAM. Looking at Digital Ocean's deployment log, I found these lines:
2021-05-30T04:53:10.806979004Z INFO[2338] RUN cargo build --release --bin zero2prod
2021-05-30T04:53:10.807015294Z INFO[2338] Taking snapshot of full filesystem...
2021-05-30T04:53:59.618085679Z INFO[2387] cmd: /bin/sh
2021-05-30T04:53:59.618123452Z INFO[2387] args: [-c cargo build --release --bin zero2prod]
2021-05-30T04:53:59.618303719Z INFO[2387] Running: [/bin/sh -c cargo build --release --bin zero2prod]
2021-05-30T04:54:03.447134582Z Compiling libc v0.2.94
2021-05-30T04:54:03.452872384Z Compiling tokio v1.6.0
2021-05-30T04:54:03.510041285Z Compiling num-traits v0.2.14
... # a lot more lines
It's recompiling the dependencies again! But why? The support team further told me that they use kaniko to build from the Dockerfile instead of the usual Docker daemon. Anyway, it turns out not to respect the cache.
The final rescue I picked was to use the container registry:
# spec.yaml
name: zero2prod
region: sgp
services:
- name: zero2prod
  image:
    registry_type: DOCR
    repository: zero2prod
  ...
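For context, this route means building the image yourself and pushing it to DOCR before deploying; a rough sketch of the commands (my-registry is a placeholder for your own registry name):

# Authenticate Docker with the DigitalOcean Container Registry
doctl registry login
# Build the image from the project root and tag it for DOCR
docker build -t registry.digitalocean.com/my-registry/zero2prod:latest .
# Push it; the app spec above will then pull this image on deploy
docker push registry.digitalocean.com/my-registry/zero2prod:latest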
If you can tolerate the painful build time on Digital Ocean, another solution is to use the simple Dockerfile that avoids the cache.
After a little googling, I found cargo-build-deps, which utilizes cargo build -p and does not need a recipe.json file. Not needing to generate a bookkeeping file makes the Docker build process simpler, and possibly helps out kaniko.
To give it a try, I updated the Dockerfile, and bingo, it works!
Please note that cargo-build-deps enforces a cargo update before building the dependencies, which I think is not the desired behavior, so I just made my own clone of it. See issue.
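For anyone curious, the builder stage ends up looking roughly like this with cargo-build-deps (a sketch along the lines of the crate's README pattern, not my exact Dockerfile; cargo build-deps is the subcommand the crate installs):

FROM rust:1.49 AS builder
RUN cargo install cargo-build-deps
# Create an empty binary crate so cargo has a valid package layout to work with
RUN cargo new --bin app
WORKDIR /app
# Copy only the manifests; this layer stays cached until they change
COPY Cargo.toml Cargo.lock ./
# Build just the dependencies - no recipe.json involved
RUN cargo build-deps --release
# Now bring in the real sources and build the application itself
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod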
@eeff Thanks for the detailed write-up!
I ran into the same out of memory error and rather than engage in the extensive troubleshooting journey you documented, I gave up on DO and built my own Docker host on a VPS server.
That is a significant project and lacks some of DO's features but offers a lot more power for the money, if we're talking about a full-time production app.
As Luke put it:
"deployments are (still) a messy business."
@eeff Thank you very much. This also helped me figure out why my builds were so much slower than I'd expect: it wasn't using the cache, and instead recompiled everything in the builder stage.
Unfortunately, while this did improve things slightly, it did not resolve the error. I will attempt to use cargo-build-deps and see if that helps.
EDIT: I went with the easiest solution I could think of: linking my GitHub account to a Docker Hub account, building the image there, and then using the image option in the spec.yaml for DigitalOcean instead of pulling from GitHub. It does involve an extra component, in having to go via Docker Hub, but the end result seems to be the same.
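For anyone taking the same route, the image section of the spec looks roughly like this (a sketch; the registry is your Docker Hub username and the repository/tag are whatever you pushed):

# spec.yaml (excerpt)
services:
- name: zero2prod
  image:
    registry_type: DOCKER_HUB
    registry: your-dockerhub-username
    repository: zero2prod
    tag: latest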
I am running into a similar issue deploying to DO using the latest Dockerfile from the 20210712 version of the book. The build job fails with these logs:
[2021-07-16 20:40:53] INFO[0140] WORKDIR /app
[2021-07-16 20:40:53] INFO[0140] cmd: workdir
[2021-07-16 20:40:53] INFO[0140] Changed working directory to /app
[2021-07-16 20:40:53] INFO[0140] Creating directory /app
[2021-07-16 20:40:53] INFO[0140] Taking snapshot of files...
[2021-07-16 20:40:53] INFO[0140] COPY --from=cacher /app/target target
[2021-07-16 20:40:53] error building image: error building stage: failed to execute command: resolving src: failed to get fileinfo for /kaniko/1/app/target: lstat /kaniko/1/app/target: no such file or directory
[2021-07-16 20:40:53]
[2021-07-16 20:40:53] command exited with code 1
[2021-07-16 20:40:56] ! Build failed (exit code 1)
Just wanted to mention that I've got the very same problem. I solved it by removing cargo-chef for now. However, it would be nice to use layer caching, otherwise build times are crazy :)
I also removed cargo-chef to make it work:
FROM rust:1.54.0 AS builder
WORKDIR /app
COPY . .
COPY configuration configuration
ENV SQLX_OFFLINE true
# Build our application
RUN cargo install --path .
FROM debian:buster-slim
RUN apt-get update -y \
    && apt-get install -y openssl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/zero2prod /usr/local/bin/
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["/usr/local/bin/zero2prod"]
It took 40 minutes to build and deploy on Digital Ocean
Can you try with this edited Dockerfile, which still includes cargo-chef but avoids copying over the cached dependencies?
FROM lukemathwalker/cargo-chef:latest-rust-1.53.0 as chef
WORKDIR /app
FROM chef as planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod
FROM debian:buster-slim AS runtime
WORKDIR /app
RUN apt-get update -y \
&& apt-get install -y --no-install-recommends openssl \
# Clean up
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]
I managed to try it out this morning and I can confirm it no longer fails due to an out-of-memory error. The revised Dockerfile will be included in the next release.
I'm still having the same issues as described above. Is this expected to be fixed in the latest version of cargo chef?
This is not a cargo-chef issue unfortunately - it's a fundamental limitation of the build machines on DO combined with high resource usage by the Rust compiler. The real solution is going to be ditching DO for Docker builds I am afraid.
In my case, the OOM failure happened when taking a snapshot after cargo build --release. After testing different solutions, what worked for me was to build, copy the binary, and then clean the target dir, all in the same command:
RUN cargo build --release --bin zero2prod && \
cp /app/target/release/zero2prod zero2prod && \
cargo clean
So in the last step I copy the file from /app/zero2prod instead of /app/target/release/zero2prod. One downside of this is that I ended up removing the chef step, so my build times are not so good, but they are acceptable for the moment.
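For reference, the corresponding change in the runtime stage is just the source path of the COPY (a sketch, assuming the build stage is still named builder and its WORKDIR is /app):

# The binary now lives at /app/zero2prod in the builder stage,
# since target/ was wiped by `cargo clean`
COPY --from=builder /app/zero2prod zero2prod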
@LukeMathWalker - I'd like to suggest this issue be reopened. I just ran into it today while finishing up chapter 10, so it still seems to be an issue.
I tried a number of solutions, including bumping the size of my machine and RUN cargo build --release --bin zero2prod && cp ./target/release/zero2prod ./zero2prod && cargo clean in my Dockerfile, but to no avail so far.
Edit: I've tried bumping the production server up multiple tiers to no avail.
Unfortunately the size of the production server has no influence on the size of the build server 😞
I'm looking into using the Docker registry along with GitHub's CI to resolve this (build images on GH, not DO). If I get something working, I'll post details here.
Here is the workflow that works for me:
https://gist.github.com/chamons/654f005caf2318db7a0f818a3c33fe2d
You'll obviously need to replace the caffeinated-gorilla registry name, the zero-2-prod app name, and the tag to fit your configuration.
You have to:
I do not have it set up to push on every build, as GitHub has usage limits I'm afraid of hitting, but that is possible.
The biggest thing missing is Docker image caching. I know there is a cache action, and that should drastically reduce Docker build time in theory. I hope to mess with it tonight, but I wanted to share what I found.
This setup is significantly worse than the app builder, and I reached out to support to let them know, but it at least works.
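For reference, the overall shape of such a workflow is roughly the following (a sketch, not the gist's exact contents; the registry name, app name, and secret name are placeholders to adapt as described above):

name: Build and push container image
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
      - name: Log in to the DigitalOcean Container Registry
        run: doctl registry login --expiry-seconds 1200
      - name: Build the image
        run: docker build -t registry.digitalocean.com/caffeinated-gorilla/zero-2-prod:latest .
      - name: Push the image
        run: docker push registry.digitalocean.com/caffeinated-gorilla/zero-2-prod:latest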
Just going through chapter 5 now and hit this - is there any fix on our side, or are we just waiting for DO to do something?
...in the meantime I've opened a DO support ticket (why not since it's a paid service)
In App Platform, the build memory is shared between files and the processes running to build the application. Builds are limited to 8GiB of total memory. As of now, we cannot increase the memory allocated during the build phase. As with any file system, there is some per-file overhead so sites with lots of small files may count higher. The processes plus the per-file overhead is likely what’s leading to this OOM. We don’t have any immediate solutions on our end for this build error.
Increasing the tier unfortunately wouldn’t be helpful in the build phase. However, there is a workaround that you can give a try. You can consider building via Dockerfile outside of App Platform and leverage DOCR support (or Docker Hub) to deploy the image in the App Platform. You can also achieve the same using GitHub Actions.
TL;DR: nothing they can do; they recommend building the container elsewhere.
We are waiting for DO to do something. You can work around the problem by building the Docker image via GitHub actions and telling DO to use it, as @chamons described.
Update - so I switched my .dockerignore to use an allowlist pattern, just to ensure we're not copying over any unnecessary build context from wherever DO runs the Docker build, and the build completed fine (+1 additional follow-up build).
See this commit.
Probably a complete coincidence, but thought I'd post here in case it helps anyone else.
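For anyone unfamiliar with the allowlist pattern: everything is ignored by default and files are opted back in explicitly. A sketch of what that can look like for this project (the exact entries depend on your layout):

# .dockerignore
# Ignore everything...
*
# ...except what the build actually needs
!src
!Cargo.toml
!Cargo.lock
!configuration
!migrations
!sqlx-data.json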
I could hardly know less about this, but when I got the OOM, I hit Retry in the Activity tab and it succeeded. Maybe something is being drawn from cache on the second attempt?
The same .dockerignore allowlist fix did the trick here too.
I ran into this issue recently as well. To be honest, I feel like I tried every approach here and none of them worked; then I tried each of them again, and was finally able to push to DigitalOcean and have it succeed without the OOM error.
Figured I would leave this here in case anyone else was having trouble and this miraculously worked for them, too.
Hello, I am on section 5.4 and I'm trying to get the app running on DigitalOcean. I am having trouble getting it to run after compiling: it compiles on DigitalOcean and then stalls for a while before issuing this error.
Build Error: Out of Memory
Your build job failed because it was out of memory. Error code: BuildJobOutOfMemory
I went up to 4GB of RAM to see if that would change the result and it had no effect. Most searches suggest increasing memory but I imagine that other people have been able to run the app with less. Here's the last of the log:
zero2prod | 18:47:21 INFO[1231] Changed working directory to /app
zero2prod | 18:47:21 INFO[1231] Creating directory /app
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY . .
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY --from=cacher /app/target target
zero2prod | 18:47:47 INFO[1257] Taking snapshot of files...
zero2prod | 18:48:33 INFO[1303] COPY --from=cacher /usr/local/cargo /usr/local/cargo
zero2prod | 18:49:22 INFO[1352] Taking snapshot of files...
The web service doesn't expose a console, at least not during this stage, so I'm not sure how to debug the problem further; any advice would be appreciated. Thank you. :)