LukeMathWalker / zero-to-production

Code for "Zero To Production In Rust", a book on API development using Rust.
https://www.zero2prod.com
Apache License 2.0

5.4 DigitalOcean Build error out of memory #71

Open sjud opened 3 years ago

sjud commented 3 years ago

Hello, I am on section 5.4 and I'm trying to get the app running on DigitalOcean. I'm having trouble getting it to run after compiling: it compiles on DigitalOcean, then stalls for a while before issuing this error.

Build Error: Out of Memory

Your build job failed because it was out of memory. Error code: BuildJobOutOfMemory

I went up to 4GB of RAM to see if that would change the result and it had no effect. Most searches suggest increasing memory but I imagine that other people have been able to run the app with less. Here's the last of the log:

zero2prod | 18:47:21 INFO[1231] Changed working directory to /app
zero2prod | 18:47:21 INFO[1231] Creating directory /app
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY . .
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY --from=cacher /app/target target
zero2prod | 18:47:47 INFO[1257] Taking snapshot of files...
zero2prod | 18:48:33 INFO[1303] COPY --from=cacher /usr/local/cargo /usr/local/cargo
zero2prod | 18:49:22 INFO[1352] Taking snapshot of files...

The web service doesn't expose a console, at least not during this stage, so I'm not sure how to debug the problem further; any advice would be appreciated. Thank you. :)

sjud commented 3 years ago

So, I noticed that it kept getting stuck in the cargo-chef stages, and when I deleted those it built. Right now my Dockerfile is:

FROM rust:1.50 AS builder
WORKDIR app
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod

FROM debian:buster-slim AS runtime
WORKDIR app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]

It also built in roughly the same amount of time (~20 minutes) that it took to hit the error I was having. I'm not sure exactly what cargo-chef was doing, so I can't comment further, but I'll leave this open in case it's of interest.

LukeMathWalker commented 3 years ago

My wild guess is that one of the COPY directives is causing RAM usage to go above the (very constrained) capacity of Digital Ocean's builder. This is so annoying, sorry you had to troubleshoot it!

Giesch commented 3 years ago

I had successfully deployed zero2prod on DO using cargo-chef on an earlier chapter, but then ran into this issue when I came back to it and caught up. Because I'm not the only one who ran into it, and sjud is on the earlier chapter, I suspect something changed on DO's side. There are a number of issues filed against the image builder DO uses (like this one) implying that a version bump made it struggle with multi-stage builds involving a lot of files (like cargo's target or npm's node_modules).

eeff commented 3 years ago

I want to write down my experience deploying to the DigitalOcean App Platform following #4-deploy-to-digitalocean-apps-platform, hoping this will save somebody a whole day.

About leveraging Docker's caching capability

The post talks about optimizing the build image size, and the Dockerfile has the following structure:

FROM lukemathwalker/cargo-chef as planner
...
FROM lukemathwalker/cargo-chef as cacher
...
FROM rust:1.50 AS builder
...
FROM debian:buster-slim AS runtime

The lukemathwalker/cargo-chef image is based on the rust image. Specifying rust:1.50 as the base of the builder stage does not guarantee that the builder stage will leverage the cache from the cacher stage, because the rust version that lukemathwalker/cargo-chef is based on may not be the same as rust:1.50. As a result, the dependencies were recompiled in the builder stage on my local machine. What's worse, cargo generates artifacts for the whole project again, making the builder image size explode to about 7 GiB on my machine!

As a solution, I explicitly specify the rust image version and install cargo-chef:

############### Planner stage ###############
FROM rust:1.49 AS planner

WORKDIR /app

RUN cargo install cargo-chef

# Copy all files from our working environment
COPY . .

# Compute a lock-like file for our project
RUN cargo chef prepare --recipe-path recipe.json

############### Cacher stage ###############
FROM rust:1.49 AS cacher

WORKDIR /app

RUN cargo install cargo-chef

COPY --from=planner /app/recipe.json recipe.json

# Build our project dependencies, not our application
RUN cargo chef cook --release --recipe-path recipe.json

############### Builder stage ###############

# We use the latest Rust stable release as base image
FROM rust:1.49 AS builder

WORKDIR /app

# Copy over the cached dependencies
COPY --from=cacher /app/target target
COPY --from=cacher $CARGO_HOME $CARGO_HOME
...

This does solve the problem and reduces the image build time.

About deploying to DigitalOcean

With the modified Dockerfile, I went ahead and deployed to DigitalOcean, and it failed:

Build Error: Out of Memory

Your build job failed because it was out of memory.
Error code: BuildJobOutOfMemory

This error message is not very helpful, and it is misleading. The support team told me that the resource limit for builds is 8 GB of combined RAM and disk space, and in this case it is more about the disk space than the RAM. Looking at DigitalOcean's deployment log, I found these lines:

2021-05-30T04:53:10.806979004Z INFO[2338] RUN cargo build --release --bin zero2prod
2021-05-30T04:53:10.807015294Z INFO[2338] Taking snapshot of full filesystem...
2021-05-30T04:53:59.618085679Z INFO[2387] cmd: /bin/sh
2021-05-30T04:53:59.618123452Z INFO[2387] args: [-c cargo build --release --bin zero2prod]
2021-05-30T04:53:59.618303719Z INFO[2387] Running: [/bin/sh -c cargo build --release --bin zero2prod]
2021-05-30T04:54:03.447134582Z Compiling libc v0.2.94
2021-05-30T04:54:03.452872384Z Compiling tokio v1.6.0
2021-05-30T04:54:03.510041285Z Compiling num-traits v0.2.14
... # a lot more lines

It's recompiling the dependencies again! But why? The support team further told me that they use kaniko to build from the Dockerfile instead of the usual Docker daemon. In any case, it turns out that kaniko does not respect the cache.

The solution I finally settled on was to use the container registry:

# spec.yaml
name: zero2prod
region: sgp
services:
  - name: zero2prod
    image:
      registry_type: DOCR
      repository: zero2prod
...
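
To actually use that spec, the image has to be built and pushed to the DigitalOcean Container Registry first. A minimal sketch of those commands (the registry name my-registry and the tag are placeholders, not from my actual setup):

```sh
# Authenticate the local Docker client against DOCR (requires doctl)
doctl registry login

# Build the image, then tag and push it to the registry referenced in spec.yaml
docker build -t zero2prod .
docker tag zero2prod registry.digitalocean.com/my-registry/zero2prod:latest
docker push registry.digitalocean.com/my-registry/zero2prod:latest
```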

If you can tolerate the painful build times on DigitalOcean, another solution is to use the simple Dockerfile that avoids the cache.

Updates

Replace cargo-chef with cargo-build-deps (working solution)

After a little googling, I found cargo-build-deps, which uses cargo build -p under the hood and does not need a recipe.json file. Not having to generate a bookkeeping file makes the docker build process simpler, and possibly helps out kaniko. To give it a try, I updated the Dockerfile (see the sketch below), and bingo, it works!
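
A rough sketch of how the builder stage might look with cargo-build-deps, assuming the subcommand is invoked as cargo build-deps and following the dummy-project pattern from the crate's README (stage layout and paths are illustrative):

```Dockerfile
FROM rust:1.49 AS builder

RUN cargo install cargo-build-deps

# Start from a dummy binary project so that only Cargo.toml and
# Cargo.lock are needed to pre-build the dependencies
RUN USER=root cargo new --bin zero2prod
WORKDIR /zero2prod
COPY Cargo.toml Cargo.lock ./

# Pre-build dependencies only; no recipe.json involved
RUN cargo build-deps --release

# Copy the real sources (plus sqlx-data.json, configuration, etc.) and build the app
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod
```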

Please note that cargo-build-deps enforces cargo update before building the dependencies, which I think is not the desired behavior, so I just made my own clone of it. See issue

gihrig commented 3 years ago

@eeff Thanks for the detailed write-up!

I ran into the same out-of-memory error and, rather than engage in the extensive troubleshooting journey you documented, I gave up on DO and built my own Docker host on a VPS.

That is a significant project and it lacks some of DO's features, but it offers a lot more power for the money if we're talking about a full-time production app.

As Luke put it:

"deployments are (still) a messy business."

frjonsen commented 3 years ago

@eeff Thank you very much. This also helped me figure out why my builds were so much slower than I'd expect: it wasn't using the cache, and instead recompiled everything in the builder stage.

Unfortunately while this did improve things slightly, it did not resolve the error. I will attempt to use cargo-build-deps and see if that helps.

EDIT: I went with the easiest solution I could think of: linking my GitHub account to a Docker Hub account, building the image there, and then using the image option in the spec.yaml for DigitalOcean instead of pulling from GitHub. It does involve an extra component, in having to go via Docker Hub, but the end result seems to be the same.
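
For anyone doing the same, a minimal sketch of the image section of the spec when pulling from Docker Hub (the account name is a placeholder; double-check the field names against DO's App Platform spec reference):

```yaml
# spec.yaml (excerpt)
services:
  - name: zero2prod
    image:
      registry_type: DOCKER_HUB
      registry: your-dockerhub-username
      repository: zero2prod
      tag: latest
```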

sr-fuentes commented 3 years ago

I am running into a similar issue in deploying to DO using the latest Dockerfile from the 20210712 version of the book. The building job fails with these logs:

[2021-07-16 20:40:53] INFO[0140] WORKDIR /app
[2021-07-16 20:40:53] INFO[0140] cmd: workdir
[2021-07-16 20:40:53] INFO[0140] Changed working directory to /app
[2021-07-16 20:40:53] INFO[0140] Creating directory /app
[2021-07-16 20:40:53] INFO[0140] Taking snapshot of files...
[2021-07-16 20:40:53] INFO[0140] COPY --from=cacher /app/target target
[2021-07-16 20:40:53] error building image: error building stage: failed to execute command: resolving src: failed to get fileinfo for /kaniko/1/app/target: lstat /kaniko/1/app/target: no such file or directory
[2021-07-16 20:40:53]
[2021-07-16 20:40:53] command exited with code 1
[2021-07-16 20:40:56] ! Build failed (exit code 1)

gyzerok commented 3 years ago

Just wanted to mention that I've got the very same problem. I solved it by removing cargo-chef for now. However, it would be nice to use layer caching, otherwise build times are crazy :)

aboseley commented 3 years ago

I also removed cargo-chef to make it work:

FROM rust:1.54.0 AS builder
WORKDIR /app
COPY . .
COPY configuration configuration
ENV SQLX_OFFLINE true
# Build our application, leveraging the cached dependencies
RUN cargo install --path .

FROM debian:buster-slim
RUN apt-get update -y && \
    apt-get install -y openssl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/zero2prod /usr/local/bin/
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["/usr/local/bin/zero2prod"]

It took 40 minutes to build and deploy on Digital Ocean

LukeMathWalker commented 3 years ago

Can you try this edited Dockerfile, which still includes cargo-chef but avoids copying over the cached dependencies?

FROM lukemathwalker/cargo-chef:latest-rust-1.53.0 as chef
WORKDIR /app

FROM chef as planner
COPY . .
RUN cargo chef prepare  --recipe-path recipe.json

FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod

FROM debian:buster-slim AS runtime
WORKDIR /app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl \
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]

LukeMathWalker commented 3 years ago

I managed to try it out this morning and I can confirm it no longer fails due to an out-of-memory error. The revised Dockerfile will be included in the next release.

boyswan commented 2 years ago

I'm still having the same issues as described above. Is this expected to be fixed in the latest version of cargo chef?

LukeMathWalker commented 2 years ago

This is not a cargo-chef issue unfortunately - it's a fundamental limitation of the build machines on DO combined with high resource usage by the Rust compiler. The real solution is going to be ditching DO for Docker builds I am afraid.

norman784 commented 2 years ago

In my case the OOM failure happened when taking a snapshot after cargo build --release. After testing different solutions, what worked for me was to build, copy the binary, and then clean the target dir, all in the same command:

RUN cargo build --release --bin zero2prod && \
    cp /app/target/release/zero2prod zero2prod && \
    cargo clean

So in the last step I copy the file from /app/zero2prod instead of /app/target/release/zero2prod. One downside is that I ended up removing the chef step, so my build times are not great, but they are acceptable for the moment.
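
In other words, assuming the builder stage's WORKDIR is /app as in the book's Dockerfile, the COPY line in the runtime stage becomes something like:

```Dockerfile
# The binary was already copied out of target/ (and target/ cleaned) in the builder stage
COPY --from=builder /app/zero2prod zero2prod
```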

chamons commented 2 years ago

@LukeMathWalker - I'd like to suggest this issue be reopened. I just ran into this today while finishing up chapter 10, so it still seems to be an issue.

I tried a number of solutions, including bumping the size of my machine and RUN cargo build --release --bin zero2prod && cp ./target/release/zero2prod ./zero2prod && cargo clean in my Dockerfile, but to no avail so far.

Edit: I've tried bumping the production server up multiple tiers to no avail.

LukeMathWalker commented 2 years ago

Unfortunately the size of the production server has no influence on the size of the build server 😞

chamons commented 2 years ago

I'm looking into using the Docker registry along with GitHub's CI to resolve this (build images on GH, not DO). If I get something working, I'll post details here.

chamons commented 2 years ago

Here is the workflow that works for me:

https://gist.github.com/chamons/654f005caf2318db7a0f818a3c33fe2d

You'll obviously need to replace the registry name caffeinated-gorilla, the app name zero-2-prod, and the tag to fit your configuration.

You have to:

I do not have it set up to push on every build, as GitHub has usage limits I'm afraid of hitting, but that is possible.

The biggest thing missing is Docker image caching. I know there is a cache action, and that should drastically reduce docker build time in theory. I hope to mess with it tonight, but I wanted to share what I found.
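
As a rough illustration (not part of the gist above), a build step with layer caching via the GitHub Actions cache backend might look something like this, using Docker's official actions (registry login is omitted and the tag name is a placeholder):

```yaml
# Excerpt of the steps: section of a workflow
- uses: docker/setup-buildx-action@v2
- uses: docker/build-push-action@v4
  with:
    context: .
    push: true
    tags: registry.digitalocean.com/my-registry/zero2prod:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
```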

This setup is significantly worse than the app builder, and I reached out to support to let them know, but it at least works.

JonShort commented 2 years ago

Just going through chapter 5 now and hit this - is there any fix on our side, or are we just waiting for DO to do something?

...in the meantime I've opened a DO support ticket (why not since it's a paid service)


Edit - here is the response from DO:

In App Platform, the build memory is shared between files and the processes running to build the application. Builds are limited to 8GiB of total memory. As of now, we cannot increase the memory allocated during the build phase. As with any file system, there is some per-file overhead so sites with lots of small files may count higher. The processes plus the per-file overhead is likely what’s leading to this OOM. We don’t have any immediate solutions on our end for this build error.

Increasing the tier unfortunately wouldn’t be helpful in the build phase. However, there is a workaround that you can give a try. You can consider building via Dockerfile outside of App Platform and leverage DOCR support (or Docker Hub) to deploy the image in the App Platform. You can also achieve the same using GitHub Actions.

TL;DR: nothing they can do; they recommend building the container elsewhere.

LukeMathWalker commented 2 years ago

We are waiting for DO to do something. You can work around the problem by building the Docker image via GitHub actions and telling DO to use it, as @chamons described.

JonShort commented 2 years ago

Update: I switched my .dockerignore to use an allowlist pattern, just to ensure we're not copying over any unnecessary build context from wherever DO runs the docker build, and the build completed fine (plus one additional follow-up build).

See this commit

Probably complete coincidence but thought I'd post here in case it helps anyone else
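
For illustration, an allowlist-style .dockerignore looks roughly like this (the allowed entries below are examples; the real list is in the linked commit):

```
# Ignore everything by default...
*
# ...then explicitly allow what the build actually needs
!src/
!migrations/
!configuration/
!Cargo.toml
!Cargo.lock
!sqlx-data.json
```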

bsl commented 2 years ago

I could hardly know less about this, but when I got the OOM, I hit Retry in the Activity tab and it succeeded. Maybe something is being drawn from cache on the second attempt?

jgirardet commented 2 years ago

> Update: I switched my .dockerignore to use an allowlist pattern, just to ensure we're not copying over any unnecessary build context from wherever DO runs the docker build, and the build completed fine (plus one additional follow-up build).
>
> See this commit
>
> Probably complete coincidence but thought I'd post here in case it helps anyone else

Same fix as yours did the trick here too.

Ifletcher668 commented 2 years ago

I ran into this issue recently as well. To be honest, I feel like I tried every approach here and none of them worked; then I tried each of them again and was finally able to push to DigitalOcean and have it succeed without the OOM error.

Figured I would leave this here in case anyone else was having trouble and this miraculously worked for them, too.

Dockerfile

```Dockerfile
FROM lukemathwalker/cargo-chef:latest-rust-1.59.0 as chef
WORKDIR /app
RUN apt update && apt install lld clang -y

FROM chef as planner
COPY . .
# Compute a lock-like file for our project
RUN cargo chef prepare --recipe-path recipe.json

FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
# Build our project dependencies, not our application!
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
# Build our project
RUN cargo build --release --bin zero2prod

FROM debian:bullseye-slim AS runtime
WORKDIR /app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl ca-certificates \
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY config config
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]
```
Cargo.toml

```toml
[package]
name = "zero2prod"
version = "0.1.0"
edition = "2021"

[lib]
path = "src/lib.rs"

[[bin]]
path = "src/main.rs"
name = "zero2prod"

[dependencies]
actix-web = "4.0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1", features = ["derive"] }
serde-aux = "3"
config = "0.11"
uuid = { version = "0.8.1", features = ["v4"] }
chrono = "0.4.15"
tracing = { version = "0.1", features = ["log"] }
tracing-log = "0.1"
tracing-subscriber = { version = "0.3", features = ["registry", "env-filter"] }
tracing-bunyan-formatter = "0.3"
secrecy = { version = "0.8", features = ["serde"] }
tracing-actix-web = "0.5"
# tracing-error <- look into this

[dependencies.sqlx]
version = "0.5.7"
default-features = false
features = [
    "runtime-actix-rustls",
    "macros",
    "postgres",
    "uuid",
    "chrono",
    "migrate",
    "offline"
]

[dev-dependencies]
reqwest = "0.11"
once_cell = "1"
```