caddyserver / caddy-docker

Source for the official Caddy v2 Docker Image
https://hub.docker.com/_/caddy
Apache License 2.0

Docker images for arm64 are not working after 2.8.1 #359

Closed milanaleksic closed 1 month ago

milanaleksic commented 1 month ago

A build of a container image failed on my side like this:

build failure logs

```
homelab/docker/caddy-cloudflare on  master [$!] via 🐳 orbstack via ❄️ (nix-shell-env) took 1m35s
➜ ./build.sh
[+] Building 1.8s (19/20) docker-container:multiarch
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 649B 0.0s
 => [linux/arm64 internal] load metadata for docker.io/library/caddy:2.8.4 1.4s
 => [linux/amd64 internal] load metadata for docker.io/library/caddy:2.8.4 1.1s
 => [linux/amd64 internal] load metadata for docker.io/library/caddy:builder 1.1s
 => [linux/arm64 internal] load metadata for docker.io/library/caddy:builder 1.1s
 => [auth] library/caddy:pull token for registry-1.docker.io 0.0s
 => [linux/amd64 builder 1/2] FROM docker.io/library/caddy:builder@sha256:3db37b69486800de84befa0dd692b4023beb435421703bbdb6f84a8055ddf011 0.1s
 => => resolve docker.io/library/caddy:builder@sha256:3db37b69486800de84befa0dd692b4023beb435421703bbdb6f84a8055ddf011 0.1s
 => [internal] load build context 0.0s
 => => transferring context: 39B 0.0s
 => [linux/amd64 stage-1 1/4] FROM docker.io/library/caddy:2.8.4@sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c 0.2s
 => => resolve docker.io/library/caddy:2.8.4@sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c 0.2s
 => [linux/arm64 stage-1 1/4] FROM docker.io/library/caddy:2.8.4@sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c 0.2s
 => => resolve docker.io/library/caddy:2.8.4@sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c 0.2s
 => [linux/arm64 builder 1/2] FROM docker.io/library/caddy:builder@sha256:3db37b69486800de84befa0dd692b4023beb435421703bbdb6f84a8055ddf011 0.2s
 => => resolve docker.io/library/caddy:builder@sha256:3db37b69486800de84befa0dd692b4023beb435421703bbdb6f84a8055ddf011 0.1s
 => CACHED [linux/amd64 builder 2/2] RUN caddy-builder github.com/caddy-dns/cloudflare 0.0s
 => CACHED [linux/amd64 stage-1 2/4] COPY --from=builder /usr/bin/caddy /usr/bin/caddy 0.0s
 => CACHED [linux/amd64 stage-1 3/4] RUN apk add --no-cache tini 0.0s
 => CACHED [linux/amd64 stage-1 4/4] COPY signal-handler.sh / 0.0s
 => CACHED [linux/arm64 builder 2/2] RUN caddy-builder github.com/caddy-dns/cloudflare 0.0s
 => CACHED [linux/arm64 stage-1 2/4] COPY --from=builder /usr/bin/caddy /usr/bin/caddy 0.0s
 => ERROR [linux/arm64 stage-1 3/4] RUN apk add --no-cache tini 0.1s
------
 > [linux/arm64 stage-1 3/4] RUN apk add --no-cache tini:
0.103 .buildkit_qemu_emulator: /bin/sh: Invalid ELF image for this architecture
------
Dockerfile:16
--------------------
  14 |     # 1. implementation: https://github.com/optiz0r/caddy-consul/
  15 |     # Override the entrypoint with a bash script which handles SIGHUP and triggers reload
  16 | >>> RUN apk add --no-cache tini
  17 |     COPY signal-handler.sh /
  18 |     ENTRYPOINT ["/sbin/tini", "--"]
--------------------
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c apk add --no-cache tini" did not complete successfully: exit code: 255
```

My Dockerfile

```
ARG CADDY_VERSION=0.0.0

FROM caddy:builder AS builder

RUN caddy-builder \
    github.com/caddy-dns/cloudflare

FROM caddy:${CADDY_VERSION}

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

# Reference:
# 1. discussion: https://github.com/caddyserver/caddy/issues/3967
# 1. implementation: https://github.com/optiz0r/caddy-consul/
# Override the entrypoint with a bash script which handles SIGHUP and triggers reload
RUN apk add --no-cache tini
COPY signal-handler.sh /
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["/signal-handler.sh", "caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]
```

I found this strange, since the build works flawlessly when I set the version arg to 2.8.1. Something must have broken: either my setup is at fault or the image is wrong.

To verify, I went to my arm64 machine (RPi5). Here are my logs from there, using only the official caddy Docker image:

# this works perfectly!
milan@oberon ~ → docker run --rm -ti --entrypoint /bin/sh caddy:2.8.1
Unable to find image 'caddy:2.8.1' locally
2.8.1: Pulling from library/caddy
94747bd81234: Pull complete
d679b063c3cc: Pull complete
9d3036766387: Pull complete
553446b932e7: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:7414db60780a20966cd9621d1dcffcdcef060607ff32ddbfde2a3737405846c4
Status: Downloaded newer image for caddy:2.8.1
/srv #

# this doesn't work, it's a SIGSEGV
milan@oberon ~ → docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4
milan@oberon ~ → echo $?
139
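
As a side note on reading that status: shells and Docker conventionally report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11. The following is a minimal sketch of that decoding; the helper name `describe_exit_status` is made up for illustration and is not part of any Docker tooling.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status such as the one printed by `echo $?`."""
    if status > 128:
        try:
            # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number; treat it as a plain exit code
    return f"exited with code {status}"


print(describe_exit_status(139))  # on Linux: killed by SIGSEGV
```

The same helper leaves the build's `exit code: 255` alone, since 127 is not a valid signal number.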

I am not an expert, but I assume the arm64 image wasn't labeled correctly and is actually based on some other architecture.

icco commented 1 month ago

I am also seeing this. Example error from our dev servers

crash.txt

francislavoie commented 1 month ago

/cc @tianon do you have any clue how we could debug this? I'm not sure where to look for this kind of issue.

francislavoie commented 1 month ago

@milanaleksic looking closer at your Dockerfile, please try removing the tini and signal-handling parts, and switch from caddy-builder (which is deprecated and will be removed at some point) to xcaddy. Use this Dockerfile as a base: https://caddyserver.com/docs/build#docker. That would reduce the variables here and let us make sure none of those bits are causing the problem.

milanaleksic commented 1 month ago

I understand that you'd like me to simplify the Dockerfile, but in the second half of my issue description I showed what I believe is the root cause: just run the default 2.8.4 image on an arm64 system and you will see the issue. There is no need to build anything.

icco commented 1 month ago

Yeah, this was clearly an issue in the building and releasing of the image. The built binaries in the arm images are for x86

francislavoie commented 1 month ago

😣 the thing is, we didn't change anything in our build pipeline between 2.8.1 and 2.8.4, so I don't understand what could have gone wrong.

Could you try the binary from the Caddy release on GitHub, https://github.com/caddyserver/caddy/releases/tag/v2.8.4 (grab the tar for arm and untar it), and see if it runs on your VM? The Docker build essentially just grabs this binary and validates that the checksum matches. Whether it works or not, the result would help narrow down where in the pipeline something could have gone wrong.
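
For anyone who wants to repeat the checksum step by hand, here is a rough sketch of that kind of validation. It is not the actual build script: the function names are made up, and the default algorithm (`sha512`) is an assumption, so use whichever algorithm the release's checksum file specifies.

```python
import hashlib
import hmac
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha512") -> str:
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(path: Path, expected_hex: str, algorithm: str = "sha512") -> bool:
    """Return True when the file's digest equals the published hex digest."""
    return hmac.compare_digest(file_digest(path, algorithm), expected_hex.strip().lower())


# Hypothetical usage; the expected value would come from the release's checksum file:
# verify_checksum(Path("caddy_2.8.4_linux_arm64.tar.gz"), "<expected hex digest>")
```

Note that a matching checksum only shows the bytes are the ones that were published. It says nothing about whether they were built for the architecture the image is labeled with.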

milanaleksic commented 1 month ago

If I download https://github.com/caddyserver/caddy/releases/download/v2.8.4/caddy_2.8.4_linux_arm64.tar.gz I can see it is a correct binary

milan@oberon ~/temp → ls -altr caddy
-rwxr-xr-x 1 milan sysadmin 39387288 Jun  2 14:12 caddy

milan@oberon ~/temp → file caddy
caddy: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=tZKoqa6cBHpl2lF0HNGp/Dh6bf4GK6AsN7QVP8_6o/E18C2czXiRuFCPfsn2hf/-n78kBK4tdFousYl003K, stripped

hairyhenderson commented 1 month ago

hrm... it works on my machine (MacBook Pro M3 Pro with Docker Desktop v4.30.0):

$ docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4
Unable to find image 'caddy:2.8.4' locally
2.8.4: Pulling from library/caddy
4f4fb700ef54: Already exists
94747bd81234: Download complete
6ed74f46772d: Download complete
d8adbb5a5ba3: Download complete
9edd6440eb39: Download complete
Digest: sha256:4718355ff1e2592290e49950f01fb1d4b75adb920a7695aedd94b6a4590a684b
Status: Downloaded newer image for caddy:2.8.4
/srv #

@milanaleksic can you confirm the image digest that you're using?
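
A small sketch of how the digests could be compared mechanically: it pulls the `Digest:` line out of `docker pull` / `docker run` output and checks it against the digest that `caddy:2.8.4` resolved to in the failing build log. The helper name `pulled_digest` is made up for illustration; both digests are the ones quoted in this thread.

```python
import re
from typing import Optional

_DIGEST_LINE = re.compile(r"^Digest:\s*(sha256:[0-9a-f]{64})\s*$", re.MULTILINE)


def pulled_digest(pull_output: str) -> Optional[str]:
    """Return the digest reported in `docker pull` / `docker run` output, if any."""
    match = _DIGEST_LINE.search(pull_output)
    return match.group(1) if match else None


# Pull output quoted in this thread.
fresh_pull = """\
2.8.4: Pulling from library/caddy
Digest: sha256:4718355ff1e2592290e49950f01fb1d4b75adb920a7695aedd94b6a4590a684b
Status: Downloaded newer image for caddy:2.8.4
"""

# Digest that caddy:2.8.4 resolved to in the failing build log.
failing_build = "sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c"

print(pulled_digest(fresh_pull) == failing_build)  # False: the two digests differ
```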

hairyhenderson commented 1 month ago

For completeness, I also tried on my Raspberry Pi 5:

$ docker run --rm -ti caddy:2.8.4
Unable to find image 'caddy:2.8.4' locally
2.8.4: Pulling from library/caddy
94747bd81234: Pull complete
9edd6440eb39: Pull complete
6ed74f46772d: Pull complete
d8adbb5a5ba3: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:4718355ff1e2592290e49950f01fb1d4b75adb920a7695aedd94b6a4590a684b
Status: Downloaded newer image for caddy:2.8.4
2024/06/05 12:52:28.587 INFO    using config from file  {"file": "/etc/caddy/Caddyfile"}
...

(it's fine there too)

There was a rebuild a few hours ago, maybe there was some intermittent issue...

hairyhenderson commented 1 month ago

> Yeah, this was clearly an issue in the building and releasing of the image. The built binaries in the arm images are for x86

@icco can you elaborate? paste some logs/etc?

icco commented 1 month ago

So some interesting things I've seen:

[ Wed Jun 05 12:56:33 ]
[ nat@Nat-LRL ~ ]$ docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4
/srv # apk --print-arch
aarch64
/srv #

[ Wed Jun 05 12:56:42 ]
[ nat@Nat-LRL ~ ]$ docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4-builder
/usr/bin # apk --print-arch
armv7

Same machine, would expect both of those to have the same arch.

$ docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4
/srv # apk add file
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/aarch64/APKINDEX.tar.gz
(1/2) Installing libmagic (5.45-r1)
(2/2) Installing file (5.45-r1)
Executing busybox-1.36.1-r28.trigger
OK: 18 MiB in 23 packages
/srv # file /usr/bin/caddy
/usr/bin/caddy: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=tZKoqa6cBHpl2lF0HNGp/Dh6bf4GK6AsN7QVP8_6o/E18C2czXiRuFCPfsn2hf/-n78kBK4tdFousYl003K, stripped

File output matches downloaded file
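
If installing `file` inside the container is inconvenient, the same check can be approximated by reading the ELF header directly. This is a minimal sketch: the function names are made up, and the table covers only a few `e_machine` values from the ELF specification.

```python
import struct

# Subset of e_machine values defined by the ELF specification.
ELF_MACHINES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        order = "<"
    elif header[5] == 2:
        order = ">"
    else:
        raise ValueError("unknown ELF byte order")
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def elf_machine_of_file(path: str) -> str:
    """Read just enough of a file to identify its ELF architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

For the binary above, `elf_machine_of_file("/usr/bin/caddy")` should agree with the `ARM aarch64` reported by `file`.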

The above was all done on an M2 Mac. I'll be able to post outputs from our ARM VMs when I get to work, but the biggest surprise to me so far is that the two images report different architectures.

Also, attached are the outputs of the manifests

$ docker manifest inspect caddy:2.8.4 > ~/Desktop/284.maifest.json
$ docker manifest inspect caddy:2.8.4-builder > ~/Desktop/284-builder.maifest.json

284.maifest.json 284-builder.maifest.json
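
To compare the two manifest files, something like the sketch below could list which platform each entry targets. It assumes the usual manifest-list layout (`manifests[].platform` with `os`, `architecture` and an optional `variant`). The sample JSON is hypothetical, with placeholder digests, and is not the content of the attached files.

```python
import json


def platform_digests(manifest_list_json: str) -> dict:
    """Map 'os/arch[/variant]' to the digest of each entry in a manifest list."""
    doc = json.loads(manifest_list_json)
    result = {}
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        parts = (platform.get("os"), platform.get("architecture"), platform.get("variant"))
        result["/".join(p for p in parts if p)] = entry.get("digest")
    return result


# Hypothetical sample with placeholder digests, for illustration only.
sample = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {"digest": "sha256:placeholder-amd64", "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:placeholder-armv7", "platform": {"architecture": "arm", "os": "linux", "variant": "v7"}},
    {"digest": "sha256:placeholder-arm64", "platform": {"architecture": "arm64", "os": "linux", "variant": "v8"}}
  ]
}
"""

for name, digest in platform_digests(sample).items():
    print(name, digest)
```

Running it over both files would show whether each tag offers a `linux/arm64` entry at all, which might explain why the two images report different architectures on the same machine.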

milanaleksic commented 1 month ago

All the scripts on my side now work correctly (after flushing the image cache and re-downloading the same image) and I can no longer reproduce the issue, so the new builds probably fixed it and I can close this issue.

milan@oberon ~/temp → docker rmi caddy:2.8.4
Untagged: caddy:2.8.4
Untagged: caddy@sha256:f5af55ebb433cb652b27f5b5fc5732853bcb00afeddedb9579be6a10c9a42b1c
Deleted: sha256:d03a1673034f8209e77bd2c19e591b4feb4ebae6b72397a9f294cc1c84ce8983
Deleted: sha256:f30de503be553ffaf8e8dcb78f38ae4583b66a43e0f45c669719f80ca319255f
Deleted: sha256:5ab0f0156241ecf4360f3717001854774bf72b5dc77e42600984e46f7d5164bd
Deleted: sha256:6a29654a428bd5f6401cd5e543c42f8cf1b4c340b36dffdcd1b4b392bbee0ca3
Deleted: sha256:3ba7198bff30e4a1edb027586b022ce3e56cad74da2ab217b9f8676b3574018d
milan@oberon ~/temp → docker run --rm -ti --entrypoint /bin/sh caddy:2.8.4
Unable to find image 'caddy:2.8.4' locally
2.8.4: Pulling from library/caddy
94747bd81234: Already exists
9edd6440eb39: Pull complete
6ed74f46772d: Pull complete
d8adbb5a5ba3: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:4718355ff1e2592290e49950f01fb1d4b75adb920a7695aedd94b6a4590a684b
Status: Downloaded newer image for caddy:2.8.4
/srv #