basecamp / kamal

Deploy web apps anywhere.
https://kamal-deploy.org
MIT License

Deploy stopped working since v1.7: `docker stderr: context "<name>" does not exist` #851

Closed edolix closed 1 week ago

edolix commented 2 weeks ago

Hi, thanks for this great package!

Today I tried to use the latest Kamal image and got the error below. It looks like Kamal is not able to inspect or create the Docker context.

Logs from `kamal deploy -d production --verbose` using `ghcr.io/basecamp/kamal:v1.7.0`. It happens with both v1.7.0 and v1.7.1.

Status: Downloaded newer image for ghcr.io/basecamp/kamal:v1.7.0
Log into image registry...

... removed for brevity ...

 DEBUG [f5a53d64]   Login Succeeded
  INFO [f5a53d64] Finished in 1.720 seconds with exit status 0 (successful).
Build and push app image...
  INFO [dbc88fd7] Running docker --version && docker buildx version on localhost
 DEBUG [dbc88fd7] Command: docker --version && docker buildx version
 DEBUG [dbc88fd7]   Docker version 20.10.24, build 297e1284d3bd092e9bc96076c3ddc4bb33f8c7ab
 DEBUG [dbc88fd7]   github.com/docker/buildx v0.15.0 d3a53189f7e9c917eeff851c895b9aad5a66b108
  INFO [dbc88fd7] Finished in 0.078 seconds with exit status 0 (successful).
  INFO Cloning repo into build directory `/tmp/kamal-clones/my-app-2f65914456263/workdir/`...
  INFO [08f28ec4] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263 clone /workdir on localhost
 DEBUG [08f28ec4] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263 clone /workdir
 DEBUG [08f28ec4]   Cloning into 'workdir'...
 DEBUG [08f28ec4]   done.
  INFO [08f28ec4] Finished in 0.200 seconds with exit status 0 (successful).
  INFO [ecc25a58] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ status --porcelain on localhost
 DEBUG [ecc25a58] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ status --porcelain
  INFO [ecc25a58] Finished in 0.011 seconds with exit status 0 (successful).
  INFO [d71a3f71] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ rev-parse HEAD on localhost
 DEBUG [d71a3f71] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ rev-parse HEAD
 DEBUG [d71a3f71]   eec8108d97d18d6aff73362813289615131c86d7
  INFO [d71a3f71] Finished in 0.002 seconds with exit status 0 (successful).
  INFO [5bb8e984] Running docker context inspect kamal-my-app-native-remote-amd64 --format '{{.Endpoints.docker.Host}}' on localhost
 DEBUG [5bb8e984] Command: docker context inspect kamal-my-app-native-remote-amd64 --format '{{.Endpoints.docker.Host}}'
 DEBUG [5bb8e984]   context "kamal-my-app-native-remote-amd64" does not exist
 DEBUG [5bb8e984]
  WARN Missing compatible builder, so creating a new one first
  Finished all in 2.8 seconds
  ERROR (SSHKit::Command::Failed): docker exit status: 256
docker stdout: Nothing written
docker stderr: context "kamal-my-app-native-remote-amd64" does not exist

/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/command.rb:97:in `exit_status='
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/local.rb:59:in `block in execute_command'
/usr/local/lib/ruby/3.2.0/open3.rb:228:in `popen_run'
/usr/local/lib/ruby/3.2.0/open3.rb:103:in `popen3'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/local.rb:44:in `execute_command'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/abstract.rb:148:in `block in create_command_and_execute'
<internal:kernel>:90:in `tap'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/abstract.rb:148:in `create_command_and_execute'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/abstract.rb:66:in `capture'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/sshkit_with_ext.rb:9:in `capture_with_info'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/build.rb:38:in `block in push'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/abstract.rb:31:in `instance_exec'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/backends/abstract.rb:31:in `run'
/usr/local/bundle/gems/sshkit-1.22.2/lib/sshkit/dsl.rb:10:in `run_locally'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/build.rb:36:in `push'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/build.rb:8:in `deliver'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/command.rb:28:in `run'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/bundle/gems/thor-1.3.0/lib/thor.rb:527:in `dispatch'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/invocation.rb:116:in `invoke'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/main.rb:35:in `block in deploy'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/base.rb:75:in `print_runtime'
/usr/local/bundle/gems/kamal-1.7.0/lib/kamal/cli/main.rb:24:in `deploy'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/command.rb:28:in `run'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/bundle/gems/thor-1.3.0/lib/thor.rb:527:in `dispatch'
/usr/local/bundle/gems/thor-1.3.0/lib/thor/base.rb:584:in `start'
/usr/local/bundle/gems/kamal-1.7.0/bin/kamal:9:in `<top (required)>'
/usr/local/bundle/bin/kamal:25:in `load'
/usr/local/bundle/bin/kamal:25:in `<main>'

The same command using v1.6.0 works.

Logs from `kamal deploy -d production --verbose` using `ghcr.io/basecamp/kamal:v1.6.0`.

Status: Downloaded newer image for ghcr.io/basecamp/kamal:v1.6.0
Log into image registry...

... removed for brevity ...

 DEBUG [f0cb5b8b]   Login Succeeded
  INFO [f0cb5b8b] Finished in 1.774 seconds with exit status 0 (successful).
Build and push app image...
  INFO [8796fd53] Running docker --version && docker buildx version on localhost
 DEBUG [8796fd53] Command: docker --version && docker buildx version
 DEBUG [8796fd53]   Docker version 20.10.24, build 297e1284d3bd092e9bc96076c3ddc4bb33f8c7ab
 DEBUG [8796fd53]   github.com/docker/buildx v0.14.1 59582a88fca7858dbe1886fd1556b2a0d79e43a3
  INFO [8796fd53] Finished in 0.083 seconds with exit status 0 (successful).
  INFO Cloning repo into build directory `/tmp/kamal-clones/my-app-2f65914456263/workdir/`...
  INFO [a6a7a86b] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263 clone /workdir on localhost
 DEBUG [a6a7a86b] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263 clone /workdir
 DEBUG [a6a7a86b]   Cloning into 'workdir'...
 DEBUG [a6a7a86b]   done.
  INFO [a6a7a86b] Finished in 0.236 seconds with exit status 0 (successful).
  INFO [c2ff35c9] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ status --porcelain on localhost
 DEBUG [c2ff35c9] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ status --porcelain
  INFO [c2ff35c9] Finished in 0.012 seconds with exit status 0 (successful).
  INFO [f3dbccb0] Running /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ rev-parse HEAD on localhost
 DEBUG [f3dbccb0] Command: /usr/bin/env git -C /tmp/kamal-clones/my-app-2f65914456263/workdir/ rev-parse HEAD
 DEBUG [f3dbccb0]   eec8108d97d18d6aff73362813289615131c86d7
  INFO [f3dbccb0] Finished in 0.002 seconds with exit status 0 (successful).
  INFO [16895a4f] Running docker buildx build --push --platform linux/amd64 --builder kamal-my-app-native-remote -t ghcr.io/edolix/my-app:eec8108d97d18d6aff73362813289615131c86d7 -t ghcr.io/edolix/my-app:latest-production --label service="my-app" --file Dockerfile . on localhost
 DEBUG [16895a4f] Command: docker buildx build --push --platform linux/amd64 --builder kamal-my-app-native-remote -t ghcr.io/edolix/my-app:eec8108d97d18d6aff73362813289615131c86d7 -t ghcr.io/edolix/my-app:latest-production --label service="my-app" --file Dockerfile .
 DEBUG [16895a4f]   ERROR: no builder "kamal-my-app-native-remote" found
  WARN Missing compatible builder, so creating a new one first
 DEBUG Using builder: native/remote
  INFO [77f58e4a] Running docker context create kamal-my-app-native-remote-amd64 --description 'kamal-my-app-native-remote amd64 native host' --docker 'host=' ; docker buildx create --name kamal-my-app-native-remote kamal-my-app-native-remote-amd64 --platform linux/amd64 on localhost
 DEBUG [77f58e4a] Command: docker context create kamal-my-app-native-remote-amd64 --description 'kamal-my-app-native-remote amd64 native host' --docker 'host=' ; docker buildx create --name kamal-my-app-native-remote kamal-my-app-native-remote-amd64 --platform linux/amd64
 DEBUG [77f58e4a]   kamal-my-app-native-remote-amd64
 DEBUG [77f58e4a]   Successfully created context "kamal-my-app-native-remote-amd64"
 DEBUG [77f58e4a]   kamal-my-app-native-remote
  INFO [77f58e4a] Finished in 0.066 seconds with exit status 0 (successful).
  INFO [4e267505] Running docker buildx build --push .....

... logs keep going till deploy succeeds

Context

I'm using the workaround described in https://github.com/basecamp/kamal/issues/809 so there's a dummy config/deploy.yml and a real config/deploy.production.yml with these settings:

deploy.production.yml

```yml
service: my-app
image: edolix/my-app

servers:
  web:
    hosts:
      - my-app.egallo.dev
    labels:
      traefik.http.services.my-app-web-production.loadbalancer.server.port: "8000"
      traefik.docker.network: private
      traefik.http.routers.smart_track.rule: Host(`my-app.egallo.dev`)
      traefik.http.routers.smart_track.entrypoints: websecure
      traefik.http.routers.smart_track.tls.certresolver: letsencrypt
      traefik.http.routers.smart_track_secure.entrypoints: websecure
      traefik.http.routers.smart_track_secure.rule: Host(`my-app.egallo.dev`)
      traefik.http.routers.smart_track_secure.tls: true
      traefik.http.routers.smart_track_secure.tls.certresolver: letsencrypt
    options:
      "add-host": host.docker.internal:host-gateway
      network: "private"

registry:
  server: ghcr.io
  username:
    - KAMAL_REGISTRY_USERNAME
  password:
    - KAMAL_REGISTRY_PASSWORD

# Inject ENV variables into containers (secrets come from .env).
# Remember to run `kamal env push` after making changes!
env:
  clear:
    HOSTNAME: my-app.egallo.dev
  secret:
    - removed_for_brevity

# Use a different ssh user than root
ssh:
  user: ubuntu

builder:
  remote:
    arch: amd64

healthcheck:
  path: /up
  port: 8000

accessories:
  db:
    image: postgres:16.0
    roles:
      - web
    env:
      secret:
        - POSTGRES_PASSWORD
    directories:
      - data:/var/lib/postgresql/data
    options:
      network: "private"

traefik:
  options:
    publish:
      - "443:443"
    volume:
      - "/letsencrypt/acme.json:/letsencrypt/acme.json"
    network: "private"
  args:
    accesslog: true
    accesslog.format: json
    log: true
    log.level: DEBUG
    entryPoints.web.address: ":80"
    entryPoints.websecure.address: ":443"
    entryPoints.web.http.redirections.entryPoint.to: websecure
    entryPoints.web.http.redirections.entryPoint.scheme: https
    entryPoints.web.http.redirections.entrypoint.permanent: true
    entrypoints.websecure.http.tls: true
    entrypoints.websecure.http.tls.domains[0].main: "my-app.egallo.dev"
    certificatesResolvers.letsencrypt.acme.email: "edo91.gallo@gmail.com"
    certificatesResolvers.letsencrypt.acme.storage: "/letsencrypt/acme.json"
    certificatesResolvers.letsencrypt.acme.httpchallenge: true
    certificatesResolvers.letsencrypt.acme.httpchallenge.entrypoint: web
```

Docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:14:46 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.29.0 (145265)
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:18:02 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I'm still digging to understand where the issue could be, but this ticket might be helpful for others.

Thanks again!

edolix commented 2 weeks ago

Could it be around these lines:

https://github.com/basecamp/kamal/blob/4697f894411af5f6e245c15c84b5073bc48edd04/lib/kamal/cli/build.rb#L45-L47

The error message from v1.7 is `context "kamal-my-app-native-remote-amd64" does not exist`, while in v1.6.0 it was `ERROR: no builder "kamal-my-app-native-remote" found`.

djmb commented 1 week ago

Hmm, interesting. When I run `docker context inspect kamal-my-app-native-remote-amd64 --format '{{.Endpoints.docker.Host}}'` for a context that doesn't exist, I get this error message:

context "kamal-my-app-native-remote-amd64": context not found: open <snip>/.docker/contexts/meta/ad3cdf99c2f765ec10c20f6e8d60aac5f39e063f514574d5863c922f20ec6216/meta.json: no such file or directory

So I'd be interested to know why you are getting a different error message. In any case, let's update the matcher to include `does not exist`.
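As a rough illustration of what such a matcher could look like, here is a minimal Ruby sketch. It is not Kamal's actual code: `missing_builder_error?` and `MISSING_BUILDER_PATTERNS` are hypothetical names, and the pattern list is assumed from the three error messages quoted in this thread.

```ruby
# Hypothetical helper for illustration only; not taken from Kamal's source.
# The patterns are assumed from the messages seen in this issue:
#   ERROR: no builder "..." found
#   context "...": context not found: ...
#   context "..." does not exist
MISSING_BUILDER_PATTERNS = [
  /no builder/,
  /context not found/,
  /does not exist/
].freeze

# Returns true when the message looks like "builder or context is missing".
def missing_builder_error?(message)
  MISSING_BUILDER_PATTERNS.any? { |pattern| pattern.match?(message) }
end

puts missing_builder_error?('context "kamal-my-app-native-remote-amd64" does not exist')
```

Matching on substrings rather than whole messages keeps the check tolerant of the differing wording seen in the two outputs above.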

edolix commented 1 week ago

It looks like the `does not exist` error message is coming from the cli.remove call.

It runs `docker context rm kamal-app-native-remote-amd64; docker buildx rm kamal-app-native-remote`, where `docker context rm` returns exactly `context "kamal-app-native-remote-amd64" does not exist`.

I don't understand why this WARN line before cli.remove didn't show up in the logs, though. Is the output overridden by the docker error message?

djmb commented 1 week ago

@edolix - the stacktrace is from line 38 of build.rb, so it looks like it is from the docker inspect command. I've released v1.7.2 with a fix for this - could you confirm if that's worked?
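To illustrate why an unmatched message would surface as a failure at the inspect step instead of triggering builder creation, here is a minimal Ruby sketch. Everything in it is an assumption for illustration: `CommandFailed` stands in for `SSHKit::Command::Failed`, `ensure_builder` is a hypothetical name, and both patterns are guesses based on the messages in this thread, not Kamal's real matcher.

```ruby
# Stand-in for SSHKit::Command::Failed so the sketch runs without any gems.
class CommandFailed < StandardError; end

# Runs the inspect step. If it fails with a message matching missing_pattern,
# the builder is treated as missing and the create step runs instead.
# Any other failure is re-raised unchanged.
def ensure_builder(missing_pattern:, inspect_step:, create_step:)
  inspect_step.call
  :reused
rescue CommandFailed => e
  raise unless e.message.match?(missing_pattern)
  create_step.call
  :created
end

# Simulated inspect failure using the message from the v1.7.0 logs.
failing_inspect = lambda do
  raise CommandFailed, 'context "kamal-my-app-native-remote-amd64" does not exist'
end

narrow = /no builder|context not found/                 # assumed pre-fix pattern
broad  = /no builder|context not found|does not exist/  # assumed post-fix pattern

puts ensure_builder(missing_pattern: broad, inspect_step: failing_inspect, create_step: -> {})

begin
  ensure_builder(missing_pattern: narrow, inspect_step: failing_inspect, create_step: -> {})
rescue CommandFailed => e
  puts "propagated: #{e.message}"
end
```

With the narrower pattern the failure propagates to the caller, which would be consistent with the stack trace pointing at the inspect call.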

edolix commented 1 week ago

@djmb using v1.7.2 it works!

.....
  INFO [2d3d5f41] Running docker context inspect kamal-my-app-native-remote-amd64 --format '{{.Endpoints.docker.Host}}' on localhost
 DEBUG [2d3d5f41] Command: docker context inspect kamal-my-app-native-remote-amd64 --format '{{.Endpoints.docker.Host}}'
 DEBUG [2d3d5f41]
 DEBUG [2d3d5f41]   context "kamal-my-app-native-remote-amd64" does not exist
  WARN Missing compatible builder, so creating a new one first
 DEBUG Using builder: native/remote
  INFO [823f5d0a] Running docker context create kamal-my-app-native-remote-amd64 --description 'kamal-my-app-native-remote amd64 native host' --docker 'host=' ; docker buildx create --name kamal-my-app-native-remote kamal-my-app-native-remote-amd64 --platform linux/amd64 on localhost
 DEBUG [823f5d0a] Command: docker context create kamal-my-app-native-remote-amd64 --description 'kamal-my-app-native-remote amd64 native host' --docker 'host=' ; docker buildx create --name kamal-my-app-native-remote kamal-my-app-native-remote-amd64 --platform linux/amd64
 DEBUG [823f5d0a]   kamal-my-app-native-remote-amd64
 DEBUG [823f5d0a]   Successfully created context "kamal-my-app-native-remote-amd64"
 DEBUG [823f5d0a]   kamal-my-app-native-remote
...

You're right about the stacktrace line, but somehow I can't reproduce the error message with the docker inspect command locally:

Running "inspect":

docker context inspect kamal-foo-bar-not-exist --format '{{.Endpoints.docker.Host}}'

context "kamal-foo-bar-not-exist": context not found: open /.docker/contexts/meta/ee6ba30b4b07d1cb1c53e0938340ca79f88ff702df1c18b945a932255ff18e33/meta.json: no such file or directory

Running "rm":

docker context rm kamal-foo-bar-not-exist

context "kamal-foo-bar-not-exist" does not exist

The problem is solved, but I'm curious about the error message, so I'll dig a little more. Thanks for the fix and help! 🙏

plattenschieber commented 1 week ago

I had the same issue, but didn't investigate further. I thought it might have something to do with the multiarch build (as this made it work again) and with my newly set up server running Ubuntu 22.04, instead of doing builds on my M2.