basecamp / kamal

Deploy web apps anywhere.
https://kamal-deploy.org
MIT License

Kamal does not clean up the container for a "server" removed from configuration #1187

Open davidstosik opened 3 weeks ago

davidstosik commented 3 weeks ago

How to reproduce

  1. Start with a working Kamal deploy.yml file with two servers with different roles. For example web and job (documented here):
    servers:
      web:
        - 65.108.85.64
      job:
        hosts:
          - 65.108.85.64
        cmd: bin/jobs
  2. Deploy the application and confirm that a container is running for each role (e.g. with kamal app containers).
  3. Delete one of the servers from deploy.yml. (For example the job one.)
  4. Deploy the application again with Kamal.
  5. Check the running containers again, and observe that the server/role that was deleted from the deploy file still has a container running (from the first deploy in step 2).

What I observed

I can see a -web container ("Up 3 minutes", the deploy in step 4), and a -job container ("Up 5 minutes", the deploy in step 2).

What I was expecting

I expected the last deploy to at least stop the now-unnecessary job container left over from the earlier deploy, so that only the web container would be running.

Comments

Since I removed the role from the deploy.yml file, I am unable to stop or clean up that container: kamal app stop --roles=job fails with "No --roles match for job".

This seems similar to the problem with accessories: if an accessory is not declared in the deploy.yml file, Kamal is unable to delete it.

However, since Kamal can list all containers and see that both a myapp-web- container and a myapp-job- container are running, couldn't it decide that the myapp-job- container needs to be stopped?

The only workaround I found so far was to check out an earlier commit, where the container is still declared in deploy.yml, and remove it from there:

git checkout HEAD~1
kamal app remove --roles=job
git checkout -

SteffanPerry commented 3 weeks ago

I noticed this as well. One thing that needs to be considered is multiple apps deploying to a single server. If we simply stop and remove a container when it is no longer in the YML, we could accidentally remove a container from another application if that app somehow has the same application name (probably rare).

For now I'm just SSHing into the servers and manually stopping and removing the containers. This is kind of a corner case at the moment, as I don't think many people will be constantly adding and removing containers. But it would be nice if removing it from the YML also deleted it (Terraform style).

davidstosik commented 3 weeks ago

I don't think Kamal's designed to manage multiple apps with the same name on the same server. Still, your point stands: what if I have one app named my_app with a job role, and another app named my_app-job? The names would clash and there would be a risk of removing incorrect resources.
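To make the clash concrete, here is a minimal sketch. It assumes container names follow a `<service>-<role>-<version>` pattern (consistent with the myapp-web- and myapp-job- names above); the `match_by_prefix` helper and the `abc123` version strings are made up for illustration and are not part of Kamal.

```shell
# Hypothetical helper, not part of Kamal.
# stdin: one container name per line
# args:  $1 = service name, $2 = role name
# Prints every name that starts with "<service>-<role>-".
match_by_prefix() {
  prefix="$1-$2-"
  while read -r name; do
    case "$name" in
      "$prefix"*) echo "$name" ;;
    esac
  done
}

# Example input (made-up names):
#   my_app-job-abc123      -> app "my_app", role "job"
#   my_app-job-web-abc123  -> app "my_app-job", role "web"
#   my_app-web-abc123      -> app "my_app", role "web"
# Usage:
#   printf '%s\n' my_app-job-abc123 my_app-job-web-abc123 my_app-web-abc123 \
#     | match_by_prefix my_app job
```

With that input, matching on the name prefix alone selects both my_app-job-abc123 and my_app-job-web-abc123, even though the second one belongs to a different app.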

Feels like Kamal should leave a copy on the server(s) of the deploy.yml file used to deploy the app, so it can find what the application looked like when it was deployed.

djmb commented 5 days ago

Right now you need to call something like kamal app remove -h 65.108.85.64 -r job before changing the config.

Terraform-style state management would add a huge amount of complexity to Kamal, so I don't think we'll ever go there. But maybe there is a need for a cleanup command or something like that, which finds unwanted containers?

Images and containers are tagged with the service label, so it should be possible to find only the ones related to the current app, even when multiple apps are deployed.
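As a rough sketch of what such a lookup could look like from the shell: the snippet below assumes the service is named `myapp` and that containers carry a `role` label alongside the `service` label (an assumption; only the service label is mentioned above). The `orphaned_containers` helper is hypothetical and not a Kamal command.

```shell
# Hypothetical helper, not part of Kamal.
# stdin: one "<container-name> <role>" pair per line
# args:  the roles still declared in deploy.yml
# Prints the names of containers whose role is no longer declared.
orphaned_containers() {
  while read -r name role; do
    # Skip containers that carry no role label.
    [ -n "$role" ] || continue
    keep=0
    for configured in "$@"; do
      if [ "$role" = "$configured" ]; then
        keep=1
      fi
    done
    if [ "$keep" -eq 0 ]; then
      echo "$name"
    fi
  done
}

# Example wiring (assumes service "myapp", with only "web" left in deploy.yml):
#   docker ps --filter label=service=myapp \
#     --format '{{.Names}} {{.Label "role"}}' | orphaned_containers web
```

Filtering on the label rather than on the container name avoids the prefix clash between an app named my_app with a job role and an app named my_app-job. The output is only a list of candidates; it would still need to be reviewed before passing anything to docker stop.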