buildkite-plugins / docker-compose-buildkite-plugin

🐳⚡️ Run build scripts, and build + push images, w/ Docker Compose
MIT License

Remove orphaned containers in cleanup #386

Closed raylu closed 1 year ago

raylu commented 1 year ago

When I cancel a job, the main/run container continues running (but the output doesn't show up in the web interface). See #389.

toote commented 1 year ago

@raylu it is very odd that you need this, as the main container should have stopped before cleanup gets to that point.

First, when the job is cancelled, the agent running the job sends the whole process stack a stopping signal and then a termination signal. The main container should have received that and stopped itself. If that was not enough, the code just above the lines you added should also have taken care of it by killing all containers associated with the project (the main container is part of the project as well). Something definitely fishy is going on that we've never seen before. I don't think it should need special accommodation from the plugin, but if you create an issue in the repository we'll be more than happy to help you troubleshoot.
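
For readers following along, the cleanup order described above (kill every container in the project, then bring the project down and remove orphans) could look roughly like the sketch below. This is not the plugin's actual hook: `cleanup_project` and the `COMPOSE_CMD` override are hypothetical names made up for illustration, and only the `kill` and `down --remove-orphans --volumes` invocations are taken from the logs in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cleanup order discussed in this thread.
# It is NOT the plugin's real code; names here are illustrative assumptions.

# COMPOSE_CMD is an assumed override point so the sequence can be exercised
# with a stub instead of a real Docker daemon.
COMPOSE_CMD="${COMPOSE_CMD:-docker compose}"

# cleanup_project COMPOSE_FILE PROJECT_NAME
cleanup_project() {
  # Step 1: kill every container that belongs to the project.
  # The main/run container is part of the project, so it is covered too.
  # A failure here must not prevent the next step from running.
  $COMPOSE_CMD -f "$1" -p "$2" kill || true

  # Step 2: remove the project's containers, networks and volumes,
  # including containers no longer declared in the compose file.
  $COMPOSE_CMD -f "$1" -p "$2" down --remove-orphans --volumes || true
}
```

With the default `COMPOSE_CMD`, calling `cleanup_project docker-compose.buildkite.yml <project>` would issue the same two commands that appear in the logs in this thread. The `|| true` guards are a design choice of the sketch: a failing step is ignored so that the following step still runs.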

raylu commented 1 year ago

with this new code, I'm getting

$ docker compose -f docker-compose.buildkite.yml -p buildkite0188080e66054253ab6ce4b3e3877dec down --remove-orphans --volumes
--
  | [+] Running 0/1
  | ⠿ Container buildkite0188080e66054253ab6ce4b3e3877dec_tests_build_287311  Error while Removing 0.7s
  | Error response from daemon: removal of container 2cbf7301d12f1cac79ca2e69da1d494d7b2d1c2863222487b3b5470d1695d455 is already in progress

I'll report back if it actually worked in a bit. please hold off on merging

toote commented 1 year ago

> I'll report back if it actually worked in a bit. please hold off on merging

I read your message after merging :( I have replied to your findings in the related issue so that we can continue the discussion there, and I'll hold off on releasing the merged changes until we are sure they are working properly.

raylu commented 1 year ago

no worries! my fault for pushing here before testing

it appears to basically work. of our 100 test shards, 99 didn't send any reports. 1 of them did. it logged

Cleaning up after docker-compose | 10s
-- | --
  | $ docker compose -f docker-compose.buildkite.yml -p buildkite0188080e6624499ebe42c0980be563da kill
  | [+] Running 0/5
  | ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedA]-1  Killing 9.6s
  | ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedB]-1  Killing 9.6s
  | ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedC]-1  Killing 9.6s
  | ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedD]-1  Killing 9.6s
  | ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedE]-1  Killing 9.6s

and no further logs (so it never got to down --remove-orphans). this is similar to what I saw with my old version of the code using stop $(docker ps --filter=...): most shards stopped the container but some leaked through. this is still a big improvement, so I'm looking forward to a release