Closed raylu closed 1 year ago
@raylu it is very odd that you need this at all, as the main container should have stopped before execution gets to that point.
First, when the job is cancelled, the agent running the job sends the whole process stack a stopping signal and then a termination signal. The main container should have received that and stopped itself then. If that was not enough, the code just above the lines you added should also have taken care of it by killing all containers associated with the project (the main container is part of the project as well). There is definitely something fishy going on that we've never seen before. I don't think it should need special accommodation from the plugin, but if you create an issue in the repository we'll be more than happy to help you troubleshoot.
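For illustration only, the "receive a termination signal and stop itself" behaviour described above can be sketched with a shell trap. The function name `run_main` is hypothetical and merely stands in for whatever process runs in the main container; it is not code from the plugin or the agent.

```shell
# Hypothetical stand-in for the main container's foreground process.
run_main() {
  # When a TERM signal arrives, log it and exit cleanly.
  trap 'echo "received TERM, stopping"; exit 0' TERM
  # Simulate long-running work. Sleeping in the background and
  # waiting on it lets the shell react to the signal promptly.
  while :; do
    sleep 1 &
    wait $!
  done
}
```

A process written like this exits shortly after `kill -TERM <pid>`. One that ignores TERM would keep running until something kills it outright.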
with this new code, I'm getting
$ docker compose -f docker-compose.buildkite.yml -p buildkite0188080e66054253ab6ce4b3e3877dec down --remove-orphans --volumes
--
| [+] Running 0/1
| ⠿ Container buildkite0188080e66054253ab6ce4b3e3877dec_tests_build_287311 Error while Removing 0.7s
| Error response from daemon: removal of container 2cbf7301d12f1cac79ca2e69da1d494d7b2d1c2863222487b3b5470d1695d455 is already in progress
I'll report back if it actually worked in a bit. please hold off on merging
I read your message after merging :( I have replied to your findings in the related issue so that we can continue the discussion there, and I'll hold off on releasing the merged changes until we are sure they are working properly.
no worries! my fault for pushing here before testing
it appears to basically work. of our 100 test shards, 99 didn't send any reports. 1 of them did. it logged
Cleaning up after docker-compose | 10s
-- | --
| $ docker compose -f docker-compose.buildkite.yml -p buildkite0188080e6624499ebe42c0980be563da kill
| [+] Running 0/5
| ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedA]-1 Killing 9.6s
| ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedB]-1 Killing 9.6s
| ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedC]-1 Killing 9.6s
| ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedD]-1 Killing 9.6s
| ⠴ Container buildkite0188080e6624499ebe42c0980be563da-[redactedE]-1 Killing 9.6s
and no further logs (so it never got to down --remove-orphans). this is similar to what I saw with my old version of the code using stop $(docker ps --filter=...): most shards stopped the container but some leaked through. this is still a big improvement, so I'm looking forward to a release
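As a rough sketch of the cleanup order seen in the logs above (a `kill` for the whole compose project, followed by `down --remove-orphans --volumes`), something like the following could be used. The function name `compose_cleanup`, the `DOCKER` override and the `|| true` after `kill` are assumptions made for illustration; the plugin's actual cleanup code may differ.

```shell
# Hypothetical helper; not the plugin's real implementation.
compose_cleanup() {
  file="$1"     # compose file, e.g. docker-compose.buildkite.yml
  project="$2"  # compose project name passed to -p
  # Assumption: DOCKER may be overridden (e.g. DOCKER=echo) for a dry run.
  docker_bin="${DOCKER:-docker}"

  # Kill every container in the project; continue even if this fails.
  "$docker_bin" compose -f "$file" -p "$project" kill || true
  # Then remove the containers, any orphans, and the volumes.
  "$docker_bin" compose -f "$file" -p "$project" down --remove-orphans --volumes
}
```

Running `DOCKER=echo compose_cleanup docker-compose.buildkite.yml myproject` prints the two commands without touching Docker. Note that this sketch does nothing about a `kill` that hangs, which is the case reported above.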
When I cancel a job, the main/run container continues running (but the output doesn't show up in the web interface). See #389.