buildkite-plugins / docker-compose-buildkite-plugin

🐳⚡️ Run build scripts, and build + push images, w/ Docker Compose
MIT License

Docker containers not stopped when canceling through `cancel_on_build_failing` #423

Closed boboldehampsink closed 6 months ago

boboldehampsink commented 6 months ago

Default behavior:

[Screenshot 2024-02-09 at 14 11 08]

The pre-exit hooks run and the containers are killed.

When canceling through `cancel_on_build_failing`:

[Screenshot 2024-02-09 at 14 11 18]

Nothing is killed.

May be related to #389?

toote commented 6 months ago

From the looks of it, if the job does not handle the cancellation signal in time (within its grace period), the whole thing is killed, so the job no longer exists to run the pre-exit hooks. I don't have much knowledge of Go, but judging by the code, the agent kills the job's own process (which is actually the agent's bootstrap script, the one in charge of running the hooks). I don't see a way to avoid that without heavy modifications to the agent.

With that said, the main issue is that the job does not cancel cleanly. Since you are seeing the `Signal received, stopping container` log line but not the one after it, running `docker stop` on your container is either doing nothing or taking too long. The code added to the plugin to improve signal handling is clearly working, but that does not help much if the containers you are running do not stop in time :(

I don't think there is anything that can be done in the plugin to prevent or mitigate this situation as long as the containers being run do not handle stopping correctly :( But if you have any suggestions or ideas on how to make the plugin better, we are all ears!

boboldehampsink commented 6 months ago

Thanks for the explanation. I'm going to try an increased cancel grace period and see if that fixes it.