balena-os / balena-supervisor

Balena Supervisor: balena's agent on devices.
https://balena.io

(HTTP code 409) conflict - conflict: unable to delete (cannot be forced) - image is being used by running container #841

Open willswire opened 5 years ago

willswire commented 5 years ago

When implementing the delete-then-download application update strategy, devices are unable to remove the existing container due to the following error. Confirmed supervisor version running >= v2.5.1.

10.12.18 14:46:53 (-0500) Killing service 'main'
10.12.18 14:46:53 (-0500) Deleting image
10.12.18 14:46:53 (-0500) Failed to delete image due to '(HTTP code 409) conflict - conflict: unable to delete (cannot be forced) - image is being used by running container'

willswire commented 5 years ago

It is possible to work around this issue by first stopping the container and then pushing the update. Under the delete-then-download strategy, the supervisor should do this itself rather than requiring the user to do so manually.

willswire commented 5 years ago

Same error on supervisor version 9.0.1:

17.01.19 12:12:52 (-0500) Killing service 'main sha256:08251b0aa8d6c66bd1b240ea4cf172963e81cbd8bac252fb144ba7eaaa0b41a0'
17.01.19 12:12:52 (-0500) Deleting image 'registry2.balena-cloud.com/v2/0f74bea475cd7a34f257da6be491baa8@sha256:d8b60a410da5ab912d725ed19eba5624e71359814dc09f9f83b8b71ba77fc98f'
17.01.19 12:12:52 (-0500) Failed to delete image 'registry2.balena-cloud.com/v2/0f74bea475cd7a34f257da6be491baa8@sha256:d8b60a410da5ab912d725ed19eba5624e71359814dc09f9f83b8b71ba77fc98f' due to '(HTTP code 409) conflict - conflict: unable to delete 08251b0aa8d6 (cannot be forced) - image is being used by running container c3bc0919b429 '

CameronDiver commented 5 years ago

Hey @willswire thanks for the report. I'll try to reproduce this soon and see what we can do. I imagine it's something like the supervisor not giving the container time to exit, so this would be where I'll start looking.

willswire commented 5 years ago

@CameronDiver thanks! If it helps at all, our current situation is:

The device will stay in a constant loop, reporting the same error.

CameronDiver commented 5 years ago

Thanks for the extra info.

Does the container catch and act upon signals, for example the SIGTERM that docker will send to ask a container to stop running?

I mean even if it does, this is still a bug, because the supervisor shouldn't be trying to remove the image until the container has stopped.
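As a rough illustration of the ordering described here (stop the container, wait until it has actually exited, and only then remove the image), here is a minimal sketch. The `Engine` interface, the `killThenDelete` function, and the timeout and polling values are all hypothetical names and numbers chosen for the example; this is not the Supervisor's actual code or API.

```typescript
// Hypothetical minimal engine interface, for illustration only.
// It is NOT the Supervisor's or the engine's real API.
interface Engine {
  stopContainer(id: string): Promise<void>;
  isRunning(id: string): Promise<boolean>;
  removeImage(name: string): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stop the container, poll until it is no longer running, then remove the
// image. If the container is still running at the deadline, throw instead
// of attempting a removal that the engine would reject with a 409.
async function killThenDelete(
  engine: Engine,
  containerId: string,
  imageName: string,
  timeoutMs: number = 10000, // assumed value, not taken from the thread
  pollMs: number = 100, // assumed value, not taken from the thread
): Promise<void> {
  await engine.stopContainer(containerId);

  const deadline = Date.now() + timeoutMs;
  while (await engine.isRunning(containerId)) {
    if (Date.now() >= deadline) {
      throw new Error(
        `container ${containerId} did not stop within ${timeoutMs} ms`,
      );
    }
    await sleep(pollMs);
  }

  await engine.removeImage(imageName);
}
```

The point of the sketch is only the ordering: the image removal is never attempted while the engine still reports the container as running.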

willswire commented 5 years ago

Any SIGTERM commands sent via the console prior to initiating an update are successful. Once the nightmarish update loop starts, however, there's no response to any 'restart', 'stop' or 'start' commands.

CameronDiver commented 5 years ago

Hey @willswire sorry for the delay. I finally got some time to do some investigation here. I didn't manage to reproduce, but a colleague of mine did find a potential problem in the way that the state engine handles the delete-then-download strategy.

If possible, would you be able to try a new supervisor image which should fix this, or alternatively provide me with the source code for your project (and I'll try to dig out a device of the same type)?

The changes are implemented in this PR: https://github.com/balena-io/balena-supervisor/pull/893

willswire commented 5 years ago

@CameronDiver we can try the new supervisor image to test! How would we go about deploying the latest image to our machine?

CameronDiver commented 5 years ago

Thanks @willswire I'm pretty sure it should fix your issue (hence the closing) but finding out before release is certainly better.

The way that you could do this is to open a host OS terminal on your device and run:

update-resin-supervisor -t v9.7.1 -i balena/amd64-supervisor

willswire commented 5 years ago

@CameronDiver thanks! The issue has been resolved.

CameronDiver commented 5 years ago

Really happy to hear :)

jellyfish-bot commented 2 years ago

[cywang117] This issue has attached support thread https://jel.ly.fish/e74a1106-b0eb-4f02-8f46-b78732db1ef9

cywang117 commented 1 year ago

For context, this error message is passed through by the Supervisor from the Engine during updates, when an image in the current release needs to be deleted in favor of an image in the target release. The Supervisor should wait for containers to stop before attempting to remove images, but if a container fails to stop even with a balena kill, this error may appear.

Before commenting on or linking to this issue, please investigate whether any processes in a service fail to exit, even with a kill -9. If this error occurs in the absence of zombie processes in user containers, then it is potentially a bug in the Supervisor.
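To help with that investigation, the container the Engine is complaining about can be pulled out of the error text. The following is a minimal sketch that assumes the message format seen in the logs earlier in this thread; other Engine versions may word it differently, and `parseImageInUseError` is a hypothetical helper, not part of the Supervisor.

```typescript
// Result of parsing the 409 "image is being used" message. Either ID may be
// null, because the first log in this thread omits them.
interface ImageInUseConflict {
  imageId: string | null;
  containerId: string | null;
}

// Hypothetical helper: returns null when the message is not the
// "image is being used by running container" conflict at all.
function parseImageInUseError(message: string): ImageInUseConflict | null {
  const pattern =
    /unable to delete(?: ([0-9a-f]+))? \(cannot be forced\) - image is being used by running container(?: ([0-9a-f]+))?/;
  const match = pattern.exec(message);
  if (match === null) {
    return null;
  }
  return {
    imageId: match[1] !== undefined ? match[1] : null,
    containerId: match[2] !== undefined ? match[2] : null,
  };
}
```

With a container ID in hand, the processes inside that container can then be checked to see whether any of them survive a kill.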