Open willswire opened 5 years ago
It is possible to bypass this issue by first stopping the container, then pushing the update. The supervisor, under the delete-then-download strategy, should do this itself rather than requiring the user to do so manually.
Same error on supervisor ver 9.0.1
17.01.19 12:12:52 (-0500) Killing service 'main sha256:08251b0aa8d6c66bd1b240ea4cf172963e81cbd8bac252fb144ba7eaaa0b41a0'
17.01.19 12:12:52 (-0500) Deleting image 'registry2.balena-cloud.com/v2/0f74bea475cd7a34f257da6be491baa8@sha256:d8b60a410da5ab912d725ed19eba5624e71359814dc09f9f83b8b71ba77fc98f'
17.01.19 12:12:52 (-0500) Failed to delete image 'registry2.balena-cloud.com/v2/0f74bea475cd7a34f257da6be491baa8@sha256:d8b60a410da5ab912d725ed19eba5624e71359814dc09f9f83b8b71ba77fc98f' due to '(HTTP code 409) conflict - conflict: unable to delete 08251b0aa8d6 (cannot be forced) - image is being used by running container c3bc0919b429 '
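For anyone triaging many devices with this symptom, the conflict message in the log above can be picked apart mechanically. The following is a minimal sketch, assuming the message keeps exactly the wording shown in this thread; the names `CONFLICT_RE` and `parse_conflict` are made up for illustration and are not part of the Supervisor.

```python
import re

# Matches the Engine's conflict message as it appears in the log above.
# Assumption: the wording is stable; other Engine versions may phrase it differently.
CONFLICT_RE = re.compile(
    r"\(HTTP code (?P<status>\d{3})\) conflict - conflict: unable to delete "
    r"(?P<image>[0-9a-f]+) \(cannot be forced\) - image is being used by "
    r"running container (?P<container>[0-9a-f]+)"
)


def parse_conflict(line: str):
    """Return the HTTP status, image ID and container ID, or None if no match."""
    match = CONFLICT_RE.search(line)
    if match is None:
        return None
    return {
        "status": int(match.group("status")),
        "image": match.group("image"),
        "container": match.group("container"),
    }


# Sample taken from the log above, with the long image reference elided.
sample = (
    "Failed to delete image '...' due to '(HTTP code 409) conflict - conflict: "
    "unable to delete 08251b0aa8d6 (cannot be forced) - image is being used by "
    "running container c3bc0919b429 '"
)
print(parse_conflict(sample))
```

The container ID it returns is the one to inspect on the device when checking whether the service actually stopped.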
Hey @willswire thanks for the report. I'll try to reproduce this soon and see what we can do. I imagine it's something like the supervisor not giving the container time to exit, so this would be where I'll start looking.
@CameronDiver thanks! If it helps at all, our current situation is:
The device will stay in a constant loop, reporting the same error.
Thanks for the extra info. Does the container catch and act upon signals, for example the SIGTERM that Docker will send to ask a container to stop running?
I mean even if it does, this is still a bug, because the supervisor shouldn't be trying to remove the image until the container has stopped.
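The ordering being asked for here can be sketched as follows. This is only an illustration under stated assumptions: the `engine` object and its four methods are hypothetical stand-ins, not the Supervisor's or the Engine's real API, and `FakeEngine` exists purely so the sketch can run without a device.

```python
import time


def delete_then_download(engine, container_id, old_image, new_image,
                         timeout_s=10.0, poll_s=0.1):
    """Stop the container, wait until it has really exited, then swap images."""
    engine.stop_container(container_id)
    deadline = time.monotonic() + timeout_s
    # Do not touch the image while the container is still running.
    while engine.is_running(container_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"container {container_id} did not stop")
        time.sleep(poll_s)
    engine.remove_image(old_image)
    engine.pull_image(new_image)


class FakeEngine:
    """In-memory stand-in (hypothetical) whose container stops after N polls."""

    def __init__(self, polls_until_stopped):
        self.polls_left = polls_until_stopped
        self.calls = []

    def stop_container(self, container_id):
        self.calls.append("stop")

    def is_running(self, container_id):
        if self.polls_left > 0:
            self.polls_left -= 1
            return True
        return False

    def remove_image(self, image):
        # Mimics the 409 conflict seen in this issue.
        if self.polls_left > 0:
            raise RuntimeError("409 conflict: image is being used by running container")
        self.calls.append("remove")

    def pull_image(self, image):
        self.calls.append("pull")
```

With a container that takes a few polls to exit, the calls come out as stop, remove, pull. If the container never exits, the function raises `TimeoutError` instead of attempting the removal and looping on the conflict.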
Any SIGTERM commands sent via the console, prior to initiating an update, are successful. Once the nightmarish update loop starts, however, there's no response to any 'restart', 'stop' or 'start' commands.
Hey @willswire sorry for the delay. I finally got some time to do some investigation here. I didn't manage to reproduce, but a colleague of mine did find a potential problem in the way that the state engine handles the delete-then-download strategy.
If possible, would you be able to try a new supervisor image which should fix this, or alternatively provide me with the source code for your project (and I'll try to dig out a device of the same type)?
The changes are implemented in this PR: https://github.com/balena-io/balena-supervisor/pull/893
@CameronDiver we can try the new supervisor image to test! How would we go about deploying the latest image to our machine?
Thanks @willswire I'm pretty sure it should fix your issue (hence the closing) but finding out before release is certainly better.
The way that you could do this is to open a host OS terminal on your device and run update-resin-supervisor -t v9.7.1 -i balena/amd64-supervisor.
@CameronDiver thanks! The issue has been resolved.
Really happy to hear :)
[cywang117] This issue has attached support thread https://jel.ly.fish/e74a1106-b0eb-4f02-8f46-b78732db1ef9
For context, this error message is passed by the Supervisor from the Engine during updates, when an image in the current release needs to be deleted in favor of an image in the target release. The Supervisor should wait for containers to stop before attempting to remove images, but if a container fails to stop even with a balena kill, then this error may appear. Before commenting on or linking to this issue, please investigate if there are any processes in a service that fail to exit, even with a kill -9. If this error occurs in the absence of zombie user container processes, then that is a potential bug of the Supervisor.
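One way to check how a service's main process reacts to stop requests is to reproduce the escalation by hand. The sketch below assumes a POSIX host and uses only Python's standard library; `stop_process` and `spawn_stubborn_child` are illustrative names, and the child that ignores SIGTERM is a stand-in for a service, not anything taken from this issue.

```python
import signal
import subprocess
import sys

# A child that ignores SIGTERM, standing in for a service that does not
# act on the stop signal. It prints "ready" once the handler is installed.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_stubborn_child():
    """Start the SIGTERM-ignoring child and wait until it is ready."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child has printed "ready"
    return proc


def stop_process(proc, grace_s=2.0):
    """Send SIGTERM, wait up to grace_s seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "killed"


if __name__ == "__main__":
    child = spawn_stubborn_child()
    print(stop_process(child, grace_s=0.5))
    child.stdout.close()
```

A process that only exits at the SIGKILL step is slow to stop but still stoppable. The case described above is the one where even that final step does not make the process go away.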
When implementing the delete-then-download application update strategy, devices are unable to remove the existing container due to the following error. Confirmed supervisor version running >= v2.5.1.
10.12.18 14:46:53 (-0500) Killing service 'main'
10.12.18 14:46:53 (-0500) Deleting image
10.12.18 14:46:53 (-0500) Failed to delete image due to '(HTTP code 409) conflict - conflict: unable to delete (cannot be forced) - image is being used by running container'