Open lmbarros opened 2 years ago
[lmbarros] This issue has attached support thread https://jel.ly.fish/41b56e32-5fae-4a2e-b5bb-05f9f5af1f0f
I have an example of this issue here: https://github.com/machinemetrics/docker-socket
Another repro courtesy of @lmbarros: https://github.com/balena-io-playground/engine-on-container-socket-lost-test
Did a couple more quick tests:
- `SIGKILL` leaves the socket unusable in the container, as we already knew.
- `SIGABRT` gives the same result as above. (This case might be of interest because that's what the watchdog sends on a timeout.)
- `SIGTERM` is fine, however: after the Engine restarts in the host, the socket becomes usable again in the container.
I suspect this would be resolved by https://github.com/balena-os/balena-supervisor/pull/1780
@klutchell Do you know if there's still a plan to get that fix in? Is there any way my team and I could help test this out? This issue has been a real thorn in our side.
Hey @deanMike, I have requested updates on the linked PR: https://github.com/balena-os/balena-supervisor/pull/1780
If we start a container with the label `io.balena.features.balena-socket: '1'` set, this container will have access to the Engine socket. However, if the Engine crashes on the Host OS, that container will no longer be able to connect to the Engine (even after the Engine restarts on the Host OS). Attempting to run Docker on the container will fail with an error.
This can be easily reproduced by `SIGKILL`ing `balenad` on the Host OS and then trying to run Docker or balenaEngine on a container where it was previously working.
This is arguably on the border between the Supervisor (that sets the mounts and shares up) and the Engine (that implements the mechanisms).
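
For anyone trying to confirm the symptom from inside an affected container, here is a minimal sketch of a connectivity check, not an official tool. It only tests whether something is accepting connections on a Unix socket. The helper name `engine_socket_usable`, the `ENGINE_SOCKET` environment variable and the default path are assumptions made for illustration; point it at wherever the Engine socket is actually mounted in your container.

```python
import os
import socket


def engine_socket_usable(path: str, timeout: float = 2.0) -> bool:
    """Return True if a Unix stream socket at `path` accepts a connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # Covers a missing file, a socket file with no listener behind it
            # (connection refused), and timeouts.
            return False
    return True


if __name__ == "__main__":
    # Hypothetical variable name and default path; adjust to your setup.
    path = os.environ.get("ENGINE_SOCKET", "/var/run/balena-engine.sock")
    state = "usable" if engine_socket_usable(path) else "NOT usable"
    print(f"{path}: {state}")
```

Running this before and after killing the Engine on the Host OS should show whether the container's view of the socket recovers once the Engine is back, which is the difference reported between `SIGTERM` and `SIGKILL` in this thread.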