Open akalipetis opened 6 years ago
ping @mlaventure PTAL
This was a production cluster, so I couldn't switch to debug mode and did a reboot to make it go back to normal.
I could see the process running within Docker, but there was no related PID on the system, and it seems like containers did not have a process either.
I don't have a good way to reproduce this unfortunately. The system was under memory stress before this happened, so this might be related.
It could be that the process was OOM-killed by the kernel.
Docker should have still received the message if it was an OOM kill.
Without debug logs it's hard to make an educated guess. But maybe we should update the daemon to automatically remove a container from the running list if it gets a "not found" from containerd.
I believe this is pretty safe and would be fine as a first step, given that this is not something that happens very often.
Could this be related to https://github.com/moby/moby/pull/36173? We see that problem (which manifests itself as dockerd being unable to communicate with containerd) and it's closely associated with OOM-killing events.
Expected behavior
Exec-ing into a running container should always work
Actual behavior
Steps to reproduce the behavior
I'm running a single-node 17.10.0-ce Docker Swarm cluster. After a service restarted, a new container was spawned, but it was never in sync with containerd.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.)
Ubuntu 16.04 on Digital Ocean.