I'm trying to fix an issue that I believe is related to PR #2860. In production I have a swarm of four docker nodes, each running a large number of containers. Restarting docker on one of the nodes can take a while (often more than a minute). Sometimes this causes swarm to stop sending new requests to that worker. I'm not sure if the swarm process would ever re-schedule work to that node, typically when this happens I've restarted the swarm process to force the swarm manager to get back into a good state. When this error happens, docker info on the swarm manager indicates that swarm thinks the host has containers scheduled to it, when in fact the host has no containers scheduled. When this happens I see a log entry with the text "Engine came back to life..." which was added in PR #2860. My guess is this happens 1 in 10 times that I restart the docker daemon on a node.
From my non-expert look at the code, I think the issue is that the return statement in the current code is causing e.RefreshContainers() not to be run a few lines below. I've tested this change in production and the problem I was experiencing doesn't seem to recur: but since this issue was only happening sporadically even before, this is inconclusive. Does this change look right?
I just realized this is a no-op due to the fact that Go doesn't have implicit fallthrough. We are still experiencing a problem related to this code path, however.
I'm trying to fix an issue that I believe is related to PR #2860. In production I have a swarm of four docker nodes, each running a large number of containers. Restarting docker on one of the nodes can take a while (often more than a minute). Sometimes this causes swarm to stop sending new requests to that worker. I'm not sure if the swarm process would ever re-schedule work to that node, typically when this happens I've restarted the swarm process to force the swarm manager to get back into a good state. When this error happens,
docker info
on the swarm manager indicates that swarm thinks the host has containers scheduled to it, when in fact the host has no containers scheduled. When this happens I see a log entry with the text "Engine came back to life..." which was added in PR #2860. My guess is this happens 1 in 10 times that I restart the docker daemon on a node.From my non-expert look at the code, I think the issue is that the
return
statement in the current code is causinge.RefreshContainers()
not to be run a few lines below. I've tested this change in production and the problem I was experiencing doesn't seem to recur: but since this issue was only happening sporadically even before, this is inconclusive. Does this change look right?