I was debugging an issue where cadvisor did not start properly, and found that the cause was non-responsive containers. The service showed 4/1 replicas (i.e. 3 more than it should have). The containers are running on Docker nodes, both managers and workers, but they do not respond to the docker inspect command. I removed the service, but from the Docker engine's perspective the containers are still running. Because they are hung, docker prune also fails on these nodes. I assume I cannot stop these unresponsive containers with docker commands.
From what I understand, one way is to kill these processes by PID on the host, but I cannot get to the host because I am in the SSH container. I can kill the EC2 instance, but I would rather keep that as a last resort.
So my question: is there any other way to deal with unresponsive containers than killing the whole instance?
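To make the host-PID idea concrete, this is roughly what I have in mind. It is only a sketch and I have not been able to try it: `shim_pids` is a hypothetical helper name, and it assumes that each container on 17.12 has a `docker-containerd-shim` process whose command line contains the container ID. It also assumes a shell that can see the host's processes, for example a container started with `--privileged --pid=host`, which may itself not start if the engine is hung.

```shell
# Hypothetical helper: read `ps -eo pid,args` output on stdin and print the
# PIDs of shim processes whose command line mentions the given container ID.
# Matching on the executable column ($2) keeps awk from matching itself.
shim_pids() {
  awk -v id="$1" '$2 ~ /containerd-shim$/ && index($0, id) { print $1 }'
}

# Assumed usage from a shell that shares the host PID namespace:
#   ps -eo pid,args | shim_pids <container-id>
# and then, for each PID printed:
#   kill -9 <pid>
```

Killing the shim is a blunt tool, so I would only do this for containers that the engine can no longer stop by itself.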
Thanks
Server Version: 17.12.0-ce
~ $ docker-diagnose
OK hostname=ip-172-31-27-122-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-18-225-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-45-242-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-0-27-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-8-217-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-27-205-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-32-97-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
Done requesting diagnostics.
Your diagnostics session ID is 1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
Please provide this session ID to the maintainer debugging your issue.