Removing hanged up containers

westfood commented 5 years ago

This is kind of an issue.

I was debugging a case where cadvisor did not start properly, and found that the cause was non-responsive containers. The service was showing 4/1 replicas (i.e. 3 more than it should have). The containers are running on Docker nodes, both managers and workers, but they do not respond to the docker inspect command. I removed the service, but from the Docker engine's perspective the containers are still running. Because they are hung, docker prune also fails on these nodes. I guess I cannot stop these unresponsive containers via docker commands.

From what I understand, one way is to kill these processes by PID on the host, but I am unable to get to the host because I am in the SSH container. I can kill the EC2 instance, but I would rather keep that as a last resort.

So my question: is there another way to deal with unresponsive containers than killing the whole instance?

Thanks

Server Version: 17.12.0-ce

~ $ docker-diagnose
OK hostname=ip-172-31-27-122-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-18-225-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-45-242-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-0-27-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-8-217-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-27-205-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
OK hostname=ip-172-31-32-97-us-west-2-compute-internal session=1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
Done requesting diagnostics.
Your diagnostics session ID is 1560876007-bedylMBxBC5PJZoZL7a954OeKpA4tlNW
Please provide this session ID to the maintainer debugging your issue.

Meidan commented 5 years ago

@westfood you can get access to the host through a privileged container

westfood commented 5 years ago

Thanks! Just for reference, I was able to kill the process without the --privileged flag.

docker run -it --pid=host alpine sh
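Once inside a shell started with --pid=host, the host's processes are visible and can be signalled by PID. The following is only a rough sketch of that kill step, not the exact commands used here: `kill_pid` is a hypothetical helper name, and the grace period in seconds is an assumed parameter. It sends SIGTERM first and falls back to SIGKILL if the process is still present after the grace period.

```shell
# Hypothetical helper (not part of Docker): terminate a process by PID,
# escalating from SIGTERM to SIGKILL after a grace period.
# Usage: kill_pid <pid> [grace_seconds]
kill_pid() {
  pid="$1"
  grace="${2:-5}"   # assumed default of 5 seconds

  # Ask the process to exit; if the signal cannot be delivered
  # (for example, no such process), there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  # Poll once per second until the process is gone or the grace period ends.
  i=0
  while [ "$i" -lt "$grace" ]; do
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    i=$((i + 1))
  done

  # Still present after the grace period: force-kill it.
  kill -KILL "$pid" 2>/dev/null || true
}
```

To use it, find the PID of the hung container's process with ps inside that shell, then run, for example, `kill_pid 12345 10`, where 12345 is a placeholder PID.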

docker-archive / for-aws
