containerd / ttrpc

gRPC for low-memory environments
Apache License 2.0

client: add synchronize between userCloseFunc and rpc call #88

Closed wlmxjm1 closed 3 years ago

wlmxjm1 commented 3 years ago

Restarting a container sometimes fails (especially when userCloseFunc takes a long time) because the old container's bundle path still exists.

The root cause is analyzed as below:

When the containerd runtime plugin exits abnormally, the ttrpc connection is
closed and userCloseFunc is called to clean up the resources created by the
containerd shim. The in-flight RPC call also returns an error. But these two
steps are asynchronous.

After the RPC call returns an error, an upper-layer application such as k8s may
restart the container. But the start may fail because cleanup has not finished
and some resources have not been released. These leaked resources then cause
the in-place update of the pod to fail again.

One way to fix this is to synchronize userCloseFunc with the RPC call in ttrpc.

https://github.com/containerd/ttrpc/pull/87