Closed ysksuzuki closed 2 years ago
Kubelet runs preStop hooks, stops containers, and tears down the pod network by executing CNI DEL. Container processes are already shut down by the time containerd executes CNI DEL. See ref.
Pod termination process
What
Currently, Coil waits for 30 seconds in the CNI delete operation before destroying the pod network. It does so to keep connectivity for network components that need time to gracefully shut down active TCP connections. For example, Envoy waits for TCP connections to drain in its preStop hook before shutting down. The CNI delete is called as soon as the pod has a deletion timestamp, and destroying the pod network disrupts connections to Envoy and breaks the graceful shutdown assumption.
However, this implementation forces all pods, including those that don't need such a delay, to wait in the CNI delete operation. For instance, the nat pods derived from Coil Egress, which only receive connectionless UDP packets, have to wait for 30 seconds even though it is unnecessary.
EDIT: We later found out that Kubernetes calls the StopSandbox API of the container runtime after killing the container processes, so Coil doesn't need to sleep in its delete implementation. https://github.com/kubernetes/kubernetes/blob/02f9b2240814d2e952eaf7dca3a665a675004f21/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L979
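The ordering described in the EDIT can be sketched as a simple model. This is only an illustration under the assumptions stated in this issue, not kubelet code: `terminatePod` and the step labels are hypothetical, and "StopSandbox" follows the wording used above.

```go
package main

import "fmt"

// terminatePod models the termination order described in this issue.
// The returned labels are illustrative, not kubelet or containerd identifiers.
func terminatePod(hasPreStop bool) []string {
	var steps []string
	if hasPreStop {
		// preStop hooks (e.g. Envoy draining connections) run first.
		steps = append(steps, "run preStop hook")
	}
	// Container processes are stopped before the sandbox is touched.
	steps = append(steps, "stop containers")
	// Only then is the sandbox stopped, which triggers CNI DEL.
	steps = append(steps, "StopSandbox -> CNI DEL")
	return steps
}

func main() {
	for i, s := range terminatePod(true) {
		fmt.Printf("%d. %s\n", i+1, s)
	}
}
```

Because CNI DEL is always the last step in this model, any draining done in a preStop hook has finished before the pod network is destroyed, which is why an extra sleep in the delete operation adds nothing.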
related to https://github.com/cybozu-go/coil/pull/164
How
Options
Checklist