@olyazavr, what if you lower the terminationGracePeriodSeconds property to 5 seconds?
When you say "jams up" kubelet, what does that mean? You cannot start new pods while the second container is running its sleep command?
You can run CRI-O with debug logging enabled and collect some logs from around the time when the pod is to be deleted. Also set the kubelet log verbosity high enough (perhaps 8?) to collect its debug logs, too.
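Something along these lines should do, assuming systemd-managed crio and kubelet and the default config drop-in paths (adjust unit names and paths for your setup; the drop-in file name below is just an example):

# CRI-O: raise the log level to debug via a config drop-in, then restart.
cat <<'EOF' | sudo tee /etc/crio/crio.conf.d/99-debug.conf
[crio.runtime]
log_level = "debug"
EOF
sudo systemctl restart crio

# kubelet: add --v=8 to its arguments (where this lives varies by install:
# /etc/sysconfig/kubelet, /etc/default/kubelet, or a systemd drop-in), then restart.
sudo systemctl restart kubelet

# Collect logs from around the time the pod is deleted:
sudo journalctl -u crio -u kubelet --since "10 minutes ago" > /tmp/crio-kubelet-debug.log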
@kwilczynski we're working on lowering terminationGracePeriodSeconds, but this workload is somewhat out of our hands.
When I say jams up, I mean that kubelet remains unready, stuck at this line, logging:
kubelet.service[2043933]: E1108 18:54:46.297244 2043933 kubelet.go:2383] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
kubelet.service[2043933]: I1108 18:54:46.320642 2043933 kuberuntime_manager.go:447] "Retrieved pods from runtime" all=true
over and over again. Pods cannot be interacted with (cannot be exec'd into, can't get logs, etc.).
Yeah, let me try those things. I ran crio with debug logging and nothing interesting came up (while stuck, it logged ListContainers/ListSandboxes requests/responses, but I can take a deeper look).
What's interesting is that crio 1.27.0 does not do this, but crio 1.29.4 does.
It could also be helpful to gather the goroutines (crio status goroutines) while it's happening, to see where cri-o is stuck. I'm guessing we're taking a lock somewhere we formerly weren't.
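For example (a rough sketch; this assumes the goroutines subcommand is available in your crio build and prints the stacks to stdout):

# while the pod is stuck terminating and kubelet is hung:
sudo crio status goroutines > /tmp/crio-goroutines-$(date +%F-%H%M%S).txt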
ok so I noticed that upon kubelet restart, when crio is about to hang:
Nov 08 21:04:13 ip-172-25-51-239 crio[3687019]: time="2024-11-08 21:04:13.413601917Z" level=debug msg="Unmounted container 157be0669fe06e94ced8d5a0f7d10885beb961ca7026b1f1bcb9a049753529d9" file="storage/runtime.go:495" id=e6e47387-ad69-45a8-be9e-a3cf2dfe7a15 name=/runtime.v1.RuntimeService/StopContainer
whereas, in 1.27 (no hanging):
Nov 08 20:58:57 ip-172-18-56-53 crio[3891941]: time="2024-11-08 20:58:57.924068275Z" level=debug msg="Request: &ContainerStatusRequest{...
Nov 08 20:58:57 ip-172-18-56-53 crio[3891941]: time="2024-11-08 20:58:57.924305233Z" level=debug msg="Request: &ContainerStatusRequest{..
Nov 08 20:58:57 ip-172-18-56-53 crio[3891941]: time="2024-11-08 20:58:57.924571462Z" level=debug msg="Response: &ContainerStatusResponse{...
Nov 08 20:58:58 ip-172-18-56-53 crio[3891941]: time="2024-11-08 20:58:58.617516447Z" level=warning msg="Stopping container 82edf0c5d6897a86c6cdfed89c29ce1b5f4ac150a43c30c831cd8966afa8aa66 with stop signal timed out: context canceled" file="oci/runtime_oci.go:913" id=8c56b682-87f6-4b1e-8479-fe5306888174 name=/runtime.v1.RuntimeService/StopContainer
goroutines here: https://gist.github.com/olyazavr/789c401f9bb8d1b29e11e8c72522a61f
Are you able to try out a newer 1.29? There have been some changes to the stop code, and I'm wondering if they've helped.
This is the goroutine dump from before the kubelet restart, taken while the workload that typically gets stuck is Terminating (not the same instance of the workload as the goroutine dump above, unfortunately): https://gist.github.com/olyazavr/2f682621112d61314f96c0df941a3273
@haircommander yes, let me try that out
No dice, I tried 1.29.10 and it still has the same problem. I should be able to try out 1.30/1.31 in the coming week
1.29.10 dump: https://gist.github.com/olyazavr/9919c73ff840add43159238378c78da4
it has that same line after restarting kubelet:
Nov 09 16:06:32 ip-172-18-48-187 crio[15878]: time="2024-11-09 16:06:32.192621024Z" level=debug msg="Unmounted container 58c02065df5f5fe8ccb803a1bcbafa9cc669f38b929fe6a61c46466999ba3c9f" file="storage/runtime.go:495" id=eee1246f-b7b7-485b-83d4-803d5d5c0b58 name=/runtime.v1.RuntimeService/StopContainer
@olyazavr, can you try a more recent version of CRI-O?
I am unable to reproduce this issue with CRI-O built from the master branch. Also, when using releases from 1.29 to 1.31, things work fine for me. There is no "hang" of kubelet; it simply continues to allow me to schedule new pods, etc.
Slightly different reproducer:
---
apiVersion: v1
kind: Pod
metadata:
  name: test-stop
  namespace: default
spec:
  containers:
    - name: quick-stop
      image: docker.io/library/ubuntu:22.04
      command:
        - /bin/sh
        - -c
        - |
          echo "Starting!"
          _term () {
            echo "Caught SIGTERM!"
            exit 0
          }
          trap _term TERM
          while true ; do date ; sleep 1 ; done
    - name: slow-stop
      image: docker.io/library/ubuntu:22.04
      command:
        - /bin/sh
        - -c
        - |
          echo "Starting!"
          _term () {
            echo "Caught SIGTERM!"
          }
          trap _term TERM
          while true ; do date ; sleep 1 ; done
  restartPolicy: Always
  terminationGracePeriodSeconds: 3600
I also have no issues using the template you provided as a reproducer.
The kubelet versions I tested were also from 1.29 to 1.31 from the official package repositories.
What runtime are you using? crun? runc? Also, what versions of the container runtime?
Would it be possible for you to collect pprof data from CRI-O and kubelet when the issue is occurring, aside from collecting goroutine dumps?
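For example, something along these lines (a sketch only; it assumes CRI-O was started with its pprof profiler enabled via --profile/--profile-port, and that the kubelet profiling handlers are enabled, which is the default):

# CRI-O (started with: crio --profile --profile-port 6060 ...):
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2" > /tmp/crio-goroutines.txt
curl -s "http://localhost:6060/debug/pprof/profile?seconds=30" > /tmp/crio-cpu.pprof

# kubelet, via the API server's node proxy (replace <node-name> with the affected node):
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/debug/pprof/goroutine?debug=2" > /tmp/kubelet-goroutines.txt
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/debug/pprof/profile?seconds=30" > /tmp/kubelet-cpu.pprof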
There is something you could do, too, if you have the time: bisect the code base to see which commit is potentially causing the issues, especially since you mentioned that CRI-O 1.27 does not have this behaviour. So, a bisect from 1.27 to 1.28, perhaps? Might be quite a job, given the number of commits, etc.
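A rough sketch of what that bisect could look like, assuming a local cri-o checkout and that `make binaries` builds bin/crio as in the upstream Makefile (you would install the binary over whatever path your systemd unit uses):

git clone https://github.com/cri-o/cri-o && cd cri-o
git bisect start
git bisect bad v1.29.0    # a version where the hang reproduces
git bisect good v1.27.0   # known good
# at each step: build, install, restart, and re-run the reproducer
make binaries
sudo install -m 0755 bin/crio "$(command -v crio)"
sudo systemctl restart crio
# then mark the result and repeat until the first bad commit is found:
git bisect good    # or: git bisect bad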
Having said all that, is there anything custom or special about your CRI-O configuration or your runtime environment?
So when you restart kubelet, does it kill the terminating pod or does it let it finish terminating? (For me, in 1.27, it would just kill the terminating container/pod without letting it finish out its grace period, which is fine, but just verifying that I'm seeing the same behavior here)
@olyazavr, no issues here. I can restart the kubelet process without any issues. Nothing "hangs" or otherwise misbehaves. Also, nothing kills pods or containers when this happens.
Tested with:
INFO[2024-11-13 13:21:15.598975778Z] Starting CRI-O, version: 1.27.8, git: 7597c4329bb447b23d2d150a903e6148fb99797e(clean)
We have a wrapper around crun that does a few things, like killing any exec commands on container termination (for this bug: https://github.com/cri-o/cri-o/issues/6699). It doesn't seem like it does very much, but I'll dig deeper here.
Try running things vanilla, without any wrappers, especially when troubleshooting issues such as this one, to make sure nothing changes the behaviour of the components involved (there are conmon and crun here, too). The issue you linked to might have already been resolved by updates that a more modern version of CRI-O enjoys, hopefully. :smile:
I just tried with 1.30 and it all works, no hanging! I also tried 1.29 without any of our custom logic, but it still hung, so I'm not sure what else is in play there. Something else interesting: in 1.30 I see the "right" behavior on kubelet restart, where the terminating pod is allowed to continue terminating, whereas in 1.27 the terminating pod would just get killed immediately after a kubelet restart.
One thing that may be important is that we run with cadvisor (kubelet was hanging at StartCadvisor).
I'm a little nervous that there is still something wacky going on that we didn't find (either in crio or in our setup) and that it will just resurface in a later version. I'll try more debugging on our end.
@olyazavr, there aren't that many commits, in terms of volume, between 1.30 and 1.29, hopefully, so feel free to bisect between the two versions.
I don't see the connection between kubelet and CRI-O (with conmon and the runtime also working behind the scenes) when it comes to a kubelet restart. The RPC is primarily one-sided, from kubelet to CRI-O, since a CRI does kubelet's bidding, so to speak. And there is also nothing in kubelet that kills containers; that's the CRI's job.
Unless I am missing something here, too?
Interesting development: I noticed that the cadvisor version had changed between 1.27/1.29/1.30, and sure enough, upgrading to the version that 1.30 has (I patched our 1.29 kubelet) made the bug disappear.
1.27 had cadvisor v0.47.2, 1.29 had cadvisor v0.48.1, and 1.30 had cadvisor v0.49.0.
I'll close this out, thanks for all your help!
@olyazavr, so you patched the 1.27 kubelet you run to use the newer cAdvisor dependency, taking the version from Kubernetes 1.29, which then worked, correct?
@kwilczynski I patched 1.29 kubelet to use the cadvisor version that 1.30 had (for me, 1.27 worked, 1.29 had the bug, and 1.30 worked again). My patched 1.29 kubelet worked without this bug
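For reference, a rough sketch of how such a cAdvisor bump can be done in a kubernetes/kubernetes checkout (this is an assumption based on the upstream hack/ scripts, not necessarily the exact procedure used for the patch above):

git clone https://github.com/kubernetes/kubernetes && cd kubernetes
git checkout release-1.29
hack/pin-dependency.sh github.com/google/cadvisor v0.49.0
hack/update-vendor.sh
make WHAT=cmd/kubelet   # rebuild kubelet with the bumped dependency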
@olyazavr, got it. Good to know this fixed the issues you were seeing for you!
The number of commits (changes) between releases v0.48.1 and v0.49.0 is not very large.
I wonder what the change was that fixed things for you, especially given that there aren't any significant changes of note between these two releases of cAdvisor. Very curious, indeed.
Worth noting: cAdvisor is a rather slow-moving project nowadays, with releases few and far between.
I wish you had some time to do a bisect, but if not, then I will take the code backport as a win here, even though what actually got fixed remains a bit of an X-Files-level enigma.
This might be related to an older issue:
I found that this PR: https://github.com/google/cadvisor/pull/3457 fixed my problem!
With cadvisor at https://github.com/google/cadvisor/commit/42bb3d13a0cf9ab80c880a16c4ebb4f36e51b0c9 it works, and with the previous commit https://github.com/google/cadvisor/commit/27f1e92b0bcc9d97c10fdc5e6b808e2847e907fb it does not.
@olyazavr, thank you! That's a good signal, indeed.
We could update our cAdvisor dependency even for older releases. I don't see a reason why not.
@haircommander, anything against doing that?
What happened?
When a container takes a long time to terminate and kubelet is restarted, kubelet hangs until the container terminates. This makes the whole node unready. I tracked this down to when cadvisor is started, and then to somewhere in newCrioContainerHandler, probably when it gets info about the container.
In our setup, we have a certain workload that consists of a pod with two containers, one of which eats SIGTERM and just waits for a long time to shut down. That container is the one that causes this whole mess.
Kubelet will just keep logging the "Skipping pod synchronization" error shown earlier.
As for logs, here's the last log that makes it in before the long wait for the container to shut down:
Nov 07 19:53:23 ip-172-16-111-198 kubelet.service[1162146]: I1107 19:53:23.354625 1162146 factory.go:275] Using factory "crio" for container "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d269700_ddd7_4c51_ba9e_40c014193013.slice/crio-0b74fdbd9af1713bb6abf1bfce22715675e13dd70b3e9c0fa2c312b63400186d"
and after the container shuts down, we see:
Nov 07 19:57:16 ip-172-16-111-198 kubelet.service[1162146]: I1107 19:57:16.559026 1162146 manager.go:981] Added container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d269700_ddd7_4c51_ba9e_40c014193013.slice/crio-0b74fdbd9af1713bb6abf1bfce22715675e13dd70b3e9c0fa2c312b63400186d" (aliases: [k8s_POD_session-agent-6fdcdc7968-rvp7q_browsers-lighthouse_0d269700-ddd7-4c51-ba9e-40c014193013_0 0b74fdbd9af1713bb6abf1bfce22715675e13dd70b3e9c0fa2c312b63400186d], namespace: "crio")
which supports the theory that it's cadvisor hanging while getting the container info from crio.
Interestingly, crictl inspect works just fine.
I got a goroutine dump from crio while it's hanging after the kubelet restart: here
What did you expect to happen?
CRI-O should not hang, and kubelet should be able to start up.
How can we reproduce it (as minimally and precisely as possible)?
Create a pod with two containers, one of which eats SIGTERM (a spec like the reproducer above).
Delete the pod, and then restart kubelet on the node
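A minimal sketch of those steps, assuming the reproducer pod spec above is saved as test-stop.yaml and you have shell access to the node running the pod:

kubectl apply -f test-stop.yaml
kubectl delete pod test-stop --wait=false   # slow-stop ignores SIGTERM, so the pod sits in Terminating
sudo systemctl restart kubelet              # on the node hosting the pod
journalctl -u kubelet -f                    # watch for the repeated "Skipping pod synchronization" errors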
Anything else we need to know?
No response
CRI-O and Kubernetes version
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)