@aweeks have you observed this on any newer kernel version, or more than once on `5.4.110-54.189.amzn2.x86_64`?
Without steps to reproduce, there's probably not much we can do here. There have been many revisions to the kernel and several to the kubelet since this occurred; so please update to the latest AMI and create a new issue (referencing this one) if you observe this again.
What happened:
Kernel panic while running `5.4.110-54.189.amzn2.x86_64`:

Per the kernel oops above, it appears that a syscall from `kubelet` led to the panic, but beyond that I don't have much more insight.

After digging into the stack a little bit more, one possibility is that `dname` (returned by `blkg_dev_name()`) was a bad pointer: link. When later dereferenced as part of the `%s` formatting in `seq_printf`, it could have generated the memory protection fault.

Interestingly, it would not have been a null pointer, as that is explicitly checked here.
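
To illustrate why the null check would not have helped, here is a minimal user-space sketch of the check-then-format pattern described above. It is not the kernel's code: `print_dev_stat` and its parameters are hypothetical stand-ins, and `snprintf` stands in for `seq_printf`. The point is only that `!dname` rejects a NULL pointer and nothing else, so a non-NULL pointer to freed or otherwise invalid memory would pass the check and fault later, when `%s` reads through it.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space analogue of the pattern discussed above.
 * Returns 0 if dname is NULL, otherwise the number of characters that
 * snprintf reports for "<dname> <value>\n".
 */
static int print_dev_stat(char *buf, size_t len, const char *dname,
                          unsigned long long value)
{
    /* Guards against NULL only. A dangling or garbage pointer is
     * non-NULL and passes this check. */
    if (!dname)
        return 0;

    /* The %s conversion dereferences dname here. If the pointer is
     * invalid, this is where the fault would occur. */
    return snprintf(buf, len, "%s %llu\n", dname, value);
}
```

Calling `print_dev_stat(buf, sizeof buf, NULL, 7)` returns 0 without touching `buf`, while a valid name such as `"259:0"` is formatted normally. Nothing in the function can tell a valid pointer from a stale one.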
I looked through the kernel bug tracker and was not able to find any bugs that seemed related.
What you expected to happen:
No kernel panic.
How to reproduce it (as minimally and precisely as possible):
I unfortunately do not have a repro; this has only occurred once in our clusters.
Anything else we need to know?:
Environment:
- AWS Region: us-west-2
- Instance Type(s): r5.12xlarge
- EKS Platform version (use `aws eks describe-cluster --name <name> --query cluster.platformVersion`): eks.6
- Kubernetes version (use `aws eks describe-cluster --name <name> --query cluster.version`): 1.19
- AMI Version: ami-01a9605bd94e3d03d
- Kernel (e.g. `uname -a`): Linux ip-100-122-23-239.us-west-2.compute.internal 5.4.110-54.189.amzn2.x86_64 #1 SMP Mon Apr 26 21:25:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Release information (run `cat /etc/eks/release` on a node):