After running for a while, containerd continuously logs the following error:
level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master1,Uid:4bba31f5bbd08c1ecb43f3eeca03effb,Namespace:kube-system,Attempt:221,} failed, error" error="failed to create containerd task: failed to create init process: failed to insert taskinfo for init process(id=5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a, namespace=k8s.io): update: key too big for map: argument list too long: unknown"
It appears that the error is occurring during the update of an eBPF map. The following Go code seems to be involved in the issue:
// traceInitProcess checks init process is alive and starts to trace it's exit
// event by exitsnoop bpf tracepoint.
func (m *monitor) traceInitProcess(init *initProcess) (retErr error) {
	m.Lock()
	defer m.Unlock()

	fd, err := pidfd.Open(uint32(init.Pid()), 0)
	if err != nil {
		return fmt.Errorf("failed to open pidfd for %s: %w", init, err)
	}
	defer func() {
		if retErr != nil {
			unix.Close(int(fd))
		}
	}()

	// NOTE: The pid might be reused before pidfd.Open(like oom-killer or
	// manually kill), so that we need to check the runc-init's exec.fifo
	// file descriptor which is the "identity" of runc-init. :)
	//
	// Why we don't use runc-state commandline?
	//
	// The runc-state command only checks /proc/$pid/status's starttime,
	// which is not reliable. And then it only checks exec.fifo exist in
	// disk, but the runc-init has been killed. So we can't just use it.
	if err := checkRuncInitAlive(init); err != nil {
		return err
	}

	nsInfo, err := getPidnsInfo(uint32(init.Pid()))
	if err != nil {
		return fmt.Errorf("failed to get pidns info: %w", err)
	}

	if err := m.initStore.Trace(uint32(init.Pid()), &exitsnoop.TaskInfo{
		TraceID:   init.traceEventID,
		PidnsInfo: nsInfo,
	}); err != nil {
		return fmt.Errorf("failed to insert taskinfo for %s: %w", init, err)
	}
	defer func() {
		if retErr != nil {
			m.initStore.DeleteTracingTask(uint32(init.Pid()))
			m.initStore.DeleteExitedEvent(init.traceEventID)
		}
	}()

	// Before trace it, the init-process might be killed and the exitsnoop
	// tracepoint will not work, we need to check it alive again by pidfd.
	if err := fd.SendSignal(0, 0); err != nil {
		return err
	}

	if err := m.pidPoller.Add(fd, func() error {
		// TODO(fuweid): do we need to check the pid value in event?
		status, err := m.initStore.GetExitedEvent(init.traceEventID)
		if err != nil {
			init.SetExited(unexpectedExitCode)
			return fmt.Errorf("failed to get exited status: %w", err)
		}

		init.SetExited(int(status.ExitCode))
		return nil
	}); err != nil {
		return err
	}
	return nil
}
It seems that the key is not being validated properly. The key 5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a above is just one example; other keys fail the same way, for instance 1ea7f8369914d19bda8da29673e4f4e037c1b39e185f6f4da0dc167539754ca2 and 578193dfea54c854054abdea0a7bea11ab99e35a8d89c6469ed28084d5ab5080. All of them are ordinary 64-character hex container IDs, so every failing key has the same fixed length, and the errno wrapped in the log message is E2BIG ("argument list too long").
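For reference, here is a minimal, self-contained sketch of where that error text can come from. It assumes the initStore is backed by a BPF hash map managed through github.com/cilium/ebpf, whose Map.Update wraps a kernel E2BIG as "key too big for map"; the map spec, sizes, and the way the error is triggered below (filling a deliberately tiny map) are illustrative assumptions, not a claim about the shim's actual configuration.

package main

import (
	"fmt"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Allow BPF map creation on kernels that still enforce RLIMIT_MEMLOCK.
	if err := rlimit.RemoveMemlock(); err != nil {
		panic(err)
	}

	// Hypothetical stand-in for the exitsnoop store: a plain hash map whose
	// key is sized like a 64-character container ID.
	m, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    64,
		ValueSize:  8,
		MaxEntries: 2, // deliberately tiny so the map fills up immediately
	})
	if err != nil {
		panic(err)
	}
	defer m.Close()

	var key [64]byte
	for i := 0; i < 3; i++ {
		key[0] = byte(i) // three distinct keys, all exactly KeySize bytes long
		if err := m.Update(key[:], uint64(i), ebpf.UpdateNoExist); err != nil {
			// Once the map holds MaxEntries elements, the kernel rejects the
			// next insert with E2BIG, which cilium/ebpf reports as
			// "key too big for map: argument list too long".
			fmt.Printf("insert %d failed: %v\n", i, err)
		}
	}
}

Run with sufficient privileges, the third insert should fail with the same "update: key too big for map: argument list too long" text as in the log, because a preallocated BPF hash map returns E2BIG once it is at max_entries; whether the shim's map is actually hitting its size limit, or something else is producing the E2BIG, would still need to be checked on the affected node.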