fuweid / embedshim

Provides a task runtime implementation that uses pidfd and the eBPF sched_process_exit tracepoint to manage daemonless containers with low overhead.
Apache License 2.0

update: key too big for map: argument list too long: unknown #33

Open 113xiaoji opened 9 months ago

113xiaoji commented 9 months ago

After running for a while, containerd continuously logs the following error:

level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master1,Uid:4bba31f5bbd08c1ecb43f3eeca03effb,Namespace:kube-system,Attempt:221,} failed, error" error="failed to create containerd task: failed to create init process: failed to insert taskinfo for init process(id=5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a, namespace=k8s.io): update: key too big for map: argument list too long: unknown"

It appears that the error is occurring during the update of an eBPF map. The following Go code seems to be involved in the issue:

// traceInitProcess checks that the init process is alive and starts to trace
// its exit event by the exitsnoop bpf tracepoint.
func (m *monitor) traceInitProcess(init *initProcess) (retErr error) {
    m.Lock()
    defer m.Unlock()

    fd, err := pidfd.Open(uint32(init.Pid()), 0)
    if err != nil {
        return fmt.Errorf("failed to open pidfd for %s: %w", init, err)
    }
    defer func() {
        if retErr != nil {
            unix.Close(int(fd))
        }
    }()

    // NOTE: The pid might be reused before pidfd.Open (e.g. after an OOM
    // kill or a manual kill), so we need to check runc-init's exec.fifo
    // file descriptor, which is the "identity" of runc-init. :)
    //
    // Why we don't use runc-state commandline?
    //
    // The runc-state command only checks /proc/$pid/status's starttime,
    // which is not reliable. And then it only checks exec.fifo exist in
    // disk, but the runc-init has been killed. So we can't just use it.
    if err := checkRuncInitAlive(init); err != nil {
        return err
    }

    nsInfo, err := getPidnsInfo(uint32(init.Pid()))
    if err != nil {
        return fmt.Errorf("failed to get pidns info: %w", err)
    }

    if err := m.initStore.Trace(uint32(init.Pid()), &exitsnoop.TaskInfo{
        TraceID:   init.traceEventID,
        PidnsInfo: nsInfo,
    }); err != nil {
        return fmt.Errorf("failed to insert taskinfo for %s: %w", init, err)
    }
    defer func() {
        if retErr != nil {
            m.initStore.DeleteTracingTask(uint32(init.Pid()))
            m.initStore.DeleteExitedEvent(init.traceEventID)
        }
    }()

    // The init process might be killed before we start tracing it, in which
    // case the exitsnoop tracepoint will not fire. Check again via pidfd.
    if err := fd.SendSignal(0, 0); err != nil {
        return err
    }

    if err := m.pidPoller.Add(fd, func() error {
        // TODO(fuweid): do we need to check the pid value in event?
        status, err := m.initStore.GetExitedEvent(init.traceEventID)
        if err != nil {
            init.SetExited(unexpectedExitCode)
            return fmt.Errorf("failed to get exited status: %w", err)
        }

        init.SetExited(int(status.ExitCode))
        return nil
    }); err != nil {
        return err
    }
    return nil
}

It seems that the key is not being validated properly. The ID 5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a is just one example; other IDs fail in the same way, such as 1ea7f8369914d19bda8da29673e4f4e037c1b39e185f6f4da0dc167539754ca2 and 578193dfea54c854054abdea0a7bea11ab99e35a8d89c6469ed28084d5ab5080.
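One detail in the quoted code may help narrow this down: the first argument passed to m.initStore.Trace is uint32(init.Pid()), while the 64-character ID only appears through the %s formatting of init in the error message. The sketch below assumes, without confirmation from the excerpt, that Trace uses that first argument as the map key. Under that assumption the key has a fixed encoded size no matter which container fails. pidKeySize is a hypothetical helper and the pid value is made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pidKeySize returns the encoded size in bytes of a uint32 key.
// Assumption: the map key is the pid passed as the first argument to Trace.
func pidKeySize(pid uint32) int {
	return binary.Size(pid)
}

func main() {
	// Container ID taken from the log line in this report.
	containerID := "5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a"
	// Hypothetical pid, used only to illustrate the key size.
	pid := uint32(12345)

	fmt.Println("container ID length:", len(containerID))
	fmt.Println("pid key size in bytes:", pidKeySize(pid))
}
```

If the assumption holds, the specific IDs listed above would be incidental to the failure, and the state of the map when the update runs would be the more useful thing to inspect.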