cri-o / cri-o

Open Container Initiative-based implementation of Kubernetes Container Runtime Interface
https://cri-o.io
Apache License 2.0
5.26k stars 1.07k forks source link

Calling stateCmd twice in the function UpdateContainerStatus is unnecessary. #7221

Open Eresear opened 1 year ago

Eresear commented 1 year ago

Hi All, After the first execution of stateCmd, the container will only continue to execute if it is in the stopped state. The exit code of the container is then read. As the container cannot transition from the stopped state to other states (such as creating, created, running), the second execution of stateCmd should not be necessary. I hope it makes sense to you :)


    ctx, span := log.StartSpan(ctx)
    defer span.End()
    c.opLock.Lock()
    defer c.opLock.Unlock()

    if c.Spoofed() {
        return nil
    }

    if c.state.ExitCode != nil && !c.state.Finished.IsZero() {
        log.Debugf(ctx, "Skipping status update for: %+v", c.state)
        return nil
    }

    stateCmd := func() (*ContainerState, bool, error) {
        out, err := r.runtimeCmd("state", c.ID())
        if err != nil {
            // there are many code paths that could lead to have a bad state in the
            // underlying runtime.
            // On any error like a container went away or we rebooted and containers
            // went away we do not error out stopping kubernetes to recover.
            // We always populate the fields below so kube can restart/reschedule
            // containers failing.
            if exitErr, isExitError := err.(*exec.ExitError); isExitError {
                log.Errorf(ctx, "Failed to update container state for %s: stdout: %s, stderr: %s", c.ID(), out, string(exitErr.Stderr))
            } else {
                log.Errorf(ctx, "Failed to update container state for %s: %v", c.ID(), err)
            }
            c.state.Status = ContainerStateStopped
            if err := updateContainerStatusFromExitFile(c); err != nil {
                c.state.Finished = time.Now()
                c.state.ExitCode = utils.Int32Ptr(255)
            }
            return nil, true, nil
        }
        state := *c.state
        if err := json.NewDecoder(strings.NewReader(out)).Decode(&state); err != nil {
            return &state, false, fmt.Errorf("failed to decode container status for %s: %s", c.ID(), err)
        }
        return &state, false, nil
    }
    state, canReturn, err := stateCmd()
    if err != nil {
        return err
    }
    if canReturn {
        return nil
    }

    if state.Status != ContainerStateStopped {
        *c.state = *state
        return nil
    }
    // release the lock before waiting
    c.opLock.Unlock()
    exitFilePath := c.exitFilePath()
    err = kwait.ExponentialBackoff(
        kwait.Backoff{
            Duration: 500 * time.Millisecond,
            Factor:   1.2,
            Steps:    6,
        },
        func() (bool, error) {
            _, err := os.Stat(exitFilePath)
            if err != nil {
                // wait longer
                return false, nil
            }
            return true, nil
        })
    c.opLock.Lock()
    // run command again
    state, _, err2 := stateCmd()
    if err2 != nil {
        return err2
    }
    if state == nil {
        return fmt.Errorf("state command returned nil")
    }
    *c.state = *state
    if err != nil {
        log.Warnf(ctx, "Failed to find container exit file for %v: %v", c.ID(), err)
    } else {
        if err := updateContainerStatusFromExitFile(c); err != nil {
            return err
        }
        log.Debugf(ctx, "Found exit code for %s: %d", c.ID(), *c.state.ExitCode)
    }

    oomFilePath := filepath.Join(c.bundlePath, "oom")
    if _, err = os.Stat(oomFilePath); err == nil {
        c.state.OOMKilled = true

        // Collect total metric
        metrics.Instance().MetricContainersOOMTotalInc()

        // Collect metric by container name
        metrics.Instance().MetricContainersOOMCountTotalInc(c.Name())
    }
    // If this container had a node level PID namespace, then any children processes will be leaked to init.
    // Eventually, the processes will get cleaned up when the pod cgroup is cleaned by the kubelet,
    // but this situation is atypical and should be avoided.
    if c.nodeLevelPIDNamespace() {
        return r.signalContainer(c, syscall.SIGKILL, true)
    }
    return nil
}```

https://github.com/cri-o/cri-o/blob/d7a647bf0d0d34731baa96e5194262879d2a17ed/internal/oci/runtime_oci.go#L1008C30-L1008C30
github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 10 months ago

Closing this issue since it had no activity in the past 90 days.

kwilczynski commented 6 months ago

/kind bug

github-actions[bot] commented 5 months ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 3 months ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 2 months ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 1 month ago

A friendly reminder that this issue had no activity for 30 days.