containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...

CNIEnv concurrency issues #3556

Open apostasie opened 2 weeks ago

apostasie commented 2 weeks ago

Description

Although #3491 and #3522 have fixed a lot of cases where CNI would fail because of concurrent access, there are still cases where this happens.

The failure below happens on container create, but it can very likely happen everywhere else we manipulate CNIEnv.

We can keep playing whack-a-mole and fix every occurrence piecemeal, but at this point rewriting CNIEnv in a safe way seems like the better approach.
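To illustrate what "safe" could mean here, a minimal sketch of serializing every read-modify-write on a config directory behind an exclusive `flock(2)`. This is not nerdctl's actual code: `withDirLock`, `bumpCounter` and `runDemo` are hypothetical names, the counter file merely stands in for a CNI config, and the sketch assumes Linux (or another Unix where `syscall.Flock` is available).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"sync"
	"syscall"
)

// withDirLock (hypothetical helper) takes an exclusive advisory lock on dir,
// runs fn, then releases the lock. Every caller opens dir separately, so the
// lock also excludes other goroutines and other processes that follow the
// same convention.
func withDirLock(dir string, fn func() error) error {
	f, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	// Deferred calls run in reverse order: unlock first, then close.
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

// bumpCounter is a stand-in for any read-modify-write on the directory.
// Without the lock, concurrent callers would lose updates.
func bumpCounter(dir string) error {
	return withDirLock(dir, func() error {
		p := filepath.Join(dir, "counter")
		n := 0
		if b, err := os.ReadFile(p); err == nil {
			n, _ = strconv.Atoi(string(b))
		}
		return os.WriteFile(p, []byte(strconv.Itoa(n+1)), 0o644)
	})
}

// runDemo starts `workers` concurrent writers against a fresh temporary
// directory and returns the final counter value.
func runDemo(workers int) (int, error) {
	dir, err := os.MkdirTemp("", "cnilock")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	var wg sync.WaitGroup
	errs := make(chan error, workers)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- bumpCounter(dir)
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return 0, err
		}
	}
	b, err := os.ReadFile(filepath.Join(dir, "counter"))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(string(b))
}

func main() {
	n, err := runDemo(16)
	fmt.Println(n, err)
}
```

The point of the design is that listing, reading, writing and removing configs would all go through the same lock, rather than each call site being guarded individually.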

The fundamental problems are:

Steps to reproduce the issue

FAIL: cmd/nerdctl/network TestNetworkCreate/with_MTU (0.17s)
    network_create_linux_test.go:108: ======================== Pre-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01
    network_create_linux_test.go:108: ======================== Test setup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworkcreate-with-mtu-1b256b01 --driver bridge --opt com.docker.network.driver.mtu=9216
    network_create_linux_test.go:108: ======================== Test Run ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
    command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0

        Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
        ExitCode: 1
        Error:    exit status 1
        Stdout:   
        Stderr:   time="2024-10-16T18:45:12Z" level=fatal msg="failed to verify networking settings: failed to check for default network: error reading /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: open /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: no such file or directory"

        Env:
        HOSTNAME=dc5da5d26f5d
        MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
        SYSTEMD_EXEC_PID=80
        container=docker
        HOME=/root
        LANG=C.UTF-8
        MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/system.slice/docker-entrypoint.service/memory.pressure
        INVOCATION_ID=3d1d502413d2454da8a8a340e78b0311
        TERM=xterm
        USER=root
        SHLVL=3
        CGO_ENABLED=0
        _=/usr/local/bin/gotestsum
        PATH=/usr/local/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        ***
        DOCKER_CONFIG=/tmp/TestNetworkCreatewith_MTU2150649351/001
        NERDCTL_TOML=/tmp/TestNetworkCreatewith_MTU2150649351/001/nerdctl.toml
    case.go:164: ======================== Post-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01

Describe the results you received and expected

https://github.com/containerd/nerdctl/actions/runs/11371804119/job/31634685012?pr=3555#step:6:1674

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

apostasie commented 2 weeks ago

Interesting variant:

https://github.com/containerd/nerdctl/actions/runs/11397513696/job/31713087686?pr=3535#step:6:496

level=fatal msg="failed to verify networking settings: failed to check for default network: error parsing configuration list: unexpected end of JSON input"

I would say this one ^ is a case of an interrupted write, or of competing writes.
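For reference, the usual way to keep readers from ever seeing a half-written file is to write to a temporary file in the same directory and then rename it over the target. A minimal sketch follows, not taken from nerdctl: `writeFileAtomic` and `roundTrip` are hypothetical names, and the file name and JSON payloads are made up for the demo. It assumes a POSIX filesystem, where `rename(2)` within a single directory replaces the target atomically.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical helper) writes data to a temporary file next
// to path, then renames it into place. A concurrent reader sees either the
// old content or the new content, never a truncated file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	// The temp name does not end in the target's extension, so a reader
	// that globs for "*.conflist" would not pick it up.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-"+filepath.Base(path)+"-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	// Clean up on failure; after a successful rename this finds nothing.
	defer os.Remove(tmpName)

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// Flush to disk before the rename makes the file visible.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip writes a file twice and returns what a reader sees afterwards.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "atomicwrite")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "example.conflist")
	if err := writeFileAtomic(target, []byte(`{"name":"v1"}`), 0o644); err != nil {
		return "", err
	}
	if err := writeFileAtomic(target, []byte(`{"name":"v2"}`), 0o644); err != nil {
		return "", err
	}
	b, err := os.ReadFile(target)
	return string(b), err
}

func main() {
	got, err := roundTrip()
	fmt.Println(got, err)
}
```

This only addresses the truncated-read symptom. Competing writers still need to be serialized (for example with a lock), otherwise the last rename silently wins.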