containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

`networks` related code is racy #3086

Closed apostasie closed 1 month ago

apostasie commented 5 months ago

Description

For some reason, network tests are very racy on my rig.

Just taking network_remove_linux_test, I get several different conditions quite fast:

From a cursory reading, it feels to me like number 1 is somewhere in netutils making assumptions about the availability of objects.

Number 2 is probably also in netutils: it looks like a race between checking that a network exists and a later operation that depends on reading its config. It could be that for certain operations we do not use the filelock (properly).

Number 3 is more worrisome.

Steps to reproduce the issue

go test

Describe the results you received and expected

Fails about 1 run in 10, for a variety of different reasons.

What version of nerdctl are you using?

1.7.6

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

apostasie commented 2 months ago

Following on the refactor and ongoing test tooling rewrite, it is easier to repro:

while true; do go test  ./cmd/nerdctl/network/ -count 1; done

After a few tries, this triggers:

time="2024-09-22T11:49:18-07:00" level=fatal msg="error reading /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkinspect_test_network_inspect.conflist: open /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkinspect_test_network_inspect.conflist: no such file or directory"

apostasie commented 2 months ago

--- FAIL: TestNetworkLsFilter (0.18s)
    helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-1-d946011b
    helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-2-31369b70
    helpers.go:58: /usr/local/bin/nerdctl --namespace=nerdctl-test network create --label=mylabel=label-1 testnetworklsfilter-1-d946011b
    helpers.go:58: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworklsfilter-2-31369b70
    --- FAIL: TestNetworkLsFilter/filter_label (0.02s)
        network_list_linux_test.go:93: /usr/local/bin/nerdctl --namespace=nerdctl-test network ls --quiet --filter label=mylabel=label-1
        network_list_linux_test.go:93: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0

            Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test network ls --quiet --filter label=mylabel=label-1
            ExitCode: 1
            Error:    exit status 1
            Stdout:
            Stderr:   time="2024-09-29T17:11:34-07:00" level=fatal msg="error reading /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkcreate-network-create-1-825b6cfc.conflist: open /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkcreate-network-create-1-825b6cfc.conflist: no such file or directory"

            Env:
            SHELL=/bin/bash
            COLORTERM=truecolor
            LOGNAME=dmp
            XDG_SESSION_TYPE=tty
            HOME=/home/dmp.linux
            LANG=C.UTF-8
            SSH_CONNECTION=192.168.5.2 51075 192.168.5.15 22
            LESSCLOSE=/usr/bin/lesspipe %s %s
            XDG_SESSION_CLASS=user
            TERM=xterm-256color
            LESSOPEN=| /usr/bin/lesspipe %s
            USER=dmp
            SHLVL=2
            XDG_SESSION_ID=2
            XDG_RUNTIME_DIR=/run/user/501
            SSH_CLIENT=192.168.5.2 51075 22
            XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
            DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/501/bus
            SSH_TTY=/dev/pts/1
            OLDPWD=/Users/dmp
            _=/home/dmp.linux/downloads/go/bin/go
            PATH=/home/dmp.linux/downloads/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/sbin:/sbin:/home/dmp.linux/downloads/go/bin:/usr/sbin:/sbin
            PWD=/Users/dmp/Projects/go/nerd/nerdctl/cmd/nerdctl/network
            NERDCTL_TOML=/tmp/TestNetworkLsFilterfilter_label939623861/001/nerdctl.toml
            DOCKER_CONFIG=/tmp/TestNetworkLsFilterfilter_label939623861/001
    helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-1-d946011b
    helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-2-31369b70

apostasie commented 2 months ago

@AkihiroSuda would you be open to adding this to v2.0? There is clearly something wrong / racy in our network code even with simple things.

apostasie commented 1 month ago

@AkihiroSuda I am closing this as resolved.

The impact of #3491 is actually really positive, and I can no longer trigger the "no such file or directory" error on the conflist.

Although our network code is still problematic, we should now open more targeted issues.