Closed: apostasie closed this issue 1 month ago
Following the refactor and the ongoing test tooling rewrite, this is now easier to reproduce:
while true; do go test ./cmd/nerdctl/network/ -count 1; done
After a few tries, this triggers:
time="2024-09-22T11:49:18-07:00" level=fatal msg="error reading /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkinspect_test_network_inspect.conflist: open /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkinspect_test_network_inspect.conflist: no such file or directory"
--- FAIL: TestNetworkLsFilter (0.18s)
helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-1-d946011b
helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-2-31369b70
helpers.go:58: /usr/local/bin/nerdctl --namespace=nerdctl-test network create --label=mylabel=label-1 testnetworklsfilter-1-d946011b
helpers.go:58: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworklsfilter-2-31369b70
--- FAIL: TestNetworkLsFilter/filter_label (0.02s)
network_list_linux_test.go:93: /usr/local/bin/nerdctl --namespace=nerdctl-test network ls --quiet --filter label=mylabel=label-1
network_list_linux_test.go:93: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0
Command: /usr/local/bin/nerdctl --namespace=nerdctl-test network ls --quiet --filter label=mylabel=label-1
ExitCode: 1
Error: exit status 1
Stdout:
Stderr: time="2024-09-29T17:11:34-07:00" level=fatal msg="error reading /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkcreate-network-create-1-825b6cfc.conflist: open /home/dmp.linux/.config/cni/net.d/nerdctl-test/nerdctl-testnetworkcreate-network-create-1-825b6cfc.conflist: no such file or directory"
Env:
SHELL=/bin/bash
COLORTERM=truecolor
LOGNAME=dmp
XDG_SESSION_TYPE=tty
HOME=/home/dmp.linux
LANG=C.UTF-8
SSH_CONNECTION=192.168.5.2 51075 192.168.5.15 22
LESSCLOSE=/usr/bin/lesspipe %s %s
XDG_SESSION_CLASS=user
TERM=xterm-256color
LESSOPEN=| /usr/bin/lesspipe %s
USER=dmp
SHLVL=2
XDG_SESSION_ID=2
XDG_RUNTIME_DIR=/run/user/501
SSH_CLIENT=192.168.5.2 51075 22
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/501/bus
SSH_TTY=/dev/pts/1
OLDPWD=/Users/dmp
_=/home/dmp.linux/downloads/go/bin/go
PATH=/home/dmp.linux/downloads/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/sbin:/sbin:/home/dmp.linux/downloads/go/bin:/usr/sbin:/sbin
PWD=/Users/dmp/Projects/go/nerd/nerdctl/cmd/nerdctl/network
NERDCTL_TOML=/tmp/TestNetworkLsFilterfilter_label939623861/001/nerdctl.toml
DOCKER_CONFIG=/tmp/TestNetworkLsFilterfilter_label939623861/001
helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-1-d946011b
helpers.go:45: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworklsfilter-2-31369b70
@AkihiroSuda would you be open to adding this to v2.0? There is clearly something wrong / racy in our network code even with simple things.
@AkihiroSuda I am closing this as resolved: we no longer see "no such file" on conflist. Although our network code is still problematic, we should now open more targeted issues.
Description
For some reason, network tests are very racy on my rig.
Just taking network_remove_linux_test, I get several different conditions quite fast:
1. task xyz not found: not found
2. reading /etc/cni/net.d/nerdctl-nerdctl-testnetworkremovebyid.conflist: open /etc/cni/net.d/nerdctl-nerdctl-testnetworkremovebyid.conflist: no such file or directory
3. failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time=\"2024-06-13T03:36:47Z\" level=fatal msg=\"no such network: \\\"nerdctl-testnetworkprune\\\"\"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/nerdctl-testnetworkprune/9c8747e53f2fa1cba99145d19f48648a85d63ccf5ae7167038a3536336c2d6aa/oci-hook.startContainer.log: file already closed: unknown
From a cursory reading, it feels to me like number 1 is somewhere in netutils, making assumptions about the availability of objects.
Number 2 is probably also in netutils: it looks like a race between checking that a network exists and a later operation that depends on reading its config. It could be that certain operations do not use the filelock (properly).
Number 3 is more worrisome.
Steps to reproduce the issue
go test
Describe the results you received and expected
Fails 1 out of 10 times, for a variety of different reasons.
What version of nerdctl are you using?
1.7.6
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
No response