Open thaJeztah opened 1 week ago
Hmm... looks like there's failures in CI related to OTEL; looks like multiple metrics servers are tried to be set up on port 9110;
time="2024-06-26T10:15:05.945761805Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
...
time="2024-06-26T10:15:05.946779174Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
...
umount globally shared mountpoint
umount: /var/lib/containerd-nydus/mnt: no mount point specified.
ls: cannot access '/run/containerd-nydus/containerd-nydus-grpc.sock': No such file or directory
Fail(1). Retrying...
time="2024-06-26T10:15:08.579292914Z" level=info msg="Start nydus-snapshotter. Version: fba89c3, PID: 143, FsDriver: fusedev, DaemonMode: multiple"
time="2024-06-26T10:15:08.581727603Z" level=info msg="Trying to translate bucket records..."
time="2024-06-26T10:15:08.581798015Z" level=info msg="Trying to update bucket records from v1.0 to v1.1 ..."
time="2024-06-26T10:15:08.582908629Z" level=info msg="parsed cgroup config: cgroup.Config{MemoryLimitInBytes:-1}"
time="2024-06-26T10:15:08.582980705Z" level=info msg="cgroup mode: legacy"
time="2024-06-26T10:15:08.583967256Z" level=info msg="create cgroup (v1) successful, state: thawed"
time="2024-06-26T10:15:08.584895731Z" level=info msg="Run daemons monitor..."
time="2024-06-26T10:15:08.585325775Z" level=fatal msg="failed to start nydus-snapshotter" error="failed to initialize snapshotter: start metrics HTTP server: metrics server listener, addr=:9110: listen tcp :9110: bind: address already in use"
ls: cannot access '/run/containerd-nydus/containerd-nydus-grpc.sock': No such file or directory
Looks related to this code, but I'm wondering now if the snapshotted itself should setup a metrics server, or if that's something that only containerd should do https://github.com/containerd/nydus-snapshotter/blob/71ab64ad42d30c9b1538418ba041df0d1191d4f1/snapshot/snapshot.go#L180-L200
Code above was added in https://github.com/containerd/nydus-snapshotter/commit/fe511b89ba033ab189bc2e4b08c7bcfb50bd132b, which is part of this PR:
cc @cpuguy83 @dmcgowan perhaps one of you knows?
Code above was added in fe511b8, which is part of this PR:
I think the CI failure is not caused by this patch. It seems the :9110
is already in use.
Can we retry this CI?
Can we retry this CI?
Otherwise, we can disable metrics server in E2E test.
The reason why the :9110
port is already in use is the old "containerd-nydus-grpc
process is not killed successfully.
https://github.com/containerd/nydus-snapshotter/blob/main/integration/entrypoint.sh#L140
Ah! I guess I misinterpreted the cfg.MetricsConfig.Address
to be at the "global" level (so for containerd as a whole), but it's for this snapshotter.
The reason why the :9110 port is already in use is the old "containerd-nydus-grpc process is not killed successfully.
Good one; maybe? I don't have permissions to restart Ci on this repo; I can do a quick close & reopen to try, or perhaps you're able to kick only the failing ones.
This failure didn't happen on my other PR though, and I've seen it fail twice (both before and after rebasing), so .. it's still possible there's a regression somewhere (or change in behavior)
Let me just do a close/reopen to get a fresh run
go.mod: github.com/containerd/containerd v1.7.18
full diff: https://github.com/containerd/containerd/compare/v1.7.7...v1.7.18
switch to github.com/containerd/errdefs module
containerd 1.7.18 and up alias the errdefs package to the new module, and deprecate the package.