containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0

netlink walk calls may return EINTR (regression) #1121

Open · saj opened this issue 1 week ago

saj commented 1 week ago

Downstream users of the CNI plugins have been observing occasional interrupted system call failures. The simple loopback plugin is also affected.

As of vishvananda/netlink v1.2.1 (https://github.com/vishvananda/netlink/commit/aa4f20db57d498a8db66aaa202ddae15c1fa81a5), calls like netlink.AddrList may return EINTR. Prior to this commit, the (possibly incomplete) results from an interrupted walk were returned with a nil error.

I think this upstream commit was incorporated here in d924f05e12802522a0a3cc895b69c1eb812df316, which was shipped as v1.6.0.

https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#dump-consistency

> Dump consistency
>
> Some of the data structures kernel uses for storing objects make it hard to provide an atomic snapshot of all the objects in a dump (without impacting the fast-paths updating them).
>
> Kernel may set the NLM_F_DUMP_INTR flag on any message in a dump (including the NLMSG_DONE message) if the dump was interrupted and may be inconsistent (e.g. missing objects). User space should retry the dump if it sees the flag set.
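According to the kernel documentation above, the flag can appear on any message of a dump, including the final NLMSG_DONE. A reader therefore has to check every header, not only the first one. The following is a minimal Go sketch that operates on headers that have already been parsed. The `header` type is a hypothetical stand-in for a real netlink message header (such as `syscall.NlMsghdr` on Linux). The constant values are copied from linux/netlink.h.

```go
package main

// Values from include/uapi/linux/netlink.h.
const (
	nlmsgDone    = 0x3  // NLMSG_DONE: terminates a multipart dump
	nlmFDumpIntr = 0x10 // NLM_F_DUMP_INTR: dump was interrupted, may be inconsistent
)

// header is a hypothetical, minimal stand-in for a netlink message header.
type header struct {
	Type  uint16
	Flags uint16
}

// dumpInterrupted reports whether any message of a dump, up to and
// including NLMSG_DONE, carries NLM_F_DUMP_INTR. When it returns true,
// user space should discard the results and retry the whole dump.
func dumpInterrupted(msgs []header) bool {
	for _, m := range msgs {
		if m.Flags&nlmFDumpIntr != 0 {
			return true
		}
		if m.Type == nlmsgDone {
			break // nothing after NLMSG_DONE belongs to this dump
		}
	}
	return false
}
```

The check is separate from parsing so that it runs over the complete dump, and the flag on the terminating NLMSG_DONE message is not missed.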

More context can be found in https://github.com/vishvananda/netlink/pull/1018, which later added netlink.ErrDumpInterrupted. This commit is yet to be released, though it is available on vishvananda/netlink trunk.

AIUI:

ty

thompson-shaun commented 3 days ago

We also seem to be running into this issue in moby/buildkit; see https://github.com/moby/buildkit/pull/5533.