Open edsantiago opened 1 month ago
(If this is a netavark bug, could you please copy it there instead of moving? My flake log needs a podman issue number. Thanks.)
This sounds like the error we are seeing in https://bugzilla.redhat.com/show_bug.cgi?id=2013173, but I haven't yet checked whether netavark causes it or whether there is some other cause.
cc @mheon
So if I read https://wiki.nftables.org/wiki-nftables/index.php/Configuring_chains correctly, the EBUSY error just means the chain is not empty when we try to remove it. In theory we delete all rules from the chain before we remove it, but maybe we somehow missed a rule?
There is also the potential of a race against something else adding rules, though that something can't be Netavark because of locking.
Well, it should be safe to assume that on a CI VM nothing besides netavark would mess with our nftables chains. So if locking works, then we have a bug where rules are not deleted properly.
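For anyone following along, the ordering discussed above (empty the chain, then remove it) can be sketched with the nft CLI. This is only an illustration, not how netavark applies its ruleset: the helper name remove_chain, the NFT override variable, and the chain name EXAMPLE_CHAIN are all made up for the example.

```shell
# Hypothetical helper (not part of netavark): empty a chain, then delete it.
# NFT is an assumed override so the commands can be printed instead of run,
# e.g. NFT=echo. By default the real nft binary is used (needs root).
remove_chain() {
    local family=$1 table=$2 chain=$3

    # Step 1: drop every rule that is still in the chain.
    "${NFT:-nft}" flush chain "$family" "$table" "$chain" || return 1

    # Step 2: delete the chain itself. The kernel rejects this with EBUSY
    # if the chain is still considered in use.
    "${NFT:-nft}" delete chain "$family" "$table" "$chain"
}
```

Running it as NFT=echo remove_chain inet netavark EXAMPLE_CHAIN prints the two commands rather than executing them, which is handy for checking the order without touching the live ruleset.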
Might be related to this: https://github.com/containers/netavark/issues/1068
$ cat /etc/containers/containers.conf
[containers]
userns = "auto"
[network]
firewall_driver = "nftables"
$ sudo podman system reset -f
...
$ sudo podman run -it --rm -p 10.0.1.10:10222:10222/udp -p 10.0.1.10:10222:10222/tcp alpine:latest sh
/ # // left it running
// In another Terminal
$ sudo podman network reload --all
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
ERRO[0000] netavark: nftables error: nft did not return successfully while applying ruleset
c652403742bc95392bfea0da8e7d37cff1d057c2b744fb22a393b39daf07498d
This reliably reproduces the issue on my system (Fedora). Running sudo nft delete table inet netavark before reloading works around the error.
As for "Unable to clean up network for container": I believe I have seen that log message when restarting containers affected by the firewall issue, but I don't know if it is related.
This reliably reproduces the issue on my system (Fedora). Running sudo nft delete table inet netavark before reloading works around the error.
I don't think it is related to this issue. This one is about a specific CI flake, and the error is EBUSY, not ENOENT as in your case. If it only triggers with the specific port forwarding setup from https://github.com/containers/netavark/issues/1068, then that is most likely the cause of your problem. I will take a look next week.
Very weird one-off:
Seen twice in one f40 root run.
What's weird about it:
It is possible that this has been happening all along, but ginkgo-retry has been hiding it. We have no sane way to find out, aside from downloading and grepping all logs for all CI runs. Or, as I will suggest in a future Cabal, disabling flake retries.
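A rough sketch of the "download and grep everything" approach, in case someone wants to try it. It assumes the logs have already been downloaded into a local directory, and that the flake shows up with the standard EBUSY text "Device or resource busy"; both the function name find_flake_logs and that default pattern are assumptions, not something taken from the CI tooling.

```shell
# Hypothetical helper: list every downloaded log file that contains the
# given message. The default pattern is the usual strerror text for EBUSY,
# which is an assumption about how the flake appears in the logs.
find_flake_logs() {
    local log_dir=$1
    local pattern=${2:-Device or resource busy}

    # -r: recurse into the directory, -l: print only file names,
    # -F: treat the pattern as a fixed string, not a regex.
    grep -rlF -- "$pattern" "$log_dir" | sort
}
```

Called as find_flake_logs ./ci-logs (directory name hypothetical), it prints one matching file per line, including runs where a retry turned the failure into a pass.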