Open apostasie opened 1 month ago
Restarting containerd does fix the issue - until repeated usage triggers it again.
@AkihiroSuda is containerd maintaining the list of mapped ports? Any pointers for me on this issue?
sudo nsenter --net=/proc/410080/ns/net iptables-save
-A CNI-DN-49eb952835f33818786f4 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.146:80
-A CNI-DN-7574ca47ce906c42f658c -s 10.4.0.0/24 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7574ca47ce906c42f658c -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7574ca47ce906c42f658c -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.124:5000
-A CNI-DN-d7f4e2470d8fbcbaa6504 -s 10.4.2.0/24 -p tcp -m tcp --dport 5004 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-d7f4e2470d8fbcbaa6504 -s 127.0.0.1/32 -p tcp -m tcp --dport 5004 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-d7f4e2470d8fbcbaa6504 -p tcp -m tcp --dport 5004 -j DNAT --to-destination 10.4.2.9:80
-A CNI-DN-e4d6192a32862ccfe7faa -s 10.4.0.0/24 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-e4d6192a32862ccfe7faa -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-e4d6192a32862ccfe7faa -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.123:5000
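As a quick sanity check, the dump above can be scanned for host ports that still have more than one DNAT rule pointing at them - a sign that a rule from a destroyed container survived. A minimal sketch (the helper name is mine, not part of any tool):

```shell
# count_dnat_for_port: count how many DNAT rules in an iptables-save
# dump (read on stdin) still target a given host port. A count above 1
# usually means a stale rule from a removed container is lingering.
count_dnat_for_port() {
  grep -c "dport $1 -j DNAT"
}

# Example usage (requires root):
#   sudo nsenter --net=/proc/<pid>/ns/net iptables-save | count_dnat_for_port 5000
```

In the dump above, port 5000 has three DNAT rules pointing at three different container IPs, while only one container should be holding it.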
I am very much out of my comfort zone.
Is this just the rootless variant of #3253 ?
is containerd maintaining the list of mapped ports?
No, the daemon doesn't care about ports
Is this just the rootless variant of #3253 ?
Maybe?
@AkihiroSuda
I have been running qkboy's patch for a few hours. It makes things better, but is not a full fix, and I still end up with the same issue(s) in the end.
Notes.
Clearly, the issue comes from iptables getting clobbered.
Running this:
while read -r line; do
  sudo nsenter --net="$line" ./reset_iptables.sh
done < <(lsns -n -u -t net -o PATH)
With ./reset_iptables.sh being:
# reset default policies to ACCEPT
iptables --policy INPUT ACCEPT
iptables --policy OUTPUT ACCEPT
iptables --policy FORWARD ACCEPT
# zero counters, flush all rules, delete user-defined chains (filter table)
iptables -Z
iptables -F
iptables -X
# same for the nat table
iptables -t nat -Z
iptables -t nat -F
iptables -t nat -X
Will immediately fix the issue when it happens.
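A less heavy-handed alternative (a sketch on my part, not something shipped by any of these tools) would be to only touch the per-container CNI-DN-* chains in the nat table instead of flushing everything. First, list them from a ruleset dump:

```shell
# cni_dn_chains: print the names of per-container CNI-DN-* chains found
# in a ruleset dump read on stdin. Accepts both `iptables -t nat -S`
# output and iptables-save output. Each listed chain could then be
# flushed and deleted with `iptables -t nat -F <chain>` followed by
# `iptables -t nat -X <chain>`.
cni_dn_chains() {
  awk '
    $1 == "-N" && $2 ~ /^CNI-DN-/ { print $2 }            # iptables -S form
    $1 ~ /^:CNI-DN-/              { print substr($1, 2) } # iptables-save form
  '
}
```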
At this point, I am not convinced that the PR opened on cni plugins will fully address this. I think containers get destroyed without their iptables NAT entries getting cleaned up - either inside the CNI plugins, or somehow inside nerdctl.
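One way to check that theory would be to map each CNI-DN chain in a dump to its DNAT destination, then cross-check the destinations against the IPs of containers that are actually running (e.g. from `nerdctl inspect`). A rough sketch (the helper name is mine):

```shell
# list_cni_dnat: print each CNI-DN DNAT rule in an iptables-save dump
# (read on stdin) as "chain -> destination". Destinations that belong
# to no running container point at entries that were never cleaned up.
list_cni_dnat() {
  awk '$1 == "-A" && $2 ~ /^CNI-DN-/ {
    for (i = 1; i <= NF; i++)
      if ($i == "--to-destination") print $2, "->", $(i + 1)
  }'
}
```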
To be completely honest, oci-hooks + binary networking plugins do not feel like a good solution. oci-hooks are confusing to use and do not properly allow for full lifecycle management (there is no "onStop"; in practice, "onStop" == "onDelete"), and whatever happens in the cni plugins between bridge and firewall is heinous to debug.
Anyhow, once done with the testing cleanup, rethinking/fixing networking should be a top priority (along with login...).
Description
After some time testing:
This is specific to a port that has been used (past tense) by other containers - not specific to 5003, just to whichever port has been used heavily. It looks like after these containers got destroyed, something in cni (?) does not completely release the port (maybe in iptables?).
This is tricky to reproduce - I usually trigger it by repeatedly running the test suite.