containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

[rootless] bridge network degrading ending in failure to expose port #3488

Open apostasie opened 1 month ago

apostasie commented 1 month ago

Description

After some time testing:

nerdctl run -d --name bar -p 5003:80 nginx
6c1fa26eba42ff417d193acd3b7097ba33a3764a8a52efad49c4a1a99c8c4435

curl localhost:5003
curl: (56) Recv failure: Connection reset by peer

This only affects a port that was previously used by other containers. It is not specific to 5003; it happens with whichever port has been used heavily.

It looks like, after these containers are destroyed, something in CNI (?) does not completely release the port (maybe in iptables?).

This is tricky to reproduce - I usually trigger it by repeatedly running the test suite.
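For anyone without the test suite at hand, a rough way to exercise the same pattern is to start and remove a container on one host port in a loop and probe it each time. This is only a sketch and may well not trigger the bug: the function name `repro_port_reuse`, the `repro-N` container names, and the one-second wait are all made up for illustration; only the `nerdctl run -d --name ... -p ...` and `curl` invocations come from the report above.

```shell
# Hypothetical repro loop (illustrative names, not part of nerdctl):
# reuse one host port across many short-lived containers and stop at the
# first failed request.
repro_port_reuse() {
    port=$1
    attempts=$2
    i=1
    while [ "$i" -le "$attempts" ]; do
        nerdctl run -d --name "repro-$i" -p "$port:80" nginx >/dev/null || return 2
        sleep 1   # assumption: one second is enough for nginx to start listening
        if ! curl -fsS -o /dev/null "http://localhost:$port"; then
            # The failing container is left running so it can be inspected.
            echo "port $port failed on attempt $i"
            return 1
        fi
        nerdctl rm -f "repro-$i" >/dev/null
        i=$((i + 1))
    done
    echo "port $port still healthy after $attempts attempts"
}
```

Example: `repro_port_reuse 5003 200` stops at the first attempt where the port no longer answers.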

apostasie commented 1 month ago

Restarting containerd does fix the issue - until repeated usage triggers it again.

@AkihiroSuda is containerd maintaining the list of mapped ports? Any pointer for me on this issue?

apostasie commented 1 month ago
sudo nsenter --net=/proc/410080/ns/net iptables-save

-A CNI-DN-49eb952835f33818786f4 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.146:80
-A CNI-DN-7574ca47ce906c42f658c -s 10.4.0.0/24 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7574ca47ce906c42f658c -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7574ca47ce906c42f658c -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.124:5000
-A CNI-DN-d7f4e2470d8fbcbaa6504 -s 10.4.2.0/24 -p tcp -m tcp --dport 5004 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-d7f4e2470d8fbcbaa6504 -s 127.0.0.1/32 -p tcp -m tcp --dport 5004 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-d7f4e2470d8fbcbaa6504 -p tcp -m tcp --dport 5004 -j DNAT --to-destination 10.4.2.9:80
-A CNI-DN-e4d6192a32862ccfe7faa -s 10.4.0.0/24 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-e4d6192a32862ccfe7faa -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-e4d6192a32862ccfe7faa -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.4.0.123:5000
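In the dump above, port 5000 has DNAT rules pointing at three different destinations (10.4.0.146:80, 10.4.0.124:5000, 10.4.0.123:5000). A small filter can make that kind of conflict easier to spot in a longer dump. The sketch below is not part of nerdctl or CNI; the function name `find_conflicting_dnat` is made up, and it assumes rules in the `iptables-save` format shown above.

```shell
# Hypothetical helper: list host ports that have more than one distinct
# DNAT destination. Reads `iptables-save` output on stdin.
find_conflicting_dnat() {
    awk '
        / -j DNAT / {
            port = ""; dest = ""
            for (i = 1; i < NF; i++) {
                if ($i == "--dport") port = $(i + 1)
                if ($i == "--to-destination") dest = $(i + 1)
            }
            # Count each (port, destination) pair only once.
            if (port != "" && dest != "" && !((port, dest) in seen)) {
                seen[port, dest] = 1
                count[port]++
                targets[port] = targets[port] " " dest
            }
        }
        END {
            for (port in count)
                if (count[port] > 1)
                    print port ":" targets[port]
        }
    '
}
```

Example: `sudo nsenter --net=/proc/410080/ns/net iptables-save | find_conflicting_dnat` prints one line per port that has several destinations.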

I am very much out of my comfort zone.

Is this just the rootless variant of #3253 ?

AkihiroSuda commented 1 month ago

> is containerd maintaining the list of mapped ports?

No, the daemon doesn't care about ports

> Is this just the rootless variant of #3253 ?

Maybe?

apostasie commented 1 month ago

@AkihiroSuda

I have been running qkboy's patch for a few hours. It makes things better, but it is not a full fix, and I still end up with the same issue(s) in the end.

apostasie commented 1 month ago

Notes.

Clearly, the issue comes from iptables getting clobbered.

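Running this:

while read -r line; do
        # Run the reset script inside each network namespace
        sudo nsenter --net="$line" ./reset_iptables.sh
done < <(lsns -n -u -t net -o PATH)

with ./reset_iptables.sh containing:

#!/bin/sh
# Reset the default policies to ACCEPT
iptables --policy INPUT   ACCEPT
iptables --policy OUTPUT  ACCEPT
iptables --policy FORWARD ACCEPT

# Zero counters, flush rules, and delete user-defined chains (filter table)
iptables -Z
iptables -F
iptables -X
# Same for the nat table
iptables -t nat -Z
iptables -t nat -F
iptables -t nat -X

immediately fixes the issue when it happens.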

At this point, I am not convinced that the PR opened on the CNI plugins will fully address this - I think containers get destroyed without their iptables NAT entries getting cleaned up - either inside the CNI plugins, or somehow inside nerdctl.
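One way to check the "NAT entries left behind" theory is to compare the DNAT destinations in a dump against the IPs of containers that are actually running. The sketch below assumes IPv4 destinations in `ip:port` form, as in the dump earlier in this thread; the function name `stale_dnat_rules` is made up, and collecting the list of live IPs (for example from `nerdctl inspect` output) is left to the caller.

```shell
# Hypothetical helper: print DNAT rules whose destination IP is NOT among
# the live container IPs passed as arguments.
# Reads `iptables-save` output on stdin. IPv4 only (assumption).
stale_dnat_rules() {
    awk -v live=" $* " '
        / -j DNAT / {
            for (i = 1; i < NF; i++) {
                if ($i == "--to-destination") {
                    split($(i + 1), hostport, ":")   # "10.4.0.146:80" -> "10.4.0.146"
                    # Space-delimited match avoids prefix hits (10.4.0.14 vs 10.4.0.146).
                    if (index(live, " " hostport[1] " ") == 0)
                        print
                }
            }
        }
    '
}
```

Example: `sudo nsenter --net=/proc/410080/ns/net iptables-save | stale_dnat_rules 10.4.0.146` prints every DNAT rule that targets some other address.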

To be completely honest, OCI hooks plus binary networking plugins do not feel like a good solution. OCI hooks are confusing to use and do not allow for full lifecycle management (there is no "onStop"; in practice "onStop" == "onDelete"), and whatever happens in the CNI plugins between bridge and firewall is heinous to debug.

Anyhow, once done with the testing cleanup, rethinking/fixing networking should be top priority (along with login...).