containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

pasta: UDP port range forwarding: hang in podman logs #24272

Open edsantiago opened 1 month ago

edsantiago commented 1 month ago

Very infrequent. I can't tell if it's parallel-related or not:


not ok 344 |505| UDP port range forwarding, IPv4, tap in 122187ms
         # tags: ci:parallel
         # (from function `run_podman' in file test/system/[helpers.bash, line 561](https://github.com/containers/podman/blob/94dcf76eb2d65c5f5a40f83d855cd17ef6655cc3/test/system/helpers.bash#L561),
         #  from function `pasta_test_do' in file test/system/[505-networking-pasta.bats, line 252](https://github.com/containers/podman/blob/94dcf76eb2d65c5f5a40f83d855cd17ef6655cc3/test/system/505-networking-pasta.bats#L252),
         #  in test file test/system/[505-networking-pasta.bats, line 609](https://github.com/containers/podman/blob/94dcf76eb2d65c5f5a40f83d855cd17ef6655cc3/test/system/505-networking-pasta.bats#L609))
         #   `pasta_test_do' failed
         #
<+     > # $ podman info --format {{.Host.Pasta.Executable}}
<+246ms> # /usr/bin/pasta
         #
<+357ms> # $ podman run -d --name=c-socat-t344-mdzgdmfm --net=pasta -p [10.128.15.220]:5157-5159:5157-5159/udp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5157 5159); do                              socat -u UDP4-LISTEN:${port},null-eof STDOUT &                          done; wait
<+378ms> # f69bae53762f567446cb332ae6b6c1e91cea4ae2cc74c4f10583d84a80c83ce8
         #
<+036ms> # $ podman exec c-socat-t344-mdzgdmfm ss -Hln -4 --udp sport = 5157
<+159ms> # UNCONN 0      0      0.0.0.0:5157 0.0.0.0:*
         #
<+035ms> # $ podman exec c-socat-t344-mdzgdmfm ss -Hln -4 --udp sport = 5158
<+171ms> # UNCONN 0      0      0.0.0.0:5158 0.0.0.0:*
         #
<+030ms> # $ podman exec c-socat-t344-mdzgdmfm ss -Hln -4 --udp sport = 5159
<+233ms> # UNCONN 0      0      0.0.0.0:5159 0.0.0.0:*
         #
<+062ms> # $ podman logs --follow c-socat-t344-mdzgdmfm
<+0120s> # timeout: sending signal TERM to command ‘/var/tmp/go/src/github.com/containers/podman/bin/podman’

podman logs --follow blocks until the container terminates, so the 120-second timeout suggests that the container never exited.
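The last log line shows the command being killed by timeout after 120 seconds. A minimal sketch of that kind of guard follows, assuming GNU coreutils timeout (which exits with status 124 when the deadline is hit); the helper name run_with_deadline is hypothetical and is not part of the podman test suite:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the podman test suite): run a command
# with a deadline and report on stderr when the deadline was hit.
run_with_deadline() {
    local secs=$1
    shift
    local rc=0
    # GNU timeout sends TERM after $secs seconds and then exits with 124.
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}

# Example (container name is a placeholder):
#   run_with_deadline 120 podman logs --follow my-container
```

A status of 124 here only says that the followed command did not return in time; it does not by itself show why the container is still running.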

| x | x | x | x | x | x |
| --- | --- | --- | --- | --- | --- |
| sys(2) | podman(2) | debian-13(1) | rootless(2) | host(2) | sqlite(1) |
| | | fedora-40(1) | | | boltdb(1) |

sbrivio-rh commented 1 month ago

Hopefully the same as #24147, but I can't reproduce that with these tests.

edsantiago commented 2 days ago

Seen again yesterday on f40.

sbrivio-rh commented 2 days ago

My only hypothesis so far is that we're hitting the same issue as reported in https://github.com/containers/podman/issues/24147, in a rather particular way. The socat client sends the one-byte message and the socat server issues a connect(). While that connect() is pending, the client sends the empty datagram (shut-null) that should terminate the server (null-eof). The server misses it because of the race condition I described in https://lore.kernel.org/netdev/20241114215414.3357873-3-sbrivio@redhat.com/.

I will submit a new version of that fix for the net-next kernel tree when it reopens at the end of the 6.13 merge window (beginning of December), so that fix will probably be in 6.14. We'll need to wait a while.
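
For readers unfamiliar with the socat options involved, the exchange described above can be sketched on localhost, without podman or pasta. The server command mirrors the one in the test log; the client command, the port, and the helper name udp_eof_demo are assumptions made for illustration. On an unaffected path the server prints the byte and exits when the empty datagram arrives; if that datagram were missed, the server would keep waiting, which is why it is wrapped in timeout here:

```shell
#!/usr/bin/env bash
# Hypothetical demo helper, not part of the podman test suite.
# Usage: udp_eof_demo PORT
udp_eof_demo() {
    local port=$1

    # Server: print whatever arrives. With null-eof, an empty datagram
    # is treated as EOF. timeout bounds the wait in case it is missed.
    timeout 5 socat -u "UDP4-LISTEN:${port},null-eof" STDOUT &
    local server_pid=$!

    # Crude readiness wait; the test in the log polls with ss instead.
    sleep 1

    # Client: send one byte. With shut-null, socat sends an empty
    # datagram once stdin reaches EOF.
    printf 'x' | socat -u STDIN "UDP4:127.0.0.1:${port},shut-null"

    # 0 if the server saw EOF and exited, 124 if timeout killed it.
    wait "$server_pid"
}
```

This only shows the intended shut-null / null-eof handshake; it is not a reproducer for the kernel race.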