Open maciej-szlosarczyk opened 1 month ago
Can you replace only the gvproxy binary with the one included in the 5.1.2 installer? It should be gvproxy 0.7.3; we updated to 0.7.4 in the 5.2 installer (https://github.com/containers/gvisor-tap-vsock/releases). I wonder whether there is a regression there or on the VM side somehow.
Hey @Luap99!
It works properly with gvproxy 0.7.3:
% ./podman/5.2.0/libexec/podman/gvproxy --version
gvproxy version v0.7.3
% lsof -nP -iTCP | grep gvproxy
gvproxy 63436 maciej 11u IPv4 0x921c900cb3b2c2c3 0t0 TCP 127.0.0.1:57485 (LISTEN)
gvproxy 63436 maciej 34u IPv6 0x9a475d1349cde53d 0t0 TCP *:5432 (LISTEN)
gvproxy 63436 maciej 71u IPv4 0x776bd7258f5958e0 0t0 TCP 127.0.0.1:57485->127.0.0.1:58106 (ESTABLISHED)
Machine internal:
core@localhost:~$ lsof -nP -iTCP
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rootlessp 2568 core 11u IPv6 13502 0t0 TCP *:5432 (LISTEN)
@maciej-szlosarczyk Do you know how to use git bisect? It would be great if you could build gvproxy from source (make gvproxy in https://github.com/containers/gvisor-tap-vsock/), then copy the binary to your location and test again until you find the commit that caused the regression.
Bisect points at this commit:
https://github.com/containers/gvisor-tap-vsock/commit/600910caefc729efaaae16219be51d081284a104
I managed to narrow it down to these 15 or so lines:
Thanks
cc @praveenkumar @cfergeau
Hey!
I did some additional investigation there and discovered two things:
dst is *gonet.TCPConn and not *net.TCPConn. It falls through the type assertions, but it looks like it was like this even before this change.
@maciej-szlosarczyk Can you file a PR for it to the https://github.com/containers/gvisor-tap-vsock repo?
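To illustrate what "falls through the type assertions" means here, the following is a minimal sketch and not the actual gvproxy or tcpproxy code. fakeConn is a hypothetical stand-in for a non-standard connection type such as *gonet.TCPConn, and both helper functions are invented for the example. It assumes the real type exposes a CloseWrite method; if it does, asserting against a small interface rather than the concrete *net.TCPConn type would catch it.

```go
package main

import (
	"fmt"
	"net"
)

// fakeConn is a hypothetical stand-in for a connection type that is not
// *net.TCPConn but still offers CloseWrite (an assumption for this sketch).
type fakeConn struct {
	net.Conn
	writeClosed bool
}

// CloseWrite records that the write side was shut down.
func (c *fakeConn) CloseWrite() error {
	c.writeClosed = true
	return nil
}

// closeWriteConcrete only recognises the concrete *net.TCPConn type,
// so any other connection type falls through and nothing is closed.
func closeWriteConcrete(c net.Conn) bool {
	if tc, ok := c.(*net.TCPConn); ok {
		return tc.CloseWrite() == nil
	}
	return false
}

// closeWriteIface accepts any connection that has a CloseWrite method.
func closeWriteIface(c net.Conn) bool {
	if cw, ok := c.(interface{ CloseWrite() error }); ok {
		return cw.CloseWrite() == nil
	}
	return false
}

func main() {
	c := &fakeConn{}
	fmt.Println("concrete assertion matched:", closeWriteConcrete(c))
	fmt.Println("interface assertion matched:", closeWriteIface(c))
}
```

With the concrete assertion the half-close is silently skipped for the stand-in type, while the interface assertion reaches it. Whether this matches what the real code path needs is a question for the actual fix.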
The lines you pointed to as causing the regression are related to this inetaf/tcpproxy commit https://github.com/inetaf/tcpproxy/commit/2862066fc2a9405880f212f71230425bdfe9950e
I think what happens with this commit is that before this commit HandleConn was returning after either the src or dest connection was closed/returned an error. My guess is that after HandleConn returns, both connections were eventually garbage collected.
After commit 2862066fc2a9405880, however, the code now waits for both src and dest to be closed or return an error before HandleConn returns. I suspect that when gvproxy shows the bug, a lot of goroutines may be stuck in this HandleConn method.
https://github.com/containers/gvisor-tap-vsock/pull/386 is merged and should address this issue. I'll make a gvisor-tap-vsock release soon. It's likely that it's only papering over a gvisor-tap-vsock or tcpproxy bug, but if it's a preexisting gvisor-tap-vsock bug made more visible by the changes in https://github.com/containers/gvisor-tap-vsock/commit/600910caefc729efaaae16219be51d081284a104, then it's a bug which has been present for years, so we can live with it for a while longer.
Sorry I was ooto, we did a new release yesterday. But if you do a new gvproxy release we can certainly do another release soon to fix this regression.
Issue Description
After upgrading to 5.2.0, gvproxy keeps connections open for a long time. I noticed it while running tests against PostgreSQL 11 running in a container. After a few runs, postgres would accumulate enough connections and memory usage that it would either get killed due to memory limits or report that too many connections were open.
This is the output of lsof from both the host and inside the podman machine ten minutes after I ran the test:
Looks like a regression between 5.1.2 and 5.2.0. Downgrading back to 5.1.2 fixes this issue.
Steps to reproduce the issue
docker.io/library/postgres:11.
Describe the results you received
Connections/File descriptors should be closed properly once they're closed by the client.
Describe the results you expected
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
podman version:
Additional information