containers / gvisor-tap-vsock

A new network stack based on gVisor
Apache License 2.0
265 stars 49 forks source link

Missing Close() in gvisor-tap-vsock, or tcpproxy bug? #387

Open cfergeau opened 2 months ago

cfergeau commented 2 months ago

We cannot use the latest github.com/inetaf/tcpproxy commit because of https://github.com/containers/gvisor-tap-vsock/commit/61dc4e1eb260427d9a39ccaeeb75fe85be7dcccc which causes a regression in podman https://github.com/containers/podman/issues/23616

My feeling is that this revert is either hiding a gvisor-tap-sock bug which has been present for a long time, or it's avoiding a newly introduced bug in inetaf/tcpproxy. In either cases ,it would be good to understand better what caused the issue leading to the revert. The podman bug has a reproducer for it.

cfergeau commented 1 month ago

This can be reproduced with podman machine by following these steps (I tested on a mac):

evidolob commented 1 month ago

@cfergeau I make some investigation on this, it look like that problem is in recent version of /inetaf/tcpproxy The problem is in https://github.com/inetaf/tcpproxy/blob/3ce58045626c8bc343a593c90354975e61b1817a/tcpproxy.go#L404-L408

If you look on https://github.com/containers/gvisor-tap-vsock/commit/61dc4e1eb260427d9a39ccaeeb75fe85be7dcccc it is removing second <-errc, which is added in newer version. For me, that second channel receive is never resolve, so function HandleConn is never return and connections is never closed. So if I remove that second <-errc in newest version it fix the issue.
Look like one of go proxyCopy calls is never returns(put anything in the errc channel), going to look deeper in to the code

cfergeau commented 1 month ago

If you look on 61dc4e1 it is removing second <-errc, which is added in newer version. For me, that second channel receive is never resolve, so function HandleConn is never return and connections is never closed. So if I remove that second <-errc in newest version it fix the issue.

Thanks for the research! Looks consistent with the investigation which was done in https://github.com/containers/podman/issues/23616