containers / gvisor-tap-vsock

A new network stack based on gVisor
Apache License 2.0

vfkit: gvproxy exits on high network traffic #367

Open cfergeau opened 3 days ago

cfergeau commented 3 days ago

This was reported in, gvproxy exits when pulling big images on a fast network connection.

The issue is coming from:

time="2024-06-28T10:59:10+02:00" level=error msg="write unixgram /var/folders/09/9bv34hm11vb94tmwhtqyyx880000gn/T/podman/test-gvproxy.sock->/Users/riccardoforina/Library/Application Support/vfkit/net-15351-479784434.sock: sendto: no buffer space available"

I was seeing this when I added vfkit support to gvisor-tap-vsock until I added This is unfortunately not good enough, and the maximum for these values is 8*1024*1024, and Riccardo is still having this issue with the maximum.

If I remember correctly, the "buffer is full" errors were coming from the tx/rx functions in

Luap99 commented 3 days ago

Now, I am not an expert in how this works, but shouldn't gvproxy just retry on ENOBUFS? Also, I would have assumed the sendto call to block instead of returning such an error. Was the socket configured as non-blocking, maybe?

If no messages space is available at the socket to hold the message to be transmitted, then send() normally blocks, unless the socket has been placed in non-blocking I/O mode. The select(2) call may be used to determine when it is possible to send more data.

cfergeau commented 3 days ago

Now, I am not an expert in how this works, but shouldn't gvproxy just retry on ENOBUFS? Also, I would have assumed the sendto call to block instead of returning such an error. Was the socket configured as non-blocking, maybe?

I also did not feel confident enough at the time to make significant changes to the inner tx/rx code shared by all virt providers (qemu, vfkit, hyperkit, ...). However, the socket is wrapped in a vfkit-specific net.Conn, so we most likely could add blocking/retries there. This definitely needs a closer look now that the workaround has proven not to be enough and results in nasty failures.

riccardo-forina commented 3 days ago

Please ping me for more debugging and to run experimental versions, as I can reliably trigger the problem.

gbraad commented 3 days ago

What are the most minimal instructions to trigger this?

riccardo-forina commented 3 days ago

On my system, an M1 with 16 GB of RAM and a wired 1 Gbit internet connection, it's enough to create a machine with 3 CPUs to trigger the problem with a near-100% success rate. With 4 or more, I think you will get 100%. I ruled out the rootful option, as it doesn't seem to play any role in this.

podman machine init --cpus 4

For the docker-compose.yaml, I think any would work. I'm using this one just because I have it handy on my home

# Note: the mapping keys (services, volumes, environment, ports, depends_on,
# healthcheck) and the two conduktor service names were missing from the
# pasted file; they are inferred from the remaining lines and URLs.
services:
  postgresql:
    image: postgres:14
    hostname: postgresql
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: "conduktor-platform"
      POSTGRES_USER: "conduktor"
      POSTGRES_PASSWORD: "change_me"
      POSTGRES_HOST_AUTH_METHOD: "scram-sha-256"
    ports:
      - "5432:5432"

  conduktor-platform:
    image: conduktor/conduktor-platform:1.19.0
    depends_on:
      - postgresql
    ports:
      - "8081:8080"
    volumes:
      - conduktor_data:/var/conduktor
    environment:
      CDK_DATABASE_URL: "postgresql://conduktor:change_me@postgresql:5432/conduktor-platform"
      CDK_MONITORING_CORTEX-URL: http://conduktor-monitoring:9009/
      CDK_MONITORING_ALERT-MANAGER-URL: http://conduktor-monitoring:9010/
      CDK_MONITORING_CALLBACK-URL: http://conduktor-platform:8080/monitoring/api/
    healthcheck:
      test: curl -f http://localhost:8080/platform/api/modules/health/live || exit 1
      interval: 10s
      start_period: 10s
      timeout: 5s
      retries: 3

  conduktor-monitoring:
    image: conduktor/conduktor-platform-cortex:1.19.0
    environment:
      CDK_CONSOLE-URL: "http://conduktor-platform:8080"

volumes:
  pg_data: {}
  conduktor_data: {}

gbraad commented 3 days ago

just the creation with podman machine init already triggers this?

updated: ah, a compose script.

I'll try to recreate a 'simpler' reproducer

Luap99 commented 3 days ago

I assume you need a high speed connection to trigger it, maybe try to use iperf3 between host and VM.

Luap99 commented 3 days ago

On the macos host run iperf3 -s to start the server

Then, in another terminal, run iperf3 in a container as the client: use --network host to avoid any slowdowns from the container networking, -R to send data from the server to the client (like pulling images does), and lastly the important bit, -P 8, to run things in parallel. Without it I was not able to reproduce.

$ podman run --network host -it networkstatic/iperf3 -c host.containers.internal -R -P 8
Connecting to host host.containers.internal, port 5201
Reverse mode, remote host host.containers.internal is sending
[  5] local port 58958 connected to port 5201
[  7] local port 58960 connected to port 5201
[  9] local port 58976 connected to port 5201
[ 11] local port 58980 connected to port 5201
[ 13] local port 58992 connected to port 5201
[ 15] local port 59000 connected to port 5201
[ 17] local port 59012 connected to port 5201
[ 19] local port 59018 connected to port 5201
Error: Post "http://d/v5.1.1/libpod/containers/7ed13089ab5ece5c87c25b74c6bd842222f1992e6021e628e5b4c79ced226157/wait": EOF

cfergeau commented 3 days ago

It should be easy to reproduce with podman pull if you rebuild gvproxy with:

diff --git a/pkg/transport/unixgram_darwin.go b/pkg/transport/unixgram_darwin.go
index 12d3c50a..db473ade 100644
--- a/pkg/transport/unixgram_darwin.go
+++ b/pkg/transport/unixgram_darwin.go
@@ -8,7 +8,6 @@ import (
-       "syscall"

 type connectedUnixgramConn struct {
@@ -17,22 +16,6 @@ type connectedUnixgramConn struct {

 func connectListeningUnixgramConn(conn *net.UnixConn, remoteAddr *net.UnixAddr) (*connectedUnixgramConn, error) {
-       rawConn, err := conn.SyscallConn()
-       if err != nil {
-               return nil, err
-       }
-       err = rawConn.Control(func(fd uintptr) {
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_SNDBUF, 1*1024*1024); err != nil {
-                       return
-               }
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF, 4*1024*1024); err != nil {
-                       return
-               }
-       })
-       if err != nil {
-               return nil, err
-       }
        return &connectedUnixgramConn{
                UnixConn:   conn,
                remoteAddr: remoteAddr,