containers / gvisor-tap-vsock

A new network stack based on gVisor
Apache License 2.0
237 stars 46 forks source link

vfkit: gvproxy exits on high network traffic #367

Open cfergeau opened 3 days ago

cfergeau commented 3 days ago

This was reported in https://github.com/containers/podman/issues/23114, gvproxy exits when pulling big images on a fast network connection.

The issue is coming from:

time="2024-06-28T10:59:10+02:00" level=error msg="write unixgram /var/folders/09/9bv34hm11vb94tmwhtqyyx880000gn/T/podman/test-gvproxy.sock->/Users/riccardoforina/Library/Application Support/vfkit/net-15351-479784434.sock: sendto: no buffer space available"

I was seeing this when I added vfkit support to gvisor-tap-vsock until I added https://github.com/containers/gvisor-tap-vsock/blob/6dbbe087eb62775e99abc69ac232d13f74cac73a/pkg/transport/unixgram_darwin.go#L24-L30 This is unfortunately not good enough, and the maximum for these values is 810241024, and Riccardo is still having this issue with the maximum.

If I remember correctly, the "buffer is full" error were coming from the tx/rx functions in https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/tap/switch.go

Luap99 commented 3 days ago

Now I am not an expert in how this works but shouldn't gvproxy just retry on ENOBUFS, also I would have assumed to sendto call to block instead of returning such a error. Was the socket configured non blocking maybe?

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sendto.2.html

If no messages space is available at the socket to hold the message to be transmitted, then send() normally blocks, unless the socket has been placed in non-blocking I/O mode. The select(2) call may be used to determine when it is possible to send more data.

cfergeau commented 3 days ago

Now I am not an expert in how this works but shouldn't gvproxy just retry on ENOBUFS, also I would have assumed to sendto call to block instead of returning such a error. Was the socket configured non blocking maybe?

I also did not feel confident enough at the time to make significant changes to the inner tx/rx code shared by all virt providers (qemu, vfkit, hyperkit, ...). However, the socket is wrapped in a vfkit-specific net.Conn, so we most likely could add blocking/retries there https://github.com/containers/gvisor-tap-vsock/blob/main/pkg/transport/unixgram_darwin.go This definitely needs a closer look now that the workaround has proven not to be enough and results in nasty failures.

riccardo-forina commented 3 days ago

Please ping me for more debugs and to run experimental versions, as I can reliably trigger the problem.

gbraad commented 3 days ago

What are the most minimal instructions to trigger this?

riccardo-forina commented 3 days ago

On my system, an M1 with 16Gb of RAM and a 1Gbit internet connection (wired), it's enough to create a machine with 3 cpus to trigger the problem with a near 100% success rate. 4 and above I think will give you 100% success. I ruled out the rootful option as it doesn't seem to play any role in this.

podman machine init --cpus 4

For the docker-compose.yaml, I think any would work. I'm using this one just because I have it handy on my home


services:  
  postgresql:
    image: postgres:14
    hostname: postgresql
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: "conduktor-platform"
      POSTGRES_USER: "conduktor"
      POSTGRES_PASSWORD: "change_me"
      POSTGRES_HOST_AUTH_METHOD: "scram-sha-256"
    ports:
      - "5432:5432"

  conduktor-platform:
    image: conduktor/conduktor-platform:1.19.0
    depends_on:
      - postgresql
    ports:
      - "8081:8080"
    volumes:
      - conduktor_data:/var/conduktor
    environment:
      CDK_DATABASE_URL: "postgresql://conduktor:change_me@postgresql:5432/conduktor-platform"
      CDK_MONITORING_CORTEX-URL: http://conduktor-monitoring:9009/
      CDK_MONITORING_ALERT-MANAGER-URL: http://conduktor-monitoring:9010/
      CDK_MONITORING_CALLBACK-URL: http://conduktor-platform:8080/monitoring/api/
      CDK_MONITORING_NOTIFICATIONS-CALLBACK-URL: http://localhost:8080
    healthcheck:
      test: curl -f http://localhost:8080/platform/api/modules/health/live || exit 1
      interval: 10s
      start_period: 10s
      timeout: 5s
      retries: 3
  conduktor-monitoring:
    image: conduktor/conduktor-platform-cortex:1.19.0
    environment:
      CDK_CONSOLE-URL: "http://conduktor-platform:8080"
volumes:
  pg_data: {}
  conduktor_data: {}
gbraad commented 3 days ago

just the creation with podman machine init already triggers this?


updated: ah, a compose script.


I'll try to recreate a 'simpler' reproducer

Luap99 commented 3 days ago

I assume you need a high speed connection to trigger it, maybe try to use iperf3 between host and VM.

Luap99 commented 3 days ago

On the macos host run iperf3 -s to start the server

Then in another terminal run iperf3 in a container as client, using --network host to not get any slow downs from the container networking, using -R to send data from the server to the client (like pull images does) and lastly the important bit -P 8 to run things in parallel, without it I as not able to reproduce.

$ podman run --network host -it networkstatic/iperf3 -c host.containers.internal -R -P 8
Connecting to host host.containers.internal, port 5201
Reverse mode, remote host host.containers.internal is sending
[  5] local 192.168.127.2 port 58958 connected to 192.168.127.254 port 5201
[  7] local 192.168.127.2 port 58960 connected to 192.168.127.254 port 5201
[  9] local 192.168.127.2 port 58976 connected to 192.168.127.254 port 5201
[ 11] local 192.168.127.2 port 58980 connected to 192.168.127.254 port 5201
[ 13] local 192.168.127.2 port 58992 connected to 192.168.127.254 port 5201
[ 15] local 192.168.127.2 port 59000 connected to 192.168.127.254 port 5201
[ 17] local 192.168.127.2 port 59012 connected to 192.168.127.254 port 5201
[ 19] local 192.168.127.2 port 59018 connected to 192.168.127.254 port 5201
Error: Post "http://d/v5.1.1/libpod/containers/7ed13089ab5ece5c87c25b74c6bd842222f1992e6021e628e5b4c79ced226157/wait": EOF
cfergeau commented 3 days ago

It should be easy to reproduce with podman pull if you rebuild gvproxy with:

diff --git a/pkg/transport/unixgram_darwin.go b/pkg/transport/unixgram_darwin.go
index 12d3c50a..db473ade 100644
--- a/pkg/transport/unixgram_darwin.go
+++ b/pkg/transport/unixgram_darwin.go
@@ -8,7 +8,6 @@ import (
        "encoding/hex"
        "fmt"
        "net"
-       "syscall"
 )

 type connectedUnixgramConn struct {
@@ -17,22 +16,6 @@ type connectedUnixgramConn struct {
 }

 func connectListeningUnixgramConn(conn *net.UnixConn, remoteAddr *net.UnixAddr) (*connectedUnixgramConn, error) {
-       rawConn, err := conn.SyscallConn()
-       if err != nil {
-               return nil, err
-       }
-       err = rawConn.Control(func(fd uintptr) {
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_SNDBUF, 1*1024*1024); err != nil {
-                       return
-               }
-               if err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF, 4*1024*1024); err != nil {
-                       return
-               }
-       })
-       if err != nil {
-               return nil, err
-       }
-
        return &connectedUnixgramConn{
                UnixConn:   conn,
                remoteAddr: remoteAddr,