containers / gvisor-tap-vsock

A new network stack based on gVisor
Apache License 2.0

Fix consistent udp packet loss after the proxy read loop stopped #393

Closed fatanugraha closed 2 months ago

fatanugraha commented 2 months ago

Currently we never close the tcpip.Endpoint that we create when we get a *udp.ForwarderRequest. As a result, all packets sent from the same source ip:port after we return from UDPProxy.Run are "dropped".

By closing the endpoint, we get a new forwarder request after we return from UDPProxy.Run, so we can process new packets.
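
To make the failure mode concrete, here is a toy model of the behaviour described above. It uses only the standard library and is not gVisor's or gvisor-tap-vsock's actual code: `forwarder`, `session`, `deliver`, `stopProxy` and `closeOnStop` are hypothetical names. It assumes the stack asks for a new forwarder request only when no endpoint is registered for the source address.

```go
package main

import "fmt"

// session is a hypothetical stand-in for the endpoint created for one
// forwarded UDP flow, together with the state of its proxy read loop.
type session struct {
	proxyRunning bool
}

// forwarder models a stack that raises a new "forwarder request" only for
// sources that have no registered session (assumption for this sketch).
type forwarder struct {
	sessions    map[string]*session
	closeOnStop bool // true models closing the endpoint once the proxy stops
}

func newForwarder(closeOnStop bool) *forwarder {
	return &forwarder{sessions: map[string]*session{}, closeOnStop: closeOnStop}
}

// deliver reports whether a packet from src is processed by a running proxy.
func (f *forwarder) deliver(src string) bool {
	s, ok := f.sessions[src]
	if !ok {
		// No session yet: this is the "new forwarder request" path,
		// which starts a fresh proxy for the source address.
		f.sessions[src] = &session{proxyRunning: true}
		return true
	}
	// A session already exists, so no new request is raised. The packet is
	// only processed if the proxy read loop is still running.
	return s.proxyRunning
}

// stopProxy models the proxy read loop returning after its idle timeout.
func (f *forwarder) stopProxy(src string) {
	s, ok := f.sessions[src]
	if !ok {
		return
	}
	s.proxyRunning = false
	if f.closeOnStop {
		// Models closing the endpoint: the next packet from src will
		// take the "new forwarder request" path again.
		delete(f.sessions, src)
	}
}

func main() {
	const src = "192.168.5.1:40001"
	for _, closeOnStop := range []bool{false, true} {
		f := newForwarder(closeOnStop)
		first := f.deliver(src)
		f.stopProxy(src)
		second := f.deliver(src)
		fmt.Printf("closeOnStop=%v first=%v second=%v\n", closeOnStop, first, second)
	}
}
```

In this model, leaving the stale session registered means the second packet is silently dropped, while removing it on stop lets the second packet start a fresh proxy.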

Here's my reproduction code:

  1. Reuse the same local address when sending UDP requests
  2. Send one DNS request (succeeds)
  3. Wait until UDPProxy.Run returns (after 90s)
  4. Send one DNS request (fails)

```go
package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    r := &net.Resolver{
        PreferGo: true,
        Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
            // Always bind to the same local address so that every DNS query
            // reaches the proxy from the same source ip:port.
            addr, err := net.ResolveUDPAddr("udp", "192.168.5.1:40001")
            if err != nil {
                return nil, err
            }

            d := net.Dialer{
                Timeout:   10 * time.Second,
                KeepAlive: -1,
                LocalAddr: addr,
            }

            // Ignore the resolver-supplied address and always query 8.8.8.8.
            return d.DialContext(ctx, network, "8.8.8.8:53")
        },
    }

    lookup := func() {
        _, err := r.LookupIP(context.Background(), "ip4", "www.google.com")
        if err != nil {
            fmt.Println("err", err)
        } else {
            fmt.Println("ok")
        }
    }

    lookup()                     // ok
    time.Sleep(95 * time.Second) // wait for the UDPConnTimeout
    lookup()                     // this will fail
}
```

fatanugraha commented 2 months ago

/assign cfergeau

fatanugraha commented 2 months ago

/cc baude cfergeau

evidolob commented 2 months ago

@fatanugraha I was trying to test this PR. I ran the test you provided and it works fine (I don't get any errors, just two "ok" lines). I tried it on macOS and Fedora 40. Am I missing something?

fatanugraha commented 2 months ago

Hi @evidolob I've put more detailed reproduction steps here: https://github.com/fatanugraha/gvisor-tap-proxy-393

Do let me know if you have further questions 🙇

Attached are debug logs from gvproxy. Notice that DNS queries from the same local address start failing after this line is logged: DEBU[0122] Stopping udp proxy (read udp 8.8.8.8:53: i/o timeout)

gvproxy.log

capture.pcap.zip

evidolob commented 2 months ago

@cfergeau I can verify that the problem described in this PR exists, and that the PR indeed fixes it.

cfergeau commented 2 months ago

I force-pushed to the branch to fix a few typos in the comment.

/lgtm
/approve

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfergeau, evidolob, fatanugraha

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/containers/gvisor-tap-vsock/blob/main/OWNERS)~~ [cfergeau]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

cfergeau commented 2 months ago

I wonder if this PR could help with https://github.com/containers/gvisor-tap-vsock/issues/387 ? (dropping a note here as I can't test/look closely now)

cfergeau commented 4 weeks ago

> I wonder if this PR could help with #387? (dropping a note here as I can't test/look closely now)

Yevhen tested this, and this does not help.