golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.7k stars 17.49k forks source link

net: sending UDP datagrams back and forth fails with "sendmsg: operation not permitted" #63322

Open marten-seemann opened 11 months ago

marten-seemann commented 11 months ago

What version of Go are you using (go version)?

$ go version
go1.21.0 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/parallels/.cache/go-build'
GOENV='/home/parallels/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/parallels/src/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/parallels/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/parallels/bin/go1.21ex'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/parallels/bin/go1.21ex/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.21.0'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/parallels/src/go/src/github.com/libp2p/go-libp2p/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build618110419=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I created two UDP sockets, and send UDP datagrams from one to the other, and vice versa:

func newUDPConn(t *testing.T) *net.UDPConn {
    addr, err := net.ResolveUDPAddr("udp", "localhost:0")
    if err != nil {
        t.Fatal(err)
    }
    conn, err := net.ListenUDP("udp", addr)
    if err != nil {
        t.Fatal(err)
    }
    t.Cleanup(func() { conn.Close() })
    return conn
}

func TestUDPSending(t *testing.T) {
    c1 := newUDPConn(t)
    c2 := newUDPConn(t)

    const num = 1000
    done := make(chan struct{}, 2)
    var failed1, failed2 bool
    go func() {
        defer func() { done <- struct{}{} }()
        for i := 0; i < num; i++ {
            _, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
            if err != nil {
                failed1 = true
                t.Logf("c1 send failed %d: %v\n", i, err)
            }
            time.Sleep(200 * time.Microsecond)
        }
    }()

    go func() {
        defer func() { done <- struct{}{} }()
        for i := 0; i < num; i++ {
            _, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
            if err != nil {
                failed2 = true
                t.Logf("c2 send failed %d: %v\n", i, err)
            }
            time.Sleep(200 * time.Microsecond)
        }
    }()

    <-done
    <-done

    if failed1 || failed2 {
        t.Fail()
    }
}

What did you expect to see?

I expected UDP datagrams to be sent on both sockets.

What did you see instead?

More often than not, this test fails on Linux:

c1 send failed 0: write udp 127.0.0.1:47716->127.0.0.1:34183: sendmsg: operation not permitted

It's either the first or the second connection that fails, but if there's a failure, it's always only the first call that fails. All subsequent calls complete successfully.

On macOS, this code doesn't produce any errors.

qiulaidongfeng commented 11 months ago

I cannot reproduce failures in both wsl1 and github code spaces using the TestUDPSending you provided.

rittneje commented 11 months ago

I am able to reproduce (on Linux), but it is very hit-or-miss. Strangely, it only seems to happen on the host - I cannot reproduce in a Docker container.

From looking around, it seems like this may be a Linux "feature" where sendmsg sometimes returns EPERM if the firewall buffer is full, or something like that. It's not very clear though.

Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.

dennis-tra commented 11 months ago

I've run the snippet 100 times on a Linux machine and on MacOS but couldn't reproduce. Interestingly, on Linux a test run takes ~1.2s, and on MacOS ~0.35s.

Linux: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-56-generic x86_64) (go1.20.1) MacOS: Ventura 13.5.2. (go1.20.0)

Snippet to run the test 100 times ```go package main import ( "net" "testing" "time" ) func newUDPConn(t *testing.T) *net.UDPConn { addr, err := net.ResolveUDPAddr("udp", "localhost:0") if err != nil { t.Fatal(err) } conn, err := net.ListenUDP("udp", addr) if err != nil { t.Fatal(err) } t.Cleanup(func() { conn.Close() }) return conn } func TestUDPSending(t *testing.T) { c1 := newUDPConn(t) c2 := newUDPConn(t) for j := 0; j < 100; j++ { t.Run("", func(t *testing.T) { const num = 1000 done := make(chan struct{}, 2) var failed1, failed2 bool go func() { defer func() { done <- struct{}{} }() for i := 0; i < num; i++ { _, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr)) if err != nil { failed1 = true t.Logf("c1 send failed %d: %v\n", i, err) } time.Sleep(200 * time.Microsecond) } }() go func() { defer func() { done <- struct{}{} }() for i := 0; i < num; i++ { _, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr)) if err != nil { failed2 = true t.Logf("c2 send failed %d: %v\n", i, err) } time.Sleep(200 * time.Microsecond) } }() <-done <-done if failed1 || failed2 { t.Fail() } }) } } ```
sukunrt commented 11 months ago

@dennis-tra You need to make new udp conns on each run. The error concerns the initial packets exchanged on the connection. This variant fails quite frequently for me on linux.

package main

import (
    "net"
    "testing"
    "time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
    addr, err := net.ResolveUDPAddr("udp", "localhost:0")
    if err != nil {
        t.Fatal(err)
    }
    conn, err := net.ListenUDP("udp", addr)
    if err != nil {
        t.Fatal(err)
    }
    t.Cleanup(func() { conn.Close() })
    return conn
}

func TestUDPSending(t *testing.T) {

    for j := 0; j < 100; j++ {
        t.Run("", func(t *testing.T) {
            c1 := newUDPConn(t)
            c2 := newUDPConn(t)
            const num = 1000
            done := make(chan struct{}, 2)
            var failed1, failed2 bool
            go func() {
                defer func() { done <- struct{}{} }()
                for i := 0; i < num; i++ {
                    _, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
                    if err != nil {
                        failed1 = true
                        t.Logf("c1 send failed %d: %v\n", i, err)
                    }
                    time.Sleep(200 * time.Microsecond)
                }
            }()

            go func() {
                defer func() { done <- struct{}{} }()
                for i := 0; i < num; i++ {
                    _, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
                    if err != nil {
                        failed2 = true
                        t.Logf("c2 send failed %d: %v\n", i, err)
                    }
                    time.Sleep(200 * time.Microsecond)
                }
            }()

            <-done
            <-done

            if failed1 || failed2 {
                t.Fail()
            }
        })
    }
}
qiulaidongfeng commented 11 months ago

Use the following code

package main

import (
    "net"
    "testing"
    "time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
    addr, err := net.ResolveUDPAddr("udp", "localhost:0")
    if err != nil {
        t.Fatal(err)
    }
    conn, err := net.ListenUDP("udp", addr)
    if err != nil {
        t.Fatal(err)
    }
    t.Cleanup(func() { conn.Close() })
    return conn
}

func TestUDPSending(t *testing.T) {

    for j := 0; j < 100; j++ {
        t.Run("", func(t *testing.T) {
            t.Parallel()
            c1 := newUDPConn(t)
            c2 := newUDPConn(t)
            const num = 1000
            done := make(chan struct{}, 2)
            var failed1, failed2 bool
            go func() {
                defer func() { done <- struct{}{} }()
                for i := 0; i < num; i++ {
                    _, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
                    if err != nil {
                        failed1 = true
                        t.Logf("c1 send failed %d: %v\n", i, err)
                    }
                    time.Sleep(200 * time.Microsecond)
                }
            }()

            go func() {
                defer func() { done <- struct{}{} }()
                for i := 0; i < num; i++ {
                    _, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
                    if err != nil {
                        failed2 = true
                        t.Logf("c2 send failed %d: %v\n", i, err)
                    }
                    time.Sleep(200 * time.Microsecond)
                }
            }()

            <-done
            <-done

            if failed1 || failed2 {
                t.Fail()
            }
        })
    }
}

Editor [Sorry for accidentally confusing the platform]: Successfully reproduced the failure on the Linux platform in Github code space, while wsl1 failed to reproduce the failure.

dennis-tra commented 11 months ago

Thanks for correcting, and sorry for the noise. However, even with both of your corrected versions, I cannot reproduce it on my Linux machine.

prattmic commented 11 months ago

cc @neild @ianlancetaylor

bcmills commented 10 months ago

Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.

POSIX does not seem to allow EPERM from sendmsg either: https://pubs.opengroup.org/onlinepubs/9699919799/functions/sendmsg.html It seems like if the firewall buffer is full it ought to return ENOBUFS instead. 🤷‍♂️

At any rate, this looks like maybe a Linux kernel bug rather than a Go bug, and (especially given that it is UDP) it's not clear to me that Go should necessarily work around that bug itself.

ArtemGr commented 10 months ago

FYI, I've seen this “sendmsg: operation not permitted” happen with Syncthing (QUIC) due to UDPFLOOD protection, configured automatically by DirectAdmin (iptables -vL UDPFLOOD showing "limit: avg 100/sec burst 500" and "limit: avg 30/min burst 5").

bcmills commented 10 months ago

@ArtemGr, that still sounds consistent with a this being a Linux kernel bug. (Perhaps enabling UDPFLOOD limits causes the kernel to return an error that is not consistent with its documentation?)