Open marten-seemann opened 11 months ago
I cannot reproduce the failure in either WSL1 or GitHub Codespaces using the TestUDPSending you provided.
I am able to reproduce (on Linux), but it is very hit-or-miss. Strangely, it only seems to happen on the host - I cannot reproduce in a Docker container.
From looking around, it seems like this may be a Linux "feature" where sendmsg sometimes returns EPERM if the firewall buffer is full, or something like that. It's not very clear though.
Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.
I've run the snippet 100 times on a Linux machine and on macOS but couldn't reproduce. Interestingly, on Linux a test run takes ~1.2s, and on macOS ~0.35s.
Linux: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-56-generic x86_64) (go1.20.1)
macOS: Ventura 13.5.2 (go1.20.0)
@dennis-tra You need to create new UDP conns on each run. The error concerns the initial packets exchanged on the connection. This variant fails quite frequently for me on Linux:
package main

import (
	"net"
	"testing"
	"time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {
	for j := 0; j < 100; j++ {
		t.Run("", func(t *testing.T) {
			c1 := newUDPConn(t)
			c2 := newUDPConn(t)

			const num = 1000
			done := make(chan struct{}, 2)
			var failed1, failed2 bool

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed1 = true
						t.Logf("c1 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed2 = true
						t.Logf("c2 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			<-done
			<-done
			if failed1 || failed2 {
				t.Fail()
			}
		})
	}
}
Use the following code:
package main

import (
	"net"
	"testing"
	"time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {
	for j := 0; j < 100; j++ {
		t.Run("", func(t *testing.T) {
			t.Parallel()
			c1 := newUDPConn(t)
			c2 := newUDPConn(t)

			const num = 1000
			done := make(chan struct{}, 2)
			var failed1, failed2 bool

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed1 = true
						t.Logf("c1 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed2 = true
						t.Logf("c2 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			<-done
			<-done
			if failed1 || failed2 {
				t.Fail()
			}
		})
	}
}
Edit (sorry for accidentally confusing the platforms): I successfully reproduced the failure on Linux in GitHub Codespaces, while WSL1 failed to reproduce it.
Thanks for correcting, and sorry for the noise. However, even with both of your corrected versions, I cannot reproduce it on my Linux machine.
cc @neild @ianlancetaylor
Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.
POSIX does not seem to allow EPERM from sendmsg either:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sendmsg.html
It seems like if the firewall buffer is full it ought to return ENOBUFS instead. 🤷♂️
At any rate, this looks like maybe a Linux kernel bug rather than a Go bug, and (especially given that it is UDP) it's not clear to me that Go should necessarily work around that bug itself.
FYI, I've seen this “sendmsg: operation not permitted” happen with Syncthing (QUIC) due to UDPFLOOD protection, configured automatically by DirectAdmin (iptables -vL UDPFLOOD showing "limit: avg 100/sec burst 500" and "limit: avg 30/min burst 5").
@ArtemGr, that still sounds consistent with this being a Linux kernel bug. (Perhaps enabling UDPFLOOD limits causes the kernel to return an error that is not consistent with its documentation?)
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
I created two UDP sockets and sent UDP datagrams from one to the other, and vice versa:
What did you expect to see?
I expected UDP datagrams to be sent on both sockets.
What did you see instead?
More often than not, this test fails on Linux:
It's either the first or the second connection that fails, but if there's a failure, it's always only the first call that fails. All subsequent calls complete successfully.
On macOS, this code doesn't produce any errors.