Closed elgatito closed 2 years ago
this is due to how the server runs the accept listener in a routine and panics on basically any error =(
the serve method should be exposed and allow the caller to handle any errors.
I think that is the best approach. Is anyone still seeing this panic occur?
Let's reopen if someone has the problem again.
@anacrolix
I wanted to run the dht-server
and encountered this problem.
I think I have a solution to this but it might involve a patch to Go and I need to do homework on which WSA errors are and are not to be considered temporary.
I'm going to investigate this after I get some sleep but wanted to make this post so I don't forget to follow up with this.
Screencast related: https://www.youtube.com/watch?v=GAkn9KI4ceY
Will submit a PR later.
@djdv Really cool to watch your screencast. Your process and conclusions are spot on regarding WSA errors, I think this has been reported, and commented on in various places in my own codebases, and in golang/go.
Nice. I'm going to wait to see what some of the Go team says about which WSA errors do and don't count as temporary before submitting anything. The manual hacks might not be necessary if they just get rolled into go. If not, I'll have to break the check out into a build constrained file. (Sorry for the delay, I took a small injury (;´∀`)
For now, if anyone really needs it, here's the hacky patch.
diff --git a/server.go b/server.go
index eedfa81..efb5b6e 100644
--- a/server.go
+++ b/server.go
@@ -7,6 +7,8 @@ import (
"fmt"
"io"
"net"
+ "os"
+ "runtime"
"runtime/pprof"
"text/tabwriter"
"time"
@@ -24,6 +26,8 @@ import (
"github.com/anacrolix/stm"
"github.com/anacrolix/dht/v2/krpc"
+
+ "golang.org/x/sys/windows"
)
// A Server defines parameters for a DHT node server that is able to send
@@ -291,11 +295,37 @@ func (s *Server) processPacket(b []byte, addr Addr) {
s.deleteTransaction(tk)
}
+// TODO: [anyone] remove this when Go itself considers these errors to be temporary errors
+func actuallyTemporary(nErr *net.OpError) bool {
+ var sErr *os.SyscallError
+ if errors.As(nErr.Err, &sErr) {
+ switch sErr.Err {
+ case windows.WSAENETRESET,
+ windows.WSAECONNRESET,
+ windows.WSAECONNABORTED,
+ windows.WSAECONNREFUSED,
+ windows.WSAENETUNREACH,
+ windows.WSAETIMEDOUT:
+ return true
+ }
+ }
+ return false
+}
+
func (s *Server) serve() error {
var b [0x10000]byte
for {
n, addr, err := s.socket.ReadFrom(b[:])
if err != nil {
+ var nErr *net.OpError
+ if errors.As(err, &nErr) {
+ if nErr.Temporary() {
+ continue
+ }
+ if runtime.GOOS == "windows" && actuallyTemporary(nErr) {
+ continue
+ }
+ }
return err
}
expvars.Add("packets read", 1)
Thanks @djdv . Could you link to the golang/go efforts to fix this?
It looks like this is related to it https://github.com/golang/go/issues/35131 But the effort was abandoned. I'm going to look into it myself after I tackle something else first. Will post an update when things happen.
Likely just the temporary check will need to go in here and then newer version of Go will handle it. But if not, we'll need to break the other half out into a Windows constrained file.
@djdv any update on this?
I'm now seeing this in v2.17.0:
panic: read udp 0.0.0.0:54576: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.
goroutine 42 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0x150ce000)
/go/pkg/mod/github.com/anacrolix/dht/v2@v2.17.0/server.go:256 +0xe3
created by github.com/anacrolix/dht/v2.NewServer
/go/pkg/mod/github.com/anacrolix/dht/v2@v2.17.0/server.go:244 +0x60d
I'll make an up to date workaround in the short term.
Same problem as https://github.com/anacrolix/torrent/issues/83.
Is anyone still seeing this with v2.18.1?
Sorry for not getting back to you on this. (I went mental IRL)
Going to run an instance of the dht-server
for a while and see if anything goes wrong. Promise not to take years to respond.
Also it's too late to mention it now, but it looks like Go ended up deprecating the whole Temporary
method on net errors without any alternative (besides the original suggestion of manually comparing errors -- now via errors.Is
-- to check against system specific errors, which aint great but is what it is)
// Deprecated: Temporary errors are not well-defined. // Most temporary errors are timeouts, and the few exceptions are surprising. // Do not use this method.
No worries, thanks for contributing. I'm tempted to swap what I put in for the patch you provided above, especially if your testing shows what I have isn't sufficient.
That might be the best for now. I'm no network expert, so I can't say for certain all of these errors can safely be ignored, but it's probably better to ignore them until someone says "hey that one shouldn't be ignored", rather than panicking.
I say this because I got a panic almost immediately for WSAENETRESET
(that's error 10052, not WSAECONNRESET/10054), which is conveniently not within the syscall
package, but is in the windows
pkg.
My log:
17:36:03 🏠 ⧽ bt-dht
2022-10-20T17:36:27-0400 NIL [main.loadTable:31]: loaded 0 nodes from table file
2022-10-20T17:36:27-0400 NIL [main.mainErr:72]: dht server on 192.168.1.40:40070, ID is 0f23c260e1c688e4d9b6a9e294d686b1fe36311a
panic: read udp 192.168.1.40:40070: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.
goroutine 6 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0xc000188000)
C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:256 +0xdc
created by github.com/anacrolix/dht/v2.NewServer
C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:244 +0x79c
Err, just to be clear. The diff above should be changed to avoid calling Temporary, but that same list of error values should be checked instead of just the one being used right now.
Edit: keeping the build constraint like you have now is also better than doing it at runtime like in the quick hack I had.
Thanks, I'll make that change soon.
Update to v2.19.1.
Ran an instance for a few hours. Seems to be working alright.
@djdv thanks for the feedback and all the contribution you've made in this issue, it's really helped.
Panic caught by user on windows.
http://paste.ubuntu.com/26059096/
There is a Client config on line 137.