anacrolix / dht

dht is used by anacrolix/torrent, and is intended for use as a library in other projects both torrent related and otherwise
Mozilla Public License 2.0
313 stars 66 forks source link

Panic on WSA transient errors #16

Closed elgatito closed 2 years ago

elgatito commented 6 years ago

Panic caught by user on windows.

http://paste.ubuntu.com/26059096/

There is a Client config on line 137.

james-lawrence commented 6 years ago

this is due to how the server runs the accept listener in a routine and panics on basically any error =(

the serve method should be exposed and allow the caller to handle any errors.

anacrolix commented 6 years ago

I think that is the best approach. Is anyone still seeing this panic occur?

anacrolix commented 6 years ago

Let's reopen if someone has the problem again.

djdv commented 4 years ago

@anacrolix I wanted to run the dht-server and encountered this problem. I think I have a solution to this but it might involve a patch to Go and I need to do homework on which WSA errors are and are not to be considered temporary. I'm going to investigate this after I get some sleep but wanted to make this post so I don't forget to follow up with this. Screencast related: https://www.youtube.com/watch?v=GAkn9KI4ceY Will submit a PR later.

anacrolix commented 4 years ago

@djdv Really cool to watch your screencast. Your process and conclusions are spot on regarding WSA errors, I think this has been reported, and commented on in various places in my own codebases, and in golang/go.

djdv commented 4 years ago

Nice. I'm going to wait to see what some of the Go team says about which WSA errors do and don't count as temporary before submitting anything. The manual hacks might not be necessary if they just get rolled into go. If not, I'll have to break the check out into a build constrained file. (Sorry for the delay, I took a small injury (;´∀`)

For now, if anyone really needs it, here's the hacky patch.

diff --git a/server.go b/server.go
index eedfa81..efb5b6e 100644
--- a/server.go
+++ b/server.go
@@ -7,6 +7,8 @@ import (
    "fmt"
    "io"
    "net"
+   "os"
+   "runtime"
    "runtime/pprof"
    "text/tabwriter"
    "time"
@@ -24,6 +26,8 @@ import (
    "github.com/anacrolix/stm"

    "github.com/anacrolix/dht/v2/krpc"
+
+   "golang.org/x/sys/windows"
 )

 // A Server defines parameters for a DHT node server that is able to send
@@ -291,11 +295,37 @@ func (s *Server) processPacket(b []byte, addr Addr) {
    s.deleteTransaction(tk)
 }

+// TODO: [anyone] remove this when Go itself considers these errors to be temporary errors
+func actuallyTemporary(nErr *net.OpError) bool {
+   var sErr *os.SyscallError
+   if errors.As(nErr.Err, &sErr) {
+       switch sErr.Err {
+       case windows.WSAENETRESET,
+           windows.WSAECONNRESET,
+           windows.WSAECONNABORTED,
+           windows.WSAECONNREFUSED,
+           windows.WSAENETUNREACH,
+           windows.WSAETIMEDOUT:
+           return true
+       }
+   }
+   return false
+}
+
 func (s *Server) serve() error {
    var b [0x10000]byte
    for {
        n, addr, err := s.socket.ReadFrom(b[:])
        if err != nil {
+           var nErr *net.OpError
+           if errors.As(err, &nErr) {
+               if nErr.Temporary() {
+                   continue
+               }
+               if runtime.GOOS == "windows" && actuallyTemporary(nErr) {
+                   continue
+               }
+           }
            return err
        }
        expvars.Add("packets read", 1)
anacrolix commented 4 years ago

Thanks @djdv . Could you link to the golang/go efforts to fix this?

djdv commented 4 years ago

It looks like this is related to it https://github.com/golang/go/issues/35131 But the effort was abandoned. I'm going to look into it myself after I tackle something else first. Will post an update when things happen.

Likely just the temporary check will need to go in here and then newer version of Go will handle it. But if not, we'll need to break the other half out into a Windows constrained file.

anacrolix commented 3 years ago

@djdv any update on this?

anacrolix commented 2 years ago

I'm now seeing this in v2.17.0:

panic: read udp 0.0.0.0:54576: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.

goroutine 42 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0x150ce000)
    /go/pkg/mod/github.com/anacrolix/dht/v2@v2.17.0/server.go:256 +0xe3
created by github.com/anacrolix/dht/v2.NewServer
    /go/pkg/mod/github.com/anacrolix/dht/v2@v2.17.0/server.go:244 +0x60d

I'll make an up to date workaround in the short term.

anacrolix commented 2 years ago

Same problem as https://github.com/anacrolix/torrent/issues/83.

anacrolix commented 2 years ago

Fixed by https://github.com/anacrolix/dht/commit/f1f6c652cb216c69f54d0522dd002cbd41c54575.

anacrolix commented 2 years ago

Is anyone still seeing this with v2.18.1?

djdv commented 2 years ago

Sorry for not getting back to you on this. (I went mental IRL) Going to run an instance of the dht-server for a while and see if anything goes wrong. Promise not to take years to respond.

Also it's too late to mention it now, but it looks like Go ended up deprecating the whole Temporary method on net errors without any alternative (besides the original suggestion of manually comparing errors -- now via errors.Is -- to check against system specific errors, which aint great but is what it is)

// Deprecated: Temporary errors are not well-defined. // Most temporary errors are timeouts, and the few exceptions are surprising. // Do not use this method.

anacrolix commented 2 years ago

No worries, thanks for contributing. I'm tempted to swap what I put in for the patch you provided above, especially if your testing shows what I have isn't sufficient.

djdv commented 2 years ago

That might be the best for now. I'm no network expert, so I can't say for certain all of these errors can safely be ignored, but it's probably better to ignore them until someone says "hey that one shouldn't be ignored", rather than panicking. I say this because I got a panic almost immediately for WSAENETRESET (that's error 10052, not WSAECONNRESET/10054), which is conveniently not within the syscall package, but is in the windows pkg.

My log:

17:36:03 🏠 ⧽ bt-dht
2022-10-20T17:36:27-0400 NIL [main.loadTable:31]: loaded 0 nodes from table file
2022-10-20T17:36:27-0400 NIL [main.mainErr:72]: dht server on 192.168.1.40:40070, ID is 0f23c260e1c688e4d9b6a9e294d686b1fe36311a
panic: read udp 192.168.1.40:40070: wsarecvfrom: The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress.

goroutine 6 [running]:
github.com/anacrolix/dht/v2.(*Server).serveUntilClosed(0xc000188000)
        C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:256 +0xdc
created by github.com/anacrolix/dht/v2.NewServer
        C:/Users/Dominic Della Valle/Projects/Go/src/github.com/anacrolix/dht/server.go:244 +0x79c
djdv commented 2 years ago

Err, just to be clear. The diff above should be changed to avoid calling Temporary, but that same list of error values should be checked instead of just the one being used right now.

Edit: keeping the build constraint like you have now is also better than doing it at runtime like in the quick hack I had.

anacrolix commented 2 years ago

Thanks, I'll make that change soon.

anacrolix commented 2 years ago

Update to v2.19.1.

djdv commented 2 years ago

Ran an instance for a few hours. Seems to be working alright. Untitled

anacrolix commented 2 years ago

@djdv thanks for the feedback and all the contribution you've made in this issue, it's really helped.