ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

p2p/discover: UDP listener port not released when macOS firewall is enabled #18443

Open ryanberckmans opened 5 years ago

ryanberckmans commented 5 years ago

System information

Geth
Version: 1.8.20-stable
Git Commit: 24d727b6d6e2c0cde222fa12155c4a6db5caaf2e
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.11.2
Operating System: darwin (OSX 10.13.6)
GOPATH=/Users/me/go
GOROOT=/Users/travis/.gimme/versions/go1.11.2.darwin.amd64

Expected behaviour

The discovery UDP listener should close its socket on shutdown/interrupt in all cases.

Actual behaviour

In certain code paths, the discovery UDP listener is not closed on shutdown/interrupt. This prevents geth from restarting until the port is manually released or the system is restarted.

I hit one of these code paths but don't have a specific repro.

Invocation that produced dangling UDP listener (light node):

geth --syncmode=light --cache=512 --rpc --ws --wsorigins=127.0.0.1,http://127.0.0.1:8080,https://127.0.0.1:8443 --datadir=redact

Listener initialization which became dangling:

  [14:57:00.418] [info] GETH NODE: INFO [01-14|14:57:00.418] UDP listener up                          net=enode:/redact@[::]:30303

Interrupt which failed to close UDP listener:

  [14:57:30.059] [info] GETH NODE: INFO [01-14|14:57:30.004] Got interrupt, shutting down...
  INFO [01-14|14:57:30.004] WebSocket endpoint closed                url=ws://127.0.0.1:8546
  INFO [01-14|14:57:30.005] HTTP endpoint closed                     url=http://127.0.0.1:8545
  INFO [01-14|14:57:30.005] IPC endpoint closed                      url="/Users/me/Library/Application Support/augur/geth/geth.ipc"
  INFO [01-14|14:57:30.005] Blockchain manager stopped
  INFO [01-14|14:57:30.005] Stopping light Ethereum protocol
  INFO [01-14|14:57:30.007] Light Ethereum protocol stopped
  INFO [01-14|14:57:30.008] Transaction pool stopped

Fatal when attempting to restart geth:

Fatal: Error starting protocol stack: listen udp [::]:30303: bind: address already in use

Utility output showing the port was not released:

$ netstat -anv | grep -E "30303|pid"
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)     rhiwat shiwat    pid   epid
udp46  58303      0  *.30303                *.*                                196724   9216  45852      0

Confirm that pid 45852 no longer exists (i.e. the port remains bound after the process exited; this is not a live or zombie process):

$ ps -e | grep 45852
(no output)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

chrisfranko commented 4 years ago

I'm having the same issue.

System

Geth
Version: 1.9.10-stable
Git Commit: 58cf5686eab9019cc01e202e846a6bbc70a3301d
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.13.7
Operating System: darwin (OSX 10.15.2)

I built with

make geth

Ran geth with

build/bin/geth console

Typed exit to close geth

exit

Waited a few seconds, tried to restart the node, and got

Fatal: Error starting protocol stack: listen udp [::]:30303: bind: address already in use

The process doesn't appear anywhere. I can get geth to start either by changing the --port flag or by restarting my machine.

holiman commented 4 years ago

First reporter:

Operating System: darwin (OSX 10.13.6)

Second reporter:

Operating System: darwin (OSX 10.15.2)

Might be something with OSX?

renaynay commented 4 years ago

I could not reproduce the error in either of the situations documented in this issue.

My system info:

    Geth
    Version: 1.9.14-unstable
    Git Commit: 3bf1054a13f2ed2ba8c0c7c44279bbca6e4e7cbb
    Git Commit Date: 20200416
    Architecture: amd64
    Protocol Versions: [65 64 63]
    Go Version: go1.14.1
    Operating System: darwin (OSX 10.15.4)
    GOPATH=
    GOROOT=/usr/local/Cellar/go/1.14.1/libexec

fjl commented 4 years ago

This happens when the macOS firewall is enabled. We cannot fix this issue, but we could work around it by using a random, OS-assigned port by default.
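The suggested workaround relies on binding port 0, which asks the kernel to pick any free port. The minimal Go sketch below shows that mechanism. The function listenEphemeralUDP is hypothetical and is not a geth function or flag.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeralUDP binds a UDP socket on host with port 0, letting the
// kernel choose a free port, and returns the port it picked.
// Hypothetical helper illustrating the workaround; not geth's API.
func listenEphemeralUDP(host string) (net.PacketConn, int, error) {
	conn, err := net.ListenPacket("udp", net.JoinHostPort(host, "0"))
	if err != nil {
		return nil, 0, err
	}
	port := conn.LocalAddr().(*net.UDPAddr).Port
	return conn, port, nil
}

func main() {
	conn, port, err := listenEphemeralUDP("0.0.0.0")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	// The port generally differs between runs, so a socket stranded by a
	// previous run does not block this one.
	fmt.Println("listening on UDP port", port)
}
```

The trade-off is that the node no longer sits on the well-known 30303, so peers can only learn the port from what the node advertises. The enode URL already carries the port, as in the "UDP listener up" log line.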

capcasady commented 3 months ago

I have worked around this in a way that is admittedly nuts: unplug the Ethernet cable, wait for the sockets to drain, then shut down. This has never failed for me, although I may just have been lucky. This is a decades-old bug in OSX. My theory is that when the firewall is in use, an apparently closed UDP socket that still has data waiting to be read doesn't always get cleaned up. I never found a way to free the socket, but I imagine someone who knows the source code well could use a debugger to clear that data from the network stack, perhaps without causing a crash.