ethstorage / es-node

Golang implementation of the EthStorage node.
Other
1.15k stars 77 forks source link

fix issue 262: close peers that have no addresses in the host.peerstore due to expired ttl. #288

Closed ping-ke closed 4 months ago

ping-ke commented 4 months ago

issue 262: https://github.com/ethstorage/es-node/issues/262

Root Cause When creating a newStream to send a request to a peer, if the connection does not exist, it will call the peer again. But if the address of that peer does not exist, this issue will be thrown. The peer address does not exist because of time-to-live timeout, which is 24 hours correctly. time-to-live timeout means the peer not be added back to peerstore by discovery. 1716739858495

pstore.AddAddrs(info.ID, info.Addrs, discoveredAddrTTL) // discoveredAddrTTL = time.Hour * 24

1716738775391 1716738780004 1716732239701 1716732554046

How to fix it Change the log level to debug for the error caused by newStream "no addresses" and close the peer.

qzhodl commented 4 months ago

the peer will remove or recover after some time

Why is the peer removed or recovered after some time? Could you please provide more explanation?

qzhodl commented 4 months ago

I am still a little concerned because we haven’t identified the root cause of addresses disappearing in the peer store. If we change this error to the debug level, we will essentially lose visibility of the issue.

qzhodl commented 4 months ago

Change the log level to debug for the error caused by newStream, this should be a temporary issue between the local node and the remote peer, I have seen this issue twice (one reported by a customer and one by me - the network condition at that time was very bad.)

This issue will be resolved some times later when

  1. When the peer is removed
  2. Addr is added back to Peerstore in the discovery.go

According to the discussion in today’s meeting, I think we may need to set up a test environment to determine if the root cause is an outdated TTL.

ping-ke commented 4 months ago

Added log to removeExpired in addr_book.go, and will see it clear the addresses after expired. and connect to this peer will throw "no addresses" issue.

1716953640786

qzhodl commented 4 months ago

Added log to removeExpired in addr_book.go, and will see it clear the addresses after expired. and connect to this peer will throw "no addresses" issue.

1716953640786

Then should we also need to remove the peer in syncClient when the peerstore removes the addr?

qzhodl commented 4 months ago

Need to update the PR title & resolve the conflicts