Closed: veox closed this issue 5 years ago
This might actually be two issues collated:
1. `dyndial` failed, and subsequently trying to re-dial other-genesis peers as often as same-genesis ones;
2. `dyndial` "light" peers in the first place (on my `lightserv`, near-100% of `les` peers are inbound).
cc @zsfelfoldi
In `trinity`, a peer database is maintained, and past failures - say, "genesis mismatch" - are recorded. Past failures add a "timer" to skip re-dialing that particular peer; the nature of the failure determines the timer's duration. Wrong network/failure have the timer set to one day.
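To make the idea concrete, here is a minimal Go sketch of a peer database with failure-dependent skip timers. It is not trinity's or geth's actual code: `PeerDB`, `FailureKind`, `RecordFailure` and `ShouldDial` are hypothetical names, and only the one-day duration for a wrong network is taken from the comment above; the other durations are placeholders.

```go
package main

import (
	"fmt"
	"time"
)

// FailureKind is a hypothetical classification of why a dial or handshake failed.
type FailureKind int

const (
	NoFreeSlots     FailureKind = iota // transient: the remote is full right now
	Timeout                            // transient: no answer in time
	GenesisMismatch                    // long-lived: the peer is on another network
)

// skipFor maps each failure kind to how long the peer is skipped.
// Only the one-day value is taken from the thread; the rest are assumptions.
var skipFor = map[FailureKind]time.Duration{
	NoFreeSlots:     1 * time.Minute,
	Timeout:         10 * time.Minute,
	GenesisMismatch: 24 * time.Hour,
}

// PeerDB remembers, per node ID, the time before which a peer must not be re-dialed.
type PeerDB struct {
	notBefore map[string]time.Time
}

// NewPeerDB returns an empty peer database.
func NewPeerDB() *PeerDB {
	return &PeerDB{notBefore: make(map[string]time.Time)}
}

// RecordFailure starts the skip timer for a peer, or extends it if the
// new deadline is later than the one already stored.
func (db *PeerDB) RecordFailure(id string, kind FailureKind, now time.Time) {
	until := now.Add(skipFor[kind])
	if until.After(db.notBefore[id]) {
		db.notBefore[id] = until
	}
}

// ShouldDial reports whether the skip timer for the peer has expired.
// Peers with no recorded failure are always dialable.
func (db *PeerDB) ShouldDial(id string, now time.Time) bool {
	return !now.Before(db.notBefore[id])
}

func main() {
	db := NewPeerDB()
	start := time.Now()
	db.RecordFailure("peer-a", GenesisMismatch, start)
	db.RecordFailure("peer-b", NoFreeSlots, start)

	later := start.Add(5 * time.Minute)
	fmt.Println(db.ShouldDial("peer-a", later)) // false: still inside the one-day timer
	fmt.Println(db.ShouldDial("peer-b", later)) // true: the one-minute timer has expired
	fmt.Println(db.ShouldDial("peer-c", later)) // true: no failure recorded
}
```

Keeping the durations in a single table means a new failure reason only needs one more entry, and the dial loop itself stays unaware of why a peer is being skipped.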
I am looking at it now. The "server pool" (the peer selection logic of clients) is indeed a bit dumb; one of its problems is that it currently does not distinguish between failure reasons such as "no free slots at the moment" and permanent incompatibility.
This issue seems to be primarily an issue with the (soon to be replaced) experimental discv5 discovery. Somehow discv4 and discv5 peers occasionally get mixed up, so ETH nodes sometimes try to dial random LES nodes too in the hope of establishing an ETH connection. This will be fixed by the new discv5 and I don't think it is a serious issue in itself. The fact that these peers are repeatedly hammered is bad, but it is a general p2p issue unrelated to LES.
Based on the logs, these seem to originate from a full node dialing a light client (or rather dialing a random node and matching the les protocol capability). To fix this, we need to make the discovery protocol a bit smarter, which we're currently doing with ENRs. It's a known issue (the same happens when looking for testnet peers, for example). There's no inherent issue, just a protocol limitation. Will be fixed soon-ish.
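As a rough illustration of what "smarter discovery" could mean, the sketch below filters discovered nodes by what their records advertise before any dial is attempted. It is a simplified model, not the go-ethereum ENR API: `Record`, `Compatible` and `DialCandidates` are hypothetical, and the string-valued entries are a stand-in for whatever network identifier a real record carries.

```go
package main

import "fmt"

// Record is a simplified, hypothetical stand-in for a node record:
// a node ID plus key/value entries advertised through discovery.
type Record struct {
	ID      string
	Entries map[string]string
}

// Compatible reports whether a record advertises the wanted protocol
// together with the expected network identifier.
func Compatible(r Record, proto, network string) bool {
	got, ok := r.Entries[proto]
	return ok && got == network
}

// DialCandidates keeps only the records worth dialing, so that nodes on
// another network or speaking another protocol are never contacted.
func DialCandidates(found []Record, proto, network string) []Record {
	var out []Record
	for _, r := range found {
		if Compatible(r, proto, network) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	found := []Record{
		{ID: "n1", Entries: map[string]string{"eth": "mainnet"}},
		{ID: "n2", Entries: map[string]string{"les": "mainnet"}},
		{ID: "n3", Entries: map[string]string{"eth": "testnet"}},
	}
	for _, r := range DialCandidates(found, "eth", "mainnet") {
		fmt.Println(r.ID) // prints only n1
	}
}
```

The point of filtering at this stage is that an incompatible peer costs nothing beyond reading its record, whereas today the mismatch is only discovered after a connection and a failed handshake.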
System information
Geth version: 1.8.4-stable (installed via Ubuntu PPA package)
OS & Version: Ubuntu 16.04.4 LTS (Xenial Xerus)
Expected behaviour
On LES handshake failure, especially one due to a genesis block mismatch, no repeat dial should be attempted for a prolonged period of time.
Actual behaviour
A `dyndial` to the same ~~incompatible~~ other-genesis peers is attempted repeatedly.
EDIT: To clarify, IMO peer exchange with other ether-networks is nice, and might come in handy when sharding; but hammering known-different peers with connections is not.
Steps to reproduce the behaviour
1. `geth` run as:
2. Enable `les` debug messages in `console`:
3. Observe log.
Backtrace
Is this necessary to demonstrate this particular issue?..
Here's the relevant log output:
(It's the same three peers.)