Open djrtwo opened 4 years ago
identify
is still useful for preemptively determining a peer's supported protocols and receiving a SignedPeerRecord, which can be used as an additional form of discovery using gossipsubv1.1 peer exchange.
Since the agentVersion field is optional, maybe the language can be specific to that field. Eg something along the lines of: Clients SHOULD NOT
send an agentField
if they support identify
.
I see. Do we use the preview of supported protocols in implementations today?
Identify could potentially be a DoS vector as well. Using @protolambda's rumor tool i can create this script:
while true; do
host start
host listen --ip=127.0.0.1 --tcp=9001
peer connect /ip4/127.0.0.1/tcp/9000/p2p/16Uiu2HAmFNjrc8S1NBGQK2dogYMFNMyD37VrZu8s3J5ZiTKeykAH
peer info 16Uiu2HAmFNjrc8S1NBGQK2dogYMFNMyD37VrZu8s3J5ZiTKeykAH
kill
done
to repeatedly hit an eth2 node with identify requests.
here is a screenshot of the output from rumor:
and a screenshot of lighthouse:
Perhaps some rate limit for incoming connections and requests coming from ip's needs to be implemented at the appropriate place in the stack
It we keep identify
, I suppose that we should specify an agentVersion to include the client version as well. Maybe libp2p-agent-name/version/client-name/version
It looks like there isn't currently an application callback on identify
so this information cannot be leveraged in the actual client. I suspect we could extend the identify
spec to include a callback if we need to.
Question: do eth1 clients use the clientId
in the Hello
protocol for anything?
It looks like there isn't currently an application callback on identify
The Go Host API hides the IDService getter, but it's there if you cast an interface to access it. And that can then be used to actively call for identification on a connection, and wait for a response. The default host implementation forces identify to be enabled, and run a check on every dial and stream open. Without an option to disable it (at least without forking all host code, like I did in Rumor). There was a comment somewhere in the default "basic"-host implementation about it being faster to have identify run first, than doing negotiation otherwise.
Once the identification completes, it sneakily puts the identified info (keys ProtocolVersion
and AgentVersion
) into the generic keystore part (/metadata/<base 32 peer id>/
prefix). The ProtocolVersion
in go-libp2p is hardcoded as a constant to ipfs/0.1.0
and warns about breaking IPFS before 2020, and with legacy/todo comments. And while the spec specifies that different major/minor protocol versions are strictly unusable together, I can't find the place where this is enforced on a connection. The agent version seems to be unused.
If we go ahead with this, I wouldn't mind to discuss a libp2p-identify update that improves on this situation, to make it user-friendly before enshrining it more.
We dont use identify for anything other than debugging and identifying clients.
I had always intended on removing it before mainnet.
AFAIK most clients expose their client name by default via the identify protocol. It's been very helpful to debug bugs while there's no evidence of attacks using it. Should we stick with it for now?
A compromise we settled on in nimbus given the status quo was to publish only client name (and not version) - this at least prevents crawlers from attacking known-exploitable versions of clients.
The information value here is rather low, even for debugging - ie the fact that some particular client forwarded a message only tells a part of the story, and one that is not often that useful outside of showing fancy graphs of client diversity.
If it were gone, I wouldn't miss it (and nimbus would remove it) - removing it from only one client obviously makes no sense (which is why we send nimbus
).
I don't mind having it report 'teku' as a general rule, but I'd be fairly strongly against enshrining anything that requires versions... Just having 'nimbus / lighthouse / teku / prysm' is fine imo, but if we wanted to remove it it's only probably a tiny change for us (grafana dashboard has the client breakdown for teku currently)
I couldn't see it in the specs. I've always assumed it we never required. Lighthouse has a CLI flag to just empty the identify agent string.
It's useful to collecting data on peers and doing analysis, which is kind of why we have it on by default. I'm not opposed to removing it, but just figured it has always been optional for each client team. We don't expect other clients to support identify.
To resolve the issue we can either:
From what I understand, we no explicit use for the libp2p
identify
protocol, but implementations generally have this running by default. We've talked about explicitly removing it in the past to avoid clients generally just leaking their client type (libp2p language pretty easily mapping to client type).I suggest we add a
SHOULD NOT
supportidentify
to our p2p specAdditionally, I was told
identify
can serve as a dos vector. Looking for more info on this