ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.55k stars 966 forks source link

Remove libp2p identify protocol #2008

Open djrtwo opened 4 years ago

djrtwo commented 4 years ago

From what I understand, we no explicit use for the libp2p identify protocol, but implementations generally have this running by default. We've talked about explicitly removing it in the past to avoid clients generally just leaking their client type (libp2p language pretty easily mapping to client type).

I suggest we add a SHOULD NOT support identify to our p2p spec

Additionally, I was told identify can serve as a dos vector. Looking for more info on this

wemeetagain commented 4 years ago

identify is still useful for preemptively determining a peer's supported protocols and receiving a SignedPeerRecord, which can be used as an additional form of discovery using gossipsubv1.1 peer exchange.

Since the agentVersion field is optional, maybe the language can be specific to that field. Eg something along the lines of: Clients SHOULD NOT send an agentField if they support identify.

djrtwo commented 4 years ago

I see. Do we use the preview of supported protocols in implementations today?

jrhea commented 4 years ago

Identify could potentially be a DoS vector as well. Using @protolambda's rumor tool i can create this script:

while true; do
host start
host listen --ip=127.0.0.1 --tcp=9001
peer connect /ip4/127.0.0.1/tcp/9000/p2p/16Uiu2HAmFNjrc8S1NBGQK2dogYMFNMyD37VrZu8s3J5ZiTKeykAH
peer info 16Uiu2HAmFNjrc8S1NBGQK2dogYMFNMyD37VrZu8s3J5ZiTKeykAH
kill 
done

to repeatedly hit an eth2 node with identify requests.

here is a screenshot of the output from rumor:

image

and a screenshot of lighthouse:

image

Perhaps some rate limit for incoming connections and requests coming from ip's needs to be implemented at the appropriate place in the stack

djrtwo commented 4 years ago

It we keep identify, I suppose that we should specify an agentVersion to include the client version as well. Maybe libp2p-agent-name/version/client-name/version

It looks like there isn't currently an application callback on identify so this information cannot be leveraged in the actual client. I suspect we could extend the identify spec to include a callback if we need to.

Question: do eth1 clients use the clientId in the Hello protocol for anything?

protolambda commented 4 years ago

It looks like there isn't currently an application callback on identify

The Go Host API hides the IDService getter, but it's there if you cast an interface to access it. And that can then be used to actively call for identification on a connection, and wait for a response. The default host implementation forces identify to be enabled, and run a check on every dial and stream open. Without an option to disable it (at least without forking all host code, like I did in Rumor). There was a comment somewhere in the default "basic"-host implementation about it being faster to have identify run first, than doing negotiation otherwise.

Once the identification completes, it sneakily puts the identified info (keys ProtocolVersion and AgentVersion) into the generic keystore part (/metadata/<base 32 peer id>/ prefix). The ProtocolVersion in go-libp2p is hardcoded as a constant to ipfs/0.1.0 and warns about breaking IPFS before 2020, and with legacy/todo comments. And while the spec specifies that different major/minor protocol versions are strictly unusable together, I can't find the place where this is enforced on a connection. The agent version seems to be unused.

If we go ahead with this, I wouldn't mind to discuss a libp2p-identify update that improves on this situation, to make it user-friendly before enshrining it more.

AgeManning commented 4 years ago

We dont use identify for anything other than debugging and identifying clients.

I had always intended on removing it before mainnet.

dapplion commented 10 months ago

AFAIK most clients expose their client name by default via the identify protocol. It's been very helpful to debug bugs while there's no evidence of attacks using it. Should we stick with it for now?

arnetheduck commented 10 months ago

A compromise we settled on in nimbus given the status quo was to publish only client name (and not version) - this at least prevents crawlers from attacking known-exploitable versions of clients.

The information value here is rather low, even for debugging - ie the fact that some particular client forwarded a message only tells a part of the story, and one that is not often that useful outside of showing fancy graphs of client diversity.

If it were gone, I wouldn't miss it (and nimbus would remove it) - removing it from only one client obviously makes no sense (which is why we send nimbus).

rolfyone commented 9 months ago

I don't mind having it report 'teku' as a general rule, but I'd be fairly strongly against enshrining anything that requires versions... Just having 'nimbus / lighthouse / teku / prysm' is fine imo, but if we wanted to remove it it's only probably a tiny change for us (grafana dashboard has the client breakdown for teku currently)

AgeManning commented 9 months ago

I couldn't see it in the specs. I've always assumed it we never required. Lighthouse has a CLI flag to just empty the identify agent string.

It's useful to collecting data on peers and doing analysis, which is kind of why we have it on by default. I'm not opposed to removing it, but just figured it has always been optional for each client team. We don't expect other clients to support identify.

dapplion commented 9 months ago

To resolve the issue we can either: