Specify Client Versions on Engine API

ethDreamer commented 5 months ago

By analyzing the structure of beacon blocks on the network, we are able to obtain fairly accurate data on consensus layer client diversity. Unfortunately, do to the fact that the overwhelming majority of validators use mev-boost, their execution clients do not leave any fingerprint behind in block proposals. Thus we are forced to rely on limited self-reporting data from staking pools. Many pools do not participate, and we often have outdated statistics for the pools that do. Worse yet, we have no data on client diversity for home stakers.

This PR can change that by allowing consensus clients to learn which execution client they are connected with.

Consensus clients can then embed this in their graffiti field by default when the user doesn't bother to set it. A quick survey of recent proposal graffiti reveals that:

already embed their client and version by default. It would be great to add the execution client to this. Perhaps prysm could be convinced to join as well.

An analysis of ~2000 recent blocks indicated that nearly half of all validators don't bother to change their graffiti from the default so the potential to gather data here is huge.

garyschulte commented 5 months ago

Adding this to the engine api is definitely an improvement. In order to fit within the graffiti, we should specify a field size limit or a strategy to encode the version info.

Otherwise we might end up with responses like: Lighthouse/v4.5.0-441fc16 :besu/v24.1.2-dev-8407b9e7/linux-x86_64/openjdk-java-21

fjl commented 5 months ago

I'm personally more in favor of standardizing web3_clientVersion, because it already exists in clients.

ethDreamer commented 5 months ago

In light of comments received so far I've pushed an alternative specification called engine_clientVersionV1 which is more comprehensive. There are a couple things to be decided:

Do we reuse web3_clientVersion or choose the more comprehensive engine_clientVersionV1?
Should we require this method be supported instead of recommending its support? (SHOULD vs MUST)
Do we agree on the abbreviations for the ClientCode?
Do we accept using the first 4 bytes of the commit hash as a short-hand for version?

Personally I lean towards taking engine_clientVersionV1 and making it mandatory if there aren't objections. If we're not going to take the easy route and just reuse web3_clientVersion then we might as well make a method that accomplishes what we're really trying to do here (get better measurements of EL client diversity). To that end, we require a standardized shorthand for specifying both clients in the limited space of block graffiti (32 bytes).

The definitions of ClientCode specified here allow us to standardize how we encode any client pairing and specify both execution and consensus client versions inside the block graffiti within just 20 bytes. For example:

LH1be52536BU0f91a674

If desired, the space could be further reduced so that the bytes of the commit hash are embedded directly into the graffiti bytes (allowing the full consensus and execution client versions to be specified in 12 bytes).

Standardizing the version specifications this way makes graffiti analysis easy regardless of what client pairs are used. Based on my testing, geth, nethermind, and besu already have the commit hash embedded in their binaries when they list the client version so it shouldn't be difficult for them to build it in here. I can't speak for the other clients though.

Side note: by design, none of the proposed client codes are valid hex so they they won't be confused with the commit hash.

rkrasiuk commented 5 months ago

Very supportive of this change. Agree with previous comments around naming. Unsure if we should introduce versioning for identification. we already have unversioned engine_exchangeCapabilities, might as well drop v1 and simply have engine_clientVersion

ethDreamer commented 5 months ago

Unsure if we should introduce versioning for identification. we already have unversioned engine_exchangeCapabilities, might as well drop v1 and simply have engine_clientVersion

I believe engine_exchangeCapabilities is the only unversioned method because we won't allow it to ever change. If execution clients began supporting a new version of exchangeCapabilities, then consensus clients would have to do trial and error to determine which version to call, which defeats the purpose of having a "what methods do you support" method.

This doesn't necessarily mean that we couldn't also agree to never allow the engine_clientVersion method to change.

StefanBratanov commented 5 months ago

Supportive of this change. Maybe the method could be renamed to engine_exchangeClientVersion similar to engine_exchangeCapabilities since we are essentially exchanging the versions.

lightclient commented 5 months ago

Is there an advantage of being prescriptive about the format the client returns it's version in? I was imagining something much closer to web3_clientVersion where any value would be accepted by CL and incorporated into the graffiti.

I guess there isn't much downside as it is mainly just upfront cost of spec'ing things out.

garyschulte commented 5 months ago

Is there an advantage of being prescriptive about the format the client returns it's version in? I was imagining something much closer to web3_clientVersion where any value would be accepted by CL and incorporated into the graffiti.

Having a conformed format will only help in identification, and in 'economy of graffiti'. Ensuring there is a predictable portion of graffiti consumed by client identification makes it more palatable IMO. The primary downside is just the gatekeeping required to maintain the list.

I like the human readable and more verbose bits, but I think that might only be useful in CL logs, since it is behind JWT secured endpoint.

rolfyone commented 5 months ago

It seems like the other missing part that would be nice in graffiti would be the builder and version if used (mostly used, but not always) The encoding makes sense to fit inside graffiti bytes. not sure what % of blocks use default graffiti and whether this will be useful ultimately...

michaelsproul commented 5 months ago

@rolfyone We already have data on the builders because they fill the execution payload's extraData field. E.g. this block has rsync-builder.xyz in the extra data: https://beaconcha.in/slot/8312808. The relays also provide data APIs that let us map blocks & builders to relays. Often there are multiple relays that will produce/publish each builder payload. Some of these affinities are displayed on sites like https://mevboost.pics/.

The only thing the local BN could read would be a list of relays from mev-boost or its own config. This would be 1) too long to include in graffiti and 2) redundant, given the above.

ethDreamer commented 5 months ago

Okay I just want to take the temperature of the room. Please react to this to vote:

:heart: - vote for reusing web3_clientVersion :tada: - vote for engine_clientVersionV1 :+1: - vote for adopting engine_clientVersionV1 but renaming to engine_exchangeClientVersionV1

lightclient commented 5 months ago

kasey commented 5 months ago

Prysm supports this proposal.

I filed this issue to start recording our user-agent info in graffiti by default: https://github.com/prysmaticlabs/prysm/issues/13558

ethDreamer commented 5 months ago

Thus far all votes are for adopting the new method so I've removed references to reusing web3_clientVersion

mkalinin commented 5 months ago

LOOOL! Exchange is when semantically the same entity is sent in both directions.

mkalinin commented 5 months ago

I am sorry to be late to this discussion, looks like @lightclient’s meme forced me to become a part of it.

If we’re deciding about a client version as a part of Engine API spec then it should be engine_getClientVersion. If we want web3_clientVersion to be exposed on the Engine API endpoint it is also fine and we can do it by simply adding this method to the list of those that are exposed (we have it in the Engine API spec too), then no need to spec out this method as a part of Engine API.

siladu commented 5 months ago

Is there an advantage of being prescriptive about the format the client returns it's version in? I was imagining something much closer to web3_clientVersion where any value would be accepted by CL and incorporated into the graffiti.

I'm weakly in favour of this way rather than have the graffiti require an extra step to convert a code. Also, would rather not have an enum to maintain as clients come and go.

However, if the code is decided upon, then can it be standardized? Seems a bit random at the moment, e.g. Besu = BU Teku = TK Grandine = GR

ethDreamer commented 5 months ago

I'm weakly in favour of this way rather than have the graffiti require an extra step to convert a code.

There is no conversion of a code - the code itself will be embedded in the graffiti. That's the whole point of the code (specifying a client in limited space). The point of the code is to ensure that lighthouse,teku,prysm,nimbus, lodestar or any other consensus client will all indicate a connection with a teku client by specifying TK in their graffiti. If we don't standardize the format for how the clients are embedded in the graffiti, then every client combination will be different, making analysis much more difficult. Another advantage of the ClientCode is that it leaves the users a lot of remaining space that they can use to do what they wish (I've already gotten a request for this) which should help incentivize them to participate.

Also, would rather not have an enum to maintain as clients come and go.

I get what you're saying.. but we need to standardize how clients will be indicated in graffiti somehow. This system is low maintenance, only requiring adding a new entry when a new client team arrives, and accommodating any two-letter code even before it is reserved.

However, if the code is decided upon, then can it be standardized? Seems a bit random at the moment

I left it to the client teams to choose their own code. I know besu's is different than teku's but I thought besu wouldn't like the code BS lol. But again, that's up to besu. If people want to standardize some method of picking out letters that's fine but I wouldn't want to hold up this PR hashing out such things.

ethDreamer commented 5 months ago

NEW POLL (here you can vote for more than one option)

Not sure if this was clear, but I am also proposing a loose standard for specifying client versions in graffiti. There are several different options that trade off collision resistance, ASCII readability, and space (leaving users the rest of the space may convince more to join in the graffiti).

Option 1 [vote with :+1:] - Embed full 4 bytes of the commit hash in ASCII LH1be52536BU0f91a674 (hex: 0x4C48316265353235333642553066393161363734) - leaves users 12 characters

Option 2 [vote with :smile:] - Encode full 4 bytes of commit hash in hex LH�å%6BU��¦t (hex: 0x4c481be5253642550f91a674) - leaves the user 20 characters

Option 3 [vote with :tada:] - Encode only the first 3 bytes of the commit hash in hex LH�å%BU��¦ (hex: 0x4c481be52542550f91a6) - leaves the user 22 characters

Option 4 [vote with :heart:] - Encode only the first 2 bytes of the commit hash in hex LH�åBU�� (hex: 0x4c481be542550f91) - leaves the user 24 characters

Option 5 [vote with 🚀 ] - Encode only the first 2 bytes of the commit hash in ASCII LH1be5BU0f91 (hex: 0x4C4831626535425530663931) - leave the user 20 characters

If you propose a different standard, please specify :)

dapplion commented 5 months ago

This format captures the majority of the graffiti space.

Instead, using only 2 bytes per commit and raw encode leaves 24 characters back to the user. Such a tinny thing is an easier sell to a wider range of stakers.

Only 2 bytes?

Users don't usually build client binaries of random commits of the codebase, instead, there are only a few commits associated with the official releases teams do. Any commit for a release of the prior hard forks is non-viable. So a very prolific team that ships monthly and a year hard-fork schedule that leaves us with 12 possible commits in a 16-bit space. In the unlikely case of collision, we can probably figure out the version from the commit of the other client.

Raw encode

Taking the example above, let's encode the commit as

hex: LHa674BU1be5
raw: LH¦tBUå

None is particularly human-friendly, which is not the point of the tag. A developer can quickly map the raw to hex; client diversity crawlers don't care. The chance of the entire tag having invisible ASCII characters is 0.3% low enough IMO.

michaelsproul commented 5 months ago

What about 1 byte client identifiers (we have 26 letters 👀) and 2-bytes of ASCII commit hash? e.g.

L045bB0f54

Length is 10 bytes leaving 22 for the user, and it's human readable + won't destroy renderers/terminals where it is displayed.

Rocket Pool already uses single character identifiers namespaced by consensus and execution (e.g. N is Nimbus and Nethermind). We could use the same, or use unique letters which allow for future clients that are not strictly consensus or execution (or both). I think I prefer the namespacing though

rolfyone commented 5 months ago

Length is 10 bytes leaving 22 for the user, and it's human readable + won't destroy renderers/terminals where it is displayed.

I'm happy either way, the first client identifier could just be any character, but I agree sticking to the first letter gives 52 for CL and 52 for EL, and could expand to use the whole 96 printables not just the letters if we had to... or just having a new field in a hard fork one day...

I'm generally in favour of human readable just for simplicity, it'd be nice to standardise on position in graffiti, so if a user does use some of the graffiti space we know where their bits are consistently.... So if i have a 12 character string for my custom graffiti, does the client embed before that or after... I don't mind start or end, as long as we don't have some going to start and some going to end.. This would allow visualisers to just trim that data if its present easily.

Given the 32 bytes is fixed, could potentially stick it in the last 10 bytes if available, but I'm not strongly opinionated whether it should be start or end.

smartprogrammer93 commented 5 months ago

I do think this is a very needed change. Additionally, The nethermind research team has been awarded a grant by the ESP program to study the collection of client diversity data. the following is a link to our preliminary ideas https://ethresear.ch/t/allowing-validators-to-provide-client-information-privately-a-project-by-nethermind-research-and-nethermind-core/18527 With the above in mind, i think we should consider an approach where this APi takes the usage of multiplexers like https://github.com/TennisBowling/executionbackup into consideration. Maybe this method should return a list of (client, version, and commit) instead of just a single one

ethDreamer commented 5 months ago

If we were going to do 1-byte identifiers then I might propose the client codes be:

Execution:

B: besu
J: ethereumJS
E: erigon
G: go-ethereum
N: nethermind
R: reth

Consensus:

G: grandine
L: lighthouse
S: lodestar
N: nimbus
T: teku
P: prysm

I suppose we could do this, but it already gets messy with several clients sharing the same first letter (especially lighthouse/lodestar & erigon/ethereumJS). The end result isn't much better (only saving 2 characters). I also find 2 character codes to be more readable. Additionally, the two byte identifiers allow us to adopt a flexible standard where the amount of space for the version bytes could be adjusted depending on what the user chose as their graffiti. For example:

user graffiti takes up 0 characters: LH1be52536BU0f91a674 user graffiti takes up 20 characters: LH1be5BU0f91 user graffiti takes up 24 characters: LH1bBU0f user graffiti takes up 28 characters: LHBU

I think this flexible standard achieves all 3 goals (human readable, max space for the user, collision resistance). But anyone else is welcome to weigh in.

ethDreamer commented 5 months ago

With the above in mind, i think we should consider an approach where this API takes the usage of multiplexers like https://github.com/TennisBowling/executionbackup into consideration. Maybe this method should return a list of (client, version, and commit) instead of just a single one

The purpose of this proposal is to obtain more accurate information on stake-weighted EL client diversity. We want this information so that we can understand the network's exposure to certain risks (e.g. consensus bug in supermajority client).

When a user is staking with a multiplexer or a DVT, the effect that their stake has on the network's exposure to these risks is entirely dependent on the implementation details of their multiplexer. What does their multiplexer do if clients disagree? Does it just take geth? Then they're running geth. Does it take 2/3? Well then it's more like running a minority client but it really depends on the details of the bug.

Either way, I see no realistic/reasonable way to communicate these details in the graffiti. Thus, the only real usage these kinds of staking setups can have for this method is mostly in the human readable fields which can be used for logging / debugging. The current standard allows multiplexers to concatenate all the clients in these fields if they wish.

What's more, the overwhelming majority of validators do not run these kinds of setups. Should these setups become more popular in the future, then we would probably have better luck probing this data via the other methods you're currently researching.

But IMO, I don't think it's worth the added complexity to design the V1 version of this method around these novel setups when their configuration already negates this method's primary purpose.

michaelsproul commented 5 months ago

I'm happy with 2 byte client IDs, and think the advantage of choosing something outweighs the delay from bikeshedding. I've switched my vote to option 5 but will also accept any of the options with rough consensus. Let's get this done.

ethDreamer commented 5 months ago

I'm happy with 2 byte client IDs, and think the advantage of choosing something outweighs the delay from bikeshedding. I've switched my vote to option 5 but will also accept any of the options with rough consensus. Let's get this done.

💯 agree with this. Option 5:

LH1be5BU0f91

is a great balance of the various concerns and if we find that users want more characters, we can adopt the flexible standard which is compatible with option 5.

smartprogrammer93 commented 5 months ago

This PR specifies the exchange of information about cl and el client on engine API, i think it's best we dont mix this with how to represent this data in the graffiti field. That can go into consensus spec, maybe? I dont think a returned list is added complexity. It is just forward-looking. Regardless of how the multiplexers are implemented( you check the linked implementation), it is still important to represent them

ethDreamer commented 5 months ago

i think it's best we dont mix this with how to represent this data in the graffiti field. That can go into consensus spec, maybe?

This PR does not enshrine a standard for representing data in graffiti. But we are having a discussion on the standard here because it informs how we design this API (the client codes for example)

I dont think a returned list is added complexity. It is just forward-looking. Regardless of how the multiplexers are implemented( you check the linked implementation), it is still important to represent them

None of the other engine-api methods are designed to accommodate multiplexers. Instead, we place the responsibility for handling multiple messages onto the multiplexer itself so that it appears as a single client to the consensus layer. I don't understand why this method alone must be designed to accommodate this when we don't have a reason for the consensus layer to gather such information. What is the CL supposed to do with multiple client codes or multiple commit prefixes when the purpose of these fields is to encode such data in graffiti? In the absence of such a purpose, I'm in favor of applying this principle but I'm open to discussing more / taking a vote.

smartprogrammer93 commented 5 months ago

What is the CL supposed to do with multiple client codes or multiple commit prefixes when the purpose of these fields is to encode such data in graffiti? In the absence of such a purpose, I'm in favor of applying this principle but I'm open to taking a vote.

Given the research i linked as well, posting such data in the graffiti field could not be the final approach CLs will have to adopt. Such information will be relevant then. Im down to be convinced otherwise as well. But, in the case that this lands how it is, there is a high probability we will have to have V2 of the same method with reporting multiple ELs depending on the research findings.

philknows commented 5 months ago

I'm happy with 2 byte client IDs, and think the advantage of choosing something outweighs the delay from bikeshedding. I've switched my vote to option 5 but will also accept any of the options with rough consensus. Let's get this done.

This is very similar to how Rocketpool identifies their client combinations using their Smart Node software and has worked well to help identify their client combos for troubleshooting/incentives/etc.

tbenr commented 5 months ago

I'm happy with 2 byte client IDs, and think the advantage of choosing something outweighs the delay from bikeshedding. I've switched my vote to option 5 but will also accept any of the options with rough consensus. Let's get this done.

💯 agree with this. Option 5:
LH1be5BU0f91
is a great balance of the various concerns and if we find that users want more characters, we can adopt the flexible standard which is compatible with option 5.

How much do we want to enforce this?

If user specifies a graffiti taking all 32 bytes, client shouldn't start anymore? Should we give an opt-out option to regain full 32 bytes?

I have mixed feelings...

ethDreamer commented 5 months ago

How much do we want to enforce this?

If user specifies a graffiti taking all 32 bytes, client shouldn't start anymore? Should we give an opt-out option to regain full 32 bytes?

This is absolutely not enforced. Most of the data will come from people who don't bother to set their graffiti. Users will always have the ability to choose whatever they want for their graffiti or to not provide this data at all. But some will want to set custom graffiti while also providing data for client diversity. For those users, we want to give them more bytes so they are more likely to participate.

tbenr commented 5 months ago

How much do we want to enforce this?

If user specifies a graffiti taking all 32 bytes, client shouldn't start anymore? Should we give an opt-out option to regain full 32 bytes?

This is absolutely not enforced. Most of the data will come from people who don't bother to set their graffiti. Users will always have the ability to choose whatever they want for their graffiti or to not provide this data at all. But some will want to set custom graffiti while also providing data for client diversity. For those users, we want to give them more bytes so they are more likely to participate.

Ok that's good. If we sacrifice human readibility we could have the first byte representing clients (4bit cl, 4bit el) + 4bytes (2 cl, 2 el) for commits, so we end up with 5 versioning + 27 user msg

ethDreamer commented 5 months ago

If we sacrifice human readibility we could have the first byte representing clients (4bit cl, 4bit el) + 4bytes (2 cl, 2 el) for commits, so we end up with 5 versioning + 27 user msg

That's a nice option if we want to always provide version information while taking up minimal space. But it does limit us to only 16 execution / consensus clients, and based on my conversations a lot of people seem to prefer readability.

In practice, the versioning information is really just nice to have for some debugging cases. But it is not strictly necessary. It is much less important than knowing the implementation itself for the purposes of measuring EL client diversity. That's why I like the flexible standard because it allows users 28 characters if they really want it, while preserving readability, without limiting EL/CL client implementations, and it preserves the version information for debugging in the vast majority of cases.

ethDreamer commented 5 months ago

I've renamed the method to getClientVersionV1 in accordance with @mkalinin's suggestion. I've also decided to accommodate multiplexers by returning an array of ClientVersionV1 objects for two reasons:

We require some way to indicate to the consensus client that a multiplexer is being used so that the data can be excluded from graffiti. So we may as well indicate this by receiving more than one ClientVersionV1 object
Knowing each client version in a multiplexer scenario may be useful in the future for other methods of measuring client diversity which do not use graffiti.

ethDreamer commented 5 months ago

LAST CALL FOR CONCERNS

I believe we've addressed all concerns that have been raised at this point. The core idea of this PR has widespread support and most disagreement is minor & related to small implementation details. There's no reason to spend weeks bikeshedding about this optional feature.

CURRENT PLAN

wait a few more days for small fixes
merge if there are no major objections
see if issues arise during implementation and make changes as needed
after widespread implementation, consider making this method mandatory

If you agree with this plan, please give a :+1:, otherwise comment your objection

lightclient commented 5 months ago

Please add spell check errors to wordlist.txt.

kasey commented 5 months ago

How much do we want to enforce this?

If user specifies a graffiti taking all 32 bytes, client shouldn't start anymore? Should we give an opt-out option to regain full 32 bytes?

This is absolutely not enforced. Most of the data will come from people who don't bother to set their graffiti. Users will always have the ability to choose whatever they want for their graffiti or to not provide this data at all. But some will want to set custom graffiti while also providing data for client diversity. For those users, we want to give them more bytes so they are more likely to participate.

How does everyone plan to represent this in flags? Are we going to artificially limit the size of user-specified graffiti flags? Otherwise, what would users expect the behavior to be if they specify 20-32 bytes of graffiti"? For instance do we truncate the version string from right to left? Doing that would assume precedence in importance of the information: CL impl > CL git hash > EL impl > EL git hash. It would also be hard for software parsing this field to differentiate user-specified graffiti that flows into this section and looks like an ident/hash from the real thing.

If this is required then IMO we should limit the size of user-provided graffiti to 20 bytes and be very explicit about the fact that 1) this will be a breaking config change for users and 2) there is no opting out by "overriding" the default with a flag.

lightclient commented 5 months ago

One thing you could do is continue dropping client version data up until you can no longer store the EL+CL code combo. I think version / commit is nice to have but as critical.

kasey commented 5 months ago

One thing you could do is continue dropping client version data up until you can no longer store the EL+CL code combo. I think version / commit is nice to have but as critical.

Yeah the EL is most important (hardest to determine through other means), then CL, then versions. So if the plan is to truncate, I think we should just order it that way: (EL|CL|el-hash|cl-hash). You could get fancy and interleave the EL/CL hash bytes one-by-one but maybe that's overkill :)

ethDreamer commented 5 months ago

One thing you could do is continue dropping client version data up until you can no longer store the EL+CL code combo. I think version / commit is nice to have but as critical.

Exactly. This is what I've been referring to as a flexible standard. The version information is nice to have but not critical.

rolfyone commented 5 months ago

I'm not sure I'd interleave, but shortening does make sense. If the version of a client is present, it being in the same logical chunk will be easier to search for. eg. searching for LH1b from the example below..

user graffiti takes up 0 characters: LH1be52536BU0f91a674 user graffiti takes up 20 characters: LH1be5BU0f91 user graffiti takes up 24 characters: LH1bBU0f user graffiti takes up 28 characters: LHBU

I think this flexible standard achieves all 3 goals (human readable, max space for the user, collision resistance). But anyone else is welcome to weigh in.

I think this is a sensible approach, and if the user graffiti is beyond 28 characters just not having the data...

Do we know what percentage of blocks have more than 28 bytes of graffiti? seems like we could get a fairly good estimation of how useful this would be...

ethDreamer commented 5 months ago

I've compiled the discussion around choosing a graffiti standard into a single document:

https://hackmd.io/@wmoBhF17RAOH2NZ5bNXJVg/BJX2c9gja

I welcome any comments. Also, there haven't been any requests for changes in days. Seems like we can merge this?

ethDreamer commented 5 months ago

@lightclient it's been about a week since people agreed we should merge this and there haven't been any objections. Is that enough time to merge?

ethereum / execution-apis

Specify Client Versions on Engine API #517

Only 2 bytes?

Raw encode