Add validators v2. - Githubissues

mcdee commented 3 months ago

This PR adds a version 2 of the validators API.

The requirement comes from the number of entries in the Validators array in state on mainnet. At time of writing there are over 1.4MM entries, and no plans to reduce this number (there are some options that could reduce the number of active validators, but this doesn't help the array itself). The JSON returned by v1 of this response is over 600MB in size, which has knock-on effects for both speed of delivery (it takes approximately 5s to send this much data over a 1GB link) and decoding time. As such, a version of this endpoint that supports SSZ would be useful.

The changes in this endpoint from v1 are:

allow SSZ as a return encoding type
remove the status item from the per-validator data
provide the consensus version in the header and JSON metadata

status is a text string in the JSON response for v1 of this call, so does not have a canonical mapping to SSZ. Instead of removing it, other options are to define a mapping similar to an enum, or to provide it as a byte array. Both come with added complexity or increased data, and the information that it provides can be calculated from the data that is already provided in the validator struct (plus an epoch).

Some open questions around this endpoint:

should we add an epoch metadata field in the returned data to allow recreation of validator states for when the state identifier used as part of this call is non-numeric (head, finalized etc.)?
should we remove the balance value for each validator? This value is available from the validator_balances endpoint, does it make sense to still duplicate it here?
could we realistically remove the index value for each validator? The big benefit of doing this would be that the data returned could be a simple array of Validator objects, which would remove the need for custom structures when decoding SSZ (and, indeed, JSON). However, there is a significant downside here in that the index information would no longer be easy to obtain. Countering that, should we be promoting the use of indices anywhere in the API when it's possible that index reuse will become a thing?
if we do not return status, should we still allow filtering the requested values by status?

rolfyone commented 3 months ago

Ive never really been convinced about the total balance being in this endpoint, so i'd be happy to see it gone.
the overall approach of just having the validator object returned i do like.
allowing SSZ is a great improvement.
could potentially allow an interface to query pubkey and get back a validator index if we wanted to remove the index here... I'm not sure how heavily the flow of looking up /validators just because you need indices is...
i get the impression filtering by status may be used... I'm not against removing it, it does simplify the endpoint... I don't recall if we had a reason or decided it'd be useful...

If we removed index and balance, then ppl that want that info could still retrieve via /eth/v1/beacon/states/{state_id}/validators/{validator_id}, which is admittedly a single validator at a time... is that sufficient? I can see that endpoint potentially living on, and providing that extra, more targeted information.

I'm a fan of the idea of paring this down to MVP and building out other interfaces if theres a reason...

mcdee commented 3 months ago

I'm very back-and-forth on including the index in the output. It complicates the work required by client teams (requiring a non-spec structure), but is definitely useful information.

An endpoint that would provide mapping between indices and public keys would have to be something like /eth/v1/beacon/states/{state_id}/validator_identities, which took the same input as the validator_balances endpoint and returned a list of index,public_key,activation_epoch (the latter item required if we ever implement validator index reuse).

Because the data is useful, and because most client teams should already have something similar for their implementation of the v1 endpoint, I'm inclined towards it being better to keep index in the output of this endpoint. But I'm not sold either way. Does anyone else have strong feelings on this?

rolfyone commented 3 months ago

I'd be for the new endpoint, and the validators endpoint just having the validator object...

mcdee commented 3 months ago

I have put together #452 that provides the validator identities endpoint for review.

I'll have a further consider about how this plays out for validator clients. Ultimately I suspect that most validator clients would wrap their current call to /validators with calls to both /validators and /validator_identities and then combine them, as downstream structs are likely to carry both validator and index info, but want to think about if there would be any issues with having this data obtained from two separate endpoints.

rkapka commented 3 months ago

I am OK with losing the index and balance, but I'm not sold on status.

There is definitely value in being able to filter by status even though it can be calculated from the result. Because if I want to query for all active_exiting validators, then returning over 1MM validators when only a few are exiting is poor.

As for the result, if we drop it then we need the epoch, which doesn't make things simpler than it is now. On the contrary, it makes things more annoying - not only you still have to read a value outside the validator object (and although true that it's necessary only for non-numeric states, client code becomes even more complicated if you return the epoch only sometimes), but you also need extra calculation.

rkapka commented 3 months ago

Actually, why can't we just add SSZ support to v1? The only breaking change is the consensus version, but we don't need it since validator objects are the same for every version. You could still argue that dropping index and balance (and possibly status) is worth it, but even without these fields the response will be huge, so it's not a win big enough to justify a version update.

mcdee commented 3 months ago

There is definitely value in being able to filter by status even though it can be calculated from the result. Because if I want to query for all active_exiting validators, then returning over 1MM validators when only a few are exiting is poor.

This endpoint still allows filtering by status by passing in the required statuses in the request body.

As for the result, if we drop it then we need the epoch...

There is an open question as to if we return the epoch in the return information (potentially as a header), although for many cases it will not be required as long as the entity carrying out the query knows the genesis of the chain and the current time.

Actually, why can't we just add SSZ support to v1? The only breaking change is the consensus version, but we don't need it since validator objects are the same for every version.

They have been the same so far, but that's not guaranteed (they were nearly changed for Deneb, and again in Electra). And returning SSZ without versioning is likely to lead into trouble at some point.

You could still argue that dropping index and balance (and possibly status) is worth it, but even without these fields the response will be huge, so it's not a win big enough to justify a version update.

If all the fields are dropped then it fundamentally changes the return type to move it to an array of pre-existing Validator structs; this is potentially where the PR is going. But even if not, removal of any of the fields would require a version bump.

The size win is allowing SSZ encoding, which is significantly smaller compared to the JSON encoding. We could go for an encoding of the status, as mentioned in the OP, which would then potentially open the door for keeping this at V1, but that's something that I think makes more sense to look at after we have decided on the best output for this endpoint.

rkapka commented 2 months ago

These are all good points. I am in favor of this.

rolfyone commented 2 months ago

add to changes table please.

mcdee commented 2 months ago

Going back to the questions in the original comment with this PR:

should we add an epoch metadata field in the returned data to allow recreation of validator states for when the state identifier used as part of this call is non-numeric (head, finalized etc.)?

It seems that there is no objection to this, and it has benefit so makes sense to add.

should we remove the balance value for each validator? This value is available from the validator_balances endpoint, does it make sense to still duplicate it here?

There has been general approval of removing this field.

could we realistically remove the index value for each validator? The big benefit of doing this would be that the data returned could be a simple array of Validator objects, which would remove the need for custom structures when decoding SSZ (and, indeed, JSON). However, there is a significant downside here in that the index information would no longer be easy to obtain. Countering that, should we be promoting the use of indices anywhere in the API when it's possible that index reuse will become a thing?

The introduction of the validator_identities endpoint in #452 solves this issue, and allows this endpoint to return spec structs for both JSON and, importantly, SSZ.

if we do not return status, should we still allow filtering the requested values by status?

This has use, and the logic for calculating status on the server is minimal (and already exists for the v1 endpoint) so it seems that we should keep this ability.

I'll look to make the above changes tomorrow if there are no objections raised.

nflaig commented 2 months ago

should we add an epoch metadata field in the returned data to allow recreation of validator states for when the state identifier used as part of this call is non-numeric (head, finalized etc.)?

This makes sense to me, we are polling the api by passing head, while the VC knows the current epoch itself, it feels more correct to use the epoch of the data. This also makes it easier for consumers that query historical data when you query by slot or state root as you don't need to calculate the epoch or might not even know it in the latter case.

should we remove the balance value for each validator? This value is available from the validator_balances endpoint, does it make sense to still duplicate it here?

It saves some work on BN side and reduces size of response, seems reasonable to remove this if nobody is using it

if we do not return status, should we still allow filtering the requested values by status?

Definitely should keep filtering by status

could we realistically remove the index value for each validator?

I am not sold on removing the index and having to call 2 different apis, at least in Lodestar, we wanna know the status of the validator, so just using the proposed validator_identities is not sufficient.

Splitting this into 2 apis, one which just returns the validator object and the other returning the indices is only useful imo if there are consumers which use them independently and even then we can still add the validator_identities if we want a lightweight api for just the indices.

We can probably get away with calling both apis as well since we have SSZ encoding now but if everyone then has to implement custom logic on the VC side to remap indices to validator objects, it does not seem worth it.

Are there any other benefits of dropping the index here besides having to define a new SSZ container for it?

rolfyone commented 1 month ago

To me simplifying to return the standard validator object was a big advantage of this over the v1 where we've gone and collated a bunch of data. This IMO is worth while, because the less custom objects we end up with the better. It's just extra work all round to implement the custom object representations (admittedly that work is already done now) Data wise, the v1 validators endpoint was pretty singularly horrible. I'd be a fan of moving to these two api's just so that 'bad' example isn't in our data models.

rkapka commented 3 weeks ago

I personally dislike losing the status. One example where dropping status makes things much more annoying is if I query by more than one status. I will get back a list of validators and I will have to calculate the status of each validator myself. Although I understand that we all like clean code, I am against worsening UX just to make things "pretty".

mcdee commented 3 weeks ago

@rkapka dropping status is to allow us to move to SSZ encoding, which gives us significant memory usage and speed improvements. So the benefit is a lot more than "pretty".

ethereum / beacon-APIs

Add validators v2. #449