Open mcdee opened 3 months ago
/validators
just because you need indices is...If we removed index and balance, then ppl that want that info could still retrieve via /eth/v1/beacon/states/{state_id}/validators/{validator_id}
, which is admittedly a single validator at a time... is that sufficient? I can see that endpoint potentially living on, and providing that extra, more targeted information.
I'm a fan of the idea of paring this down to MVP and building out other interfaces if theres a reason...
I'm very back-and-forth on including the index in the output. It complicates the work required by client teams (requiring a non-spec structure), but is definitely useful information.
An endpoint that would provide mapping between indices and public keys would have to be something like /eth/v1/beacon/states/{state_id}/validator_identities
, which took the same input as the validator_balances
endpoint and returned a list of index,public_key,activation_epoch
(the latter item required if we ever implement validator index reuse).
Because the data is useful, and because most client teams should already have something similar for their implementation of the v1
endpoint, I'm inclined towards it being better to keep index
in the output of this endpoint. But I'm not sold either way. Does anyone else have strong feelings on this?
I'd be for the new endpoint, and the validators endpoint just having the validator object...
I have put together #452 that provides the validator identities endpoint for review.
I'll have a further consider about how this plays out for validator clients. Ultimately I suspect that most validator clients would wrap their current call to /validators
with calls to both /validators
and /validator_identities
and then combine them, as downstream structs are likely to carry both validator and index info, but want to think about if there would be any issues with having this data obtained from two separate endpoints.
I am OK with losing the index and balance, but I'm not sold on status.
There is definitely value in being able to filter by status even though it can be calculated from the result. Because if I want to query for all active_exiting
validators, then returning over 1MM validators when only a few are exiting is poor.
As for the result, if we drop it then we need the epoch, which doesn't make things simpler than it is now. On the contrary, it makes things more annoying - not only you still have to read a value outside the validator object (and although true that it's necessary only for non-numeric states, client code becomes even more complicated if you return the epoch only sometimes), but you also need extra calculation.
Actually, why can't we just add SSZ support to v1
? The only breaking change is the consensus version, but we don't need it since validator objects are the same for every version. You could still argue that dropping index and balance (and possibly status) is worth it, but even without these fields the response will be huge, so it's not a win big enough to justify a version update.
There is definitely value in being able to filter by status even though it can be calculated from the result. Because if I want to query for all active_exiting validators, then returning over 1MM validators when only a few are exiting is poor.
This endpoint still allows filtering by status by passing in the required statuses in the request body.
As for the result, if we drop it then we need the epoch...
There is an open question as to if we return the epoch in the return information (potentially as a header), although for many cases it will not be required as long as the entity carrying out the query knows the genesis of the chain and the current time.
Actually, why can't we just add SSZ support to v1? The only breaking change is the consensus version, but we don't need it since validator objects are the same for every version.
They have been the same so far, but that's not guaranteed (they were nearly changed for Deneb, and again in Electra). And returning SSZ without versioning is likely to lead into trouble at some point.
You could still argue that dropping index and balance (and possibly status) is worth it, but even without these fields the response will be huge, so it's not a win big enough to justify a version update.
If all the fields are dropped then it fundamentally changes the return type to move it to an array of pre-existing Validator
structs; this is potentially where the PR is going. But even if not, removal of any of the fields would require a version bump.
The size win is allowing SSZ encoding, which is significantly smaller compared to the JSON encoding. We could go for an encoding of the status, as mentioned in the OP, which would then potentially open the door for keeping this at V1, but that's something that I think makes more sense to look at after we have decided on the best output for this endpoint.
These are all good points. I am in favor of this.
add to changes table please.
Going back to the questions in the original comment with this PR:
It seems that there is no objection to this, and it has benefit so makes sense to add.
There has been general approval of removing this field.
The introduction of the validator_identities
endpoint in #452 solves this issue, and allows this endpoint to return spec structs for both JSON and, importantly, SSZ.
This has use, and the logic for calculating status on the server is minimal (and already exists for the v1 endpoint) so it seems that we should keep this ability.
I'll look to make the above changes tomorrow if there are no objections raised.
should we add an epoch metadata field in the returned data to allow recreation of validator states for when the state identifier used as part of this call is non-numeric (head, finalized etc.)?
This makes sense to me, we are polling the api by passing head
, while the VC knows the current epoch itself, it feels more correct to use the epoch of the data. This also makes it easier for consumers that query historical data when you query by slot or state root as you don't need to calculate the epoch or might not even know it in the latter case.
should we remove the balance value for each validator? This value is available from the validator_balances endpoint, does it make sense to still duplicate it here?
It saves some work on BN side and reduces size of response, seems reasonable to remove this if nobody is using it
if we do not return status, should we still allow filtering the requested values by status?
Definitely should keep filtering by status
could we realistically remove the index value for each validator?
I am not sold on removing the index and having to call 2 different apis, at least in Lodestar, we wanna know the status of the validator, so just using the proposed validator_identities
is not sufficient.
Splitting this into 2 apis, one which just returns the validator object and the other returning the indices is only useful imo if there are consumers which use them independently and even then we can still add the validator_identities
if we want a lightweight api for just the indices.
We can probably get away with calling both apis as well since we have SSZ encoding now but if everyone then has to implement custom logic on the VC side to remap indices to validator objects, it does not seem worth it.
Are there any other benefits of dropping the index here besides having to define a new SSZ container for it?
To me simplifying to return the standard validator object was a big advantage of this over the v1 where we've gone and collated a bunch of data. This IMO is worth while, because the less custom objects we end up with the better. It's just extra work all round to implement the custom object representations (admittedly that work is already done now) Data wise, the v1 validators endpoint was pretty singularly horrible. I'd be a fan of moving to these two api's just so that 'bad' example isn't in our data models.
I personally dislike losing the status. One example where dropping status makes things much more annoying is if I query by more than one status. I will get back a list of validators and I will have to calculate the status of each validator myself. Although I understand that we all like clean code, I am against worsening UX just to make things "pretty".
@rkapka dropping status is to allow us to move to SSZ encoding, which gives us significant memory usage and speed improvements. So the benefit is a lot more than "pretty".
This PR adds a version 2 of the validators API.
The requirement comes from the number of entries in the
Validators
array in state on mainnet. At time of writing there are over 1.4MM entries, and no plans to reduce this number (there are some options that could reduce the number of active validators, but this doesn't help the array itself). The JSON returned by v1 of this response is over 600MB in size, which has knock-on effects for both speed of delivery (it takes approximately 5s to send this much data over a 1GB link) and decoding time. As such, a version of this endpoint that supports SSZ would be useful.The changes in this endpoint from v1 are:
status
item from the per-validator datastatus
is a text string in the JSON response for v1 of this call, so does not have a canonical mapping to SSZ. Instead of removing it, other options are to define a mapping similar to an enum, or to provide it as a byte array. Both come with added complexity or increased data, and the information that it provides can be calculated from the data that is already provided in the validator struct (plus an epoch).Some open questions around this endpoint:
epoch
metadata field in the returned data to allow recreation of validator states for when the state identifier used as part of this call is non-numeric (head
,finalized
etc.)?balance
value for each validator? This value is available from thevalidator_balances
endpoint, does it make sense to still duplicate it here?index
value for each validator? The big benefit of doing this would be that the data returned could be a simple array ofValidator
objects, which would remove the need for custom structures when decoding SSZ (and, indeed, JSON). However, there is a significant downside here in that the index information would no longer be easy to obtain. Countering that, should we be promoting the use of indices anywhere in the API when it's possible that index reuse will become a thing?