ethpandaops / checkpointz

An Ethereum beacon chain checkpoint sync provider
GNU General Public License v3.0

Serve light client data #143

Open etan-status opened 1 year ago

etan-status commented 1 year ago

To enable users to start from an older block hash, light client data should be made available via Checkpointz.

The light client sync protocol can take a very old block root and transform it into a recent one, without requiring the user to manually copy/paste checkpoint block roots around.

This is more secure than genesis sync, and also more secure than downloading a random finalized state from a third party without validation.

Required APIs that Checkpointz would have to cache and serve:

  1. /eth/v1/beacon/light_client/bootstrap/{block_root} - about 25 KB each
    • Has to be cached for each finalized checkpoint, including ones in the past. These should be persisted for a really long time (months), as they don't expire.
  2. /eth/v1/beacon/light_client/updates - about 25 KB each
    • This is a range query with start+count. Historical entries (older than 256 epochs) don't change. The latest entry may rarely change. It is good enough to cache and serve only the historical entries (without the most recent one). These should be persisted for a really long time (months), as they don't expire.
    • Different clients may request different ranges of the data, and the cache should be prepared for that; e.g., cache each period separately so that requests from 20-40 and 15-35 can be served from the same database without requiring separate caches for each possible range.
  3. /eth/v1/beacon/light_client/finality_update - about 2 KB
    • This changes each time a new finalized checkpoint block is reached. The cache can expire every epoch (or every time finality changes), and only the most recent entry needs to be kept.
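
The per-period caching idea for /updates can be sketched roughly as follows. This is a minimal illustration, not how Checkpointz works: `PeriodCache` and the `fetch_period` callback are hypothetical names, and the constants assume the mainnet preset (32 slots per epoch, 256 epochs per sync committee period). Only historical periods are stored, since the latest entry may still change.

```python
from typing import Callable, Dict, List

SLOTS_PER_EPOCH = 32                    # mainnet preset (assumption)
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256  # mainnet preset (assumption)


def period_of_slot(slot: int) -> int:
    """Sync committee period that contains the given slot."""
    return slot // (SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD)


class PeriodCache:
    """Caches one update per period so overlapping ranges share entries."""

    def __init__(self, fetch_period: Callable[[int], dict]):
        # fetch_period is a hypothetical callback that asks the backing
        # beacon node for the update of a single period.
        self._fetch_period = fetch_period
        self._entries: Dict[int, dict] = {}

    def get_range(self, start_period: int, count: int,
                  current_period: int) -> List[dict]:
        updates = []
        for period in range(start_period, start_period + count):
            if period > current_period:
                break  # nothing exists beyond the current period
            if period in self._entries:
                updates.append(self._entries[period])
                continue
            update = self._fetch_period(period)
            # Only historical periods are stored; the latest may still change.
            if period < current_period:
                self._entries[period] = update
            updates.append(update)
        return updates
```

With this layout, a request for periods 20-40 followed by one for 15-35 only fetches periods 15-19 the second time; the rest comes from the shared per-period entries.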

Bonus APIs for full light client data availability:

  1. /eth/v1/beacon/light_client/optimistic_update - about 1 KB
    • Changes every slot; not essential for checkpoint syncing but nice to have for light client syncing (e.g., web browser wallets).
    • Could be faked by using the finality_update response and omitting finalized_header and finality_branch in the response. This could work around client limitations when only a partial light client API is available on the server.
  2. /eth/v1/events
    • light_client_finality_update and light_client_optimistic_update
    • Push mechanism for finality_update and optimistic_update. Don't think checkpointz is used for that, but if eventstream is supported, would be great to also have access to these two topics. They provide the same data as /eth/v1/beacon/light_client/finality_update and /eth/v1/beacon/light_client/optimistic_update.
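
The "faked" optimistic_update mentioned above could look roughly like this. It is only a sketch: `optimistic_from_finality` is a hypothetical helper, and it assumes the JSON response wraps the update in a top-level "data" object next to fields such as "version".

```python
def optimistic_from_finality(finality_update: dict) -> dict:
    """Derive an optimistic_update-shaped response from a finality_update.

    Assumes the response is a JSON object with a "data" member (assumption
    about the envelope); every other top-level field is copied unchanged.
    """
    dropped = ("finalized_header", "finality_branch")
    result = {k: v for k, v in finality_update.items() if k != "data"}
    result["data"] = {
        k: v for k, v in finality_update["data"].items() if k not in dropped
    }
    return result
```

The input is left unmodified, so the same cached finality_update can back both endpoints.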

etan-status commented 8 months ago

image

Example here based on Nimbus. The server here serves the light client data that this issue proposes should be added to checkpointz. That enables a secure sync experience without having to manually pass any block root or state root. The extra endpoints allow the server to prove that what it is sending as the checkpoint state is not malicious. That allows reducing the trust assumption on the server to simple data availability.

samcm commented 7 months ago

The main blocker for implementing this is that checkpointz doesn't store anything on disk, and is currently fairly light to run. Keen to investigate this.

etan-status commented 7 months ago

Thanks for checking on this!

Hmm. Maybe it's possible to get away without disk? As in, just forwarding the /light_client endpoints to the backing beacon node transparently (without the /eth/v1/events endpoint). If necessary some rate limiting could be added in front of it.
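
As a rough illustration of the rate limiting idea, a token bucket in front of the proxied routes might look like the sketch below. `TokenBucket` and its parameters are hypothetical and not part of Checkpointz; the clock is injectable only so the behaviour can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts of up to `burst` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, now=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._now = now
        self._tokens = float(burst)  # start with a full bucket
        self._last = now()

    def allow(self) -> bool:
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill according to elapsed time, capped at the burst size.
        self._tokens = min(float(self.burst), self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A proxy handler would call `allow()` before forwarding a /light_client request and answer with HTTP 429 when it returns False.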

Keep in mind that so far only Lodestar and Nimbus support the /light_client endpoint.

samcm commented 7 months ago

Yup acting as a proxy is certainly an option! Might make sense to check with current providers if that risk is acceptable, the security/load concerns are entirely restricted to Checkpointz atm 🙏

philknows commented 7 months ago

Only able to speak for the ChainSafe checkpoints here, but we'd be happy to try this out and keep an eye on our resource utilization change by enabling this as a proxy to our public nodes. If this feature is kept optional, maybe providers can just choose to enable if the endpoints are available depending on what clients they're running.

etan-status commented 7 months ago

Another aspect to keep in mind is that the beacon node (BN) behind the checkpointz server should ideally be genesis synced. Light client data cannot be reliably backfilled until the corresponding protocols are in place:

@philknows would be great if your public nodes have old data available! Also, there's a move to collect canonical data that is deterministic across the backing BN, which simplifies syncing in the future (includes test cases). This way, swapping the backing BN should not result in different data being served:

etan-status commented 7 months ago

Might make sense to check with current providers if that risk is acceptable

With both Nimbus and Lodestar deeming it fine regarding the load of simply exposing the routes as a transparent proxy, I think it's worth giving this a shot.

Namely, the following routes should be proxied:

The following route would not be proxied at this time as it is more expensive and not strictly necessary for syncing:

etan-status commented 5 months ago

@samcm Would it be possible to get the four light client routes (without events) exposed? Extra server load for this is minimal.
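
A proxy restricted to these routes could filter request paths roughly as sketched below. This assumes the "four light client routes" are bootstrap, updates, finality_update and optimistic_update as listed in the opening comment; `should_proxy` is a hypothetical helper, not Checkpointz code.

```python
import re

LIGHT_CLIENT_PREFIX = "/eth/v1/beacon/light_client/"
# Fixed routes, taken from the endpoint list in the opening comment.
FIXED_ROUTES = {"updates", "finality_update", "optimistic_update"}
# bootstrap/{block_root}: a 32-byte root as 0x-prefixed hex.
BOOTSTRAP_RE = re.compile(r"bootstrap/0x[0-9a-fA-F]{64}")


def should_proxy(path: str) -> bool:
    """Return True if the path is one of the four light client routes."""
    path = path.split("?", 1)[0]  # ignore the query string
    if not path.startswith(LIGHT_CLIENT_PREFIX):
        return False
    rest = path[len(LIGHT_CLIENT_PREFIX):]
    return rest in FIXED_ROUTES or BOOTSTRAP_RE.fullmatch(rest) is not None
```

Anything else, including /eth/v1/events, is rejected by this check and would not reach the backing beacon node.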

samcm commented 3 weeks ago

@etan-status I've made some progress on this, should have something to test next week :)

samcm commented 2 weeks ago

@etan-status How important is SSZ support for these endpoints? Is JSON ok for now?

etan-status commented 2 weeks ago

To obtain SSZ, use the HTTP Accept header; Accept: application/octet-stream selects the SSZ format.

If the light client routes are proxied transparently to the backing server, the server will take care of both SSZ and JSON. It is very cheap for servers to provide answers to light client endpoints, because they are direct lookups from a database without expensive operations.
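
For illustration, a client could select the format via the Accept header as in the sketch below. The helper name and the base URL in the usage note are placeholders, not real endpoints; the request is only constructed here, not sent.

```python
import urllib.request


def light_client_request(base_url: str, path: str,
                         ssz: bool = True) -> urllib.request.Request:
    """Build a GET request that asks for SSZ (default) or JSON."""
    accept = "application/octet-stream" if ssz else "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Accept": accept},
    )
```

For example, `light_client_request("http://localhost:5052", "/eth/v1/beacon/light_client/finality_update")` yields a request that could be passed to `urllib.request.urlopen` against a server that supports these routes.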

samcm commented 2 weeks ago

Yeah we aren't blind proxying the response body - it's being parsed and then marshaled back to json (or SSZ), so we'd have to implement ssz for those types.

etan-status commented 2 weeks ago

Generally, SSZ is about twice as efficient, as binary data won't be sent as hex strings. If possible, would be great if SSZ would be supported as well.

There's nothing special about the SSZ representation of these types that isn't already used by the /debug/states endpoint.
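
The "about twice" figure follows from how byte fields are encoded. The small sketch below compares the size of a raw byte string with its 0x-prefixed hex form as a JSON string; it only covers individual fields (the helper name is made up), so the ratio for a whole response will differ somewhat.

```python
import json


def hex_json_size(value: bytes) -> int:
    """Size in characters of `value` as a 0x-prefixed hex JSON string."""
    return len(json.dumps("0x" + value.hex()))


root = bytes(32)       # e.g. a block root
signature = bytes(96)  # e.g. a BLS signature

for name, raw in (("root", root), ("signature", signature)):
    print(name, "raw:", len(raw), "JSON hex:", hex_json_size(raw))
```

Each byte becomes two hex characters, plus the 0x prefix and the surrounding quotes, which is where the roughly 2x overhead comes from.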

samcm commented 2 weeks ago

ethpandaops/checkpointz:0.0.6-light-client is ready for testing, just requires a config change to enable 🙏

e.g.

checkpointz:
  light_client:
    enabled: true
    mode: proxy

Definitely not mainnet ready yet. Also only supports JSON atm, unsure if I'll implement ssz at the moment.