chainbound / valtrack

An Ethereum validator crawler
MIT License

Tracker logic #27

Closed mempirate closed 3 months ago

mempirate commented 4 months ago

Context

The consumer now processes all events and turns them into validator_metadata_events when it thinks there are validators attached to a beacon node. Sentries will try to redo successful handshakes and metadata exchanges every epoch, which means that in the ideal case we have a data point on each beacon node every epoch.

The next step is to build a view of live validators based on these events. The table will look like this (stored in a SQLite DB inside the consumer):

Bold properties are ones that we have to derive. The rest can just be extracted from the metadata events. Note that we maintain a single row per PeerID and PeerID can thus be the primary key.

Methodology

For each metadata_event that results in a validator_metadata_event, we do the following:

  1. If shortLivedSubnets > 0 we set PossibleValidator = true
  2. Increment NumObservations
  3. Derive the estimated number of validators: AverageValidatorCount will be a cumulative average of all the validator count estimates so far. The ValidatorCount entry for the current event will be shortLivedSubnets / 2, because a validator is subscribed to 2 attnets at any point: one for the current epoch and one for the next. NOTE: we should round up, not down, when doing integer division! I.e. q := 1 + (x - 1) / y (valid for x > 0)
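  4. Set the new AverageValidatorCount to the cumulative average that includes the current estimate, where new value is the ValidatorCount from step 3 and new count is the already incremented NumObservations: $$\text{new average} = \text{current average} + \frac{\text{new value} - \text{current average}}{\text{new count}}$$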
  5. Insert or modify row
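
Steps 1 to 4 can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the `TrackedPeer` struct, its `Observe` method and the `ceilDiv` helper are hypothetical names that do not come from the valtrack codebase, and the row is kept in memory rather than in sqlite. The guard in `ceilDiv` is there because Go truncates integer division toward zero, so `1 + (x - 1) / y` would return 1 instead of 0 for `x = 0`.

```go
package main

import "fmt"

// TrackedPeer is a hypothetical in-memory mirror of one row of the
// tracker table, keyed by PeerID. Only the derived fields are shown.
type TrackedPeer struct {
	PeerID                string
	PossibleValidator     bool
	NumObservations       int
	AverageValidatorCount float64
}

// ceilDiv returns x / y rounded up, assuming y > 0.
// Non-positive x yields 0, because 1 + (x-1)/y is only valid for x > 0.
func ceilDiv(x, y int) int {
	if x <= 0 {
		return 0
	}
	return 1 + (x-1)/y
}

// Observe applies steps 1 to 4 for a single validator metadata event.
func (p *TrackedPeer) Observe(shortLivedSubnets int) {
	// Step 1: any short-lived subnet marks the peer as a possible validator.
	if shortLivedSubnets > 0 {
		p.PossibleValidator = true
	}
	// Step 2: count this observation.
	p.NumObservations++
	// Step 3: two attnets per validator, rounded up.
	count := float64(ceilDiv(shortLivedSubnets, 2))
	// Step 4: incremental cumulative average.
	p.AverageValidatorCount += (count - p.AverageValidatorCount) / float64(p.NumObservations)
}

func main() {
	p := &TrackedPeer{PeerID: "example-peer"} // placeholder ID, not a real PeerID
	for _, subnets := range []int{3, 4, 1} {
		p.Observe(subnets)
	}
	fmt.Printf("%+v\n", *p)
}
```

With the three sample events above, the per-event estimates are 2, 2 and 1, so the running average ends at 5/3. Step 5 (the insert or update of the row) is left out here because it depends on the table schema.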

namn-grg commented 4 months ago

Thanks for the detailed description!

Should we also change ParquetValidatorEvent to the same fields as the above table?

mempirate commented 4 months ago

@namn-grg no, this new table will not be a table of events (which the validator event is), but a live view of active validators based on these events. So no need to change the existing events.

namn-grg commented 3 months ago

Now that we will estimate the validator count through attestation aggregators, I think AverageValidatorCount will no longer make sense, because a validator will only be subscribed to the short-lived attnets for certain epochs. Instead, I propose a MaxValidatorCount, which keeps the maximum validator count observed so far.
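
A minimal Go sketch of how such a running maximum could be maintained per peer. The `updateMax` helper and the sample values are hypothetical and only illustrate the idea; how each per-event estimate is derived from attestation aggregation is not shown.

```go
package main

import "fmt"

// updateMax returns the larger of the stored maximum and the new estimate.
// It is a hypothetical helper, not taken from the valtrack codebase.
func updateMax(currentMax, observed int) int {
	if observed > currentMax {
		return observed
	}
	return currentMax
}

func main() {
	maxValidatorCount := 0
	// Sample per-event validator count estimates for one peer.
	for _, observed := range []int{2, 5, 0, 3} {
		maxValidatorCount = updateMax(maxValidatorCount, observed)
	}
	fmt.Println(maxValidatorCount) // prints 5
}
```

Unlike the cumulative average, the maximum is not pulled down by epochs in which the peer is not subscribed to any short-lived attnet.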

We also want to store the client version to allow more accurate data analysis.

Updated schema -

@mempirate thoughts?

mempirate commented 3 months ago

@namn-grg makes sense, good call!