Tracker logic

mempirate commented 4 months ago

Context

The consumer now processes all events and turns them into validator_metadata_events when it thinks there are validators attached to a beacon node. Sentries will try to redo successful handshakes and metadata exchanges every epoch, which means that in the ideal case we have a data point for each beacon node every epoch.

The next step is to build a view of live validators based on these events. The table will look like this (stored in a SQLite DB inside the consumer):

PeerID
ENR
Multiaddr
IP
Port
LastSeen
LastEpoch
PossibleValidator: bool
AverageValidatorCount: int
NumObservations: counter

The properties listed with a type (PossibleValidator, AverageValidatorCount, NumObservations) are the ones we have to derive. The rest can be extracted directly from the metadata events. Note that we maintain a single row per PeerID, so PeerID can be the primary key.
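A rough sketch of how this table could be declared, in case it helps the discussion. The table name `validator_tracker`, the column names, and all Go and SQLite types are assumptions made for illustration; only the field list and the PeerID primary key come from the description above.

```go
package main

// ValidatorRow mirrors one row of the proposed live-validator table.
// Field types are assumptions; the issue only fixes the field names.
type ValidatorRow struct {
	PeerID    string // primary key: one row per peer
	ENR       string
	Multiaddr string
	IP        string
	Port      int
	LastSeen  int64  // assumed: Unix timestamp in seconds
	LastEpoch uint64 // assumed: epoch of the latest event

	// Derived properties (not copied from the metadata event).
	PossibleValidator     bool
	AverageValidatorCount int
	NumObservations       int
}

// createTableSQL is a hypothetical SQLite schema for the same row.
// SQLite has no boolean type, so PossibleValidator is stored as 0 or 1.
const createTableSQL = `
CREATE TABLE IF NOT EXISTS validator_tracker (
	peer_id                 TEXT PRIMARY KEY,
	enr                     TEXT,
	multiaddr               TEXT,
	ip                      TEXT,
	port                    INTEGER,
	last_seen               INTEGER,
	last_epoch              INTEGER,
	possible_validator      INTEGER NOT NULL DEFAULT 0,
	average_validator_count INTEGER NOT NULL DEFAULT 0,
	num_observations        INTEGER NOT NULL DEFAULT 0
);`
```

With PeerID as the primary key, a new event for a known peer updates its existing row instead of adding another one.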

Methodology

For each metadata_event that results in a validator_metadata_event, we do the following:

If shortLivedSubnets > 0 we set PossibleValidator = true
Increment NumObservations
Derive estimated number of validators: AverageValidatorCount will be a cumulative average of all the validator count estimations so far. The ValidatorCount entry for the current event will be shortLivedSubnets / 2, because validators at any point will be subscribed to 2 attnets: one for the current epoch and one for the next. NOTE: we should round up, not down, when doing integer division! I.e. q := 1 + (x - 1) / y
Set the new AverageValidatorCount to be the cumulative average including the current estimate, where "new value" is the ValidatorCount for the current event and "new count" is NumObservations after the increment: $$\text{new average} = \text{current average} + \frac{\text{new value} - \text{current average}}{\text{new count}}$$
Insert or modify row
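The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ceilDiv`, `derivedRow`, and `observe` are hypothetical, a map keyed by PeerID stands in for the SQLite table, and the running average is kept as a float64 (the table lists it as an int, so rounding on write would be one option) to avoid compounding integer truncation.

```go
package main

// ceilDiv returns x / y rounded up, for x >= 0 and y > 0.
// For x > 0 it agrees with q := 1 + (x - 1) / y; this form also
// returns 0 when x == 0.
func ceilDiv(x, y int) int {
	return (x + y - 1) / y
}

// derivedRow holds only the derived columns of the table.
type derivedRow struct {
	PossibleValidator     bool
	AverageValidatorCount float64 // assumption: float to keep precision
	NumObservations       int
}

// observe applies one validator_metadata_event to the row for peerID
// and writes it back (insert if absent, modify otherwise).
func observe(table map[string]derivedRow, peerID string, shortLivedSubnets int) derivedRow {
	r := table[peerID] // zero value when the peer has no row yet

	if shortLivedSubnets > 0 {
		r.PossibleValidator = true
	}
	r.NumObservations++

	// Two attnets per validator: current epoch and next epoch.
	count := float64(ceilDiv(shortLivedSubnets, 2))

	// Cumulative average: avg += (value - avg) / n.
	r.AverageValidatorCount += (count - r.AverageValidatorCount) / float64(r.NumObservations)

	table[peerID] = r
	return r
}
```

For example, three events for the same peer with 3, 8, and 0 short-lived subnets give per-event estimates of 2, 4, and 0, and running averages of 2, 3, and 2.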

namn-grg commented 4 months ago

Thanks for the detailed description!

Should we also change ParquetValidatorEvent to the same fields as the above table?

mempirate commented 4 months ago

@namn-grg no, this new table will not be a table of events (which the validator event is), but a live view of active validators based on these events. So no need to change the existing events.

namn-grg commented 3 months ago

Now that we will estimate the validator count through attestation aggregators, I think AverageValidatorCount will not make sense as the validator will be subscribed to the short-lived attnets for certain epochs only. Instead of this, I propose to have a MaxValidatorCount, which will keep the max count of validators observed.

We also want to have a client version to get more accurate data analysis.

Updated schema -

PeerID
ENR
Multiaddr
IP
Port
LastSeen
LastEpoch
ClientVersion
PossibleValidator: bool
MaxValidatorCount: int
NumObservations: counter
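A small sketch of how the MaxValidatorCount update might look, for reference. The type `trackedPeer` and its `observe` method are hypothetical, the per-event validator count is assumed to come from some separate estimator, and keeping the most recent non-empty client version is likewise an assumption.

```go
package main

// trackedPeer holds the derived columns of the updated schema.
type trackedPeer struct {
	ClientVersion     string
	MaxValidatorCount int
	NumObservations   int
}

// observe records one estimate for this peer. validatorCount is assumed
// to be produced elsewhere (e.g. from the attnet subscriptions seen).
func (p *trackedPeer) observe(validatorCount int, clientVersion string) {
	p.NumObservations++

	// Keep the highest count seen so far instead of averaging, since
	// short-lived subscriptions are only present in some epochs.
	if validatorCount > p.MaxValidatorCount {
		p.MaxValidatorCount = validatorCount
	}

	// Assumption: remember the latest client version that was reported.
	if clientVersion != "" {
		p.ClientVersion = clientVersion
	}
}
```

Unlike the average, the maximum is not pulled down by epochs in which the peer shows no short-lived subscriptions.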

@mempirate thoughts?

mempirate commented 3 months ago

@namn-grg makes sense, good call!

chainbound / valtrack

Tracker logic #27
