dapplion closed this 2 years ago
There are almost 600 different aggregate and proof items per slot. Each validateAggregateAttestation call takes 6 ms on average (some exceptional calls take up to 60 ms due to GC), and most of that time is spent in batch signature verification. For this reason, we should consider increasing our job queue size.
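For scale: ~600 items at ~6 ms each is roughly 3.6 s of validation work per 12 s slot, and GC spikes make that work bursty, so a short queue can drop jobs it would have had time to finish later in the slot. Below is a minimal sketch of a bounded job queue that drops new jobs once `maxLength` items are waiting. `BoundedJobQueue`, `maxLength` and `maxConcurrency` are hypothetical names for illustration, not Lodestar's actual queue API.

```typescript
// Hypothetical bounded job queue (illustrative only, not Lodestar's implementation).
type Job<T> = () => Promise<T>;

interface PendingItem<T> {
  job: Job<T>;
  resolve: (value: T) => void;
  reject: (error: unknown) => void;
}

class BoundedJobQueue<T> {
  private readonly pending: PendingItem<T>[] = [];
  private running = 0;
  /** Number of jobs rejected because the waiting list was full. */
  public dropped = 0;

  constructor(
    private readonly maxLength: number, // max jobs waiting (not counting running ones)
    private readonly maxConcurrency: number, // max jobs executing at once
  ) {}

  /** Enqueue a job; returns null if the queue is full and the job is dropped. */
  push(job: Job<T>): Promise<T> | null {
    if (this.pending.length >= this.maxLength) {
      this.dropped++;
      return null;
    }
    return new Promise<T>((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this.runNext();
    });
  }

  /** Start waiting jobs while there is spare concurrency. */
  private runNext(): void {
    while (this.running < this.maxConcurrency && this.pending.length > 0) {
      const item = this.pending.shift()!;
      this.running++;
      item
        .job()
        .then(item.resolve, item.reject)
        .finally(() => {
          this.running--;
          this.runNext();
        });
    }
  }
}
```

With `maxConcurrency = 1`, raising `maxLength` trades memory and latency for fewer dropped messages; whether that helps depends on the queue draining before the messages become stale.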
There are some duplicate attestations (with different aggregators). Validating the later ones is redundant, since in the end the aggregate and proof DB uses the attestation root as its key.
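A minimal sketch of skipping duplicates before any BLS work, assuming a per-slot seen cache keyed by the hex attestation root. `SeenAggregateCache`, `isKnown`, `add` and `prune` are hypothetical names. The phase0 p2p spec already has an `[IGNORE]` rule for an aggregate whose `hash_tree_root(aggregate)` has already been seen, so ignoring later duplicates should be spec-compatible. The intended use is to call `isKnown` before signature verification and `add` only after a message passes full validation, so an invalid message cannot shadow a valid one.

```typescript
// Hypothetical seen-cache for aggregate gossip, keyed by attestation root (hex string).
class SeenAggregateCache {
  private readonly seenBySlot = new Map<number, Set<string>>();

  /** True if an aggregate with this attestation root was already accepted for the slot. */
  isKnown(slot: number, attestationRootHex: string): boolean {
    return this.seenBySlot.get(slot)?.has(attestationRootHex) ?? false;
  }

  /** Record an attestation root; call only after the aggregate fully validates. */
  add(slot: number, attestationRootHex: string): void {
    let roots = this.seenBySlot.get(slot);
    if (!roots) {
      roots = new Set<string>();
      this.seenBySlot.set(slot, roots);
    }
    roots.add(attestationRootHex);
  }

  /** Drop entries older than `retainSlots` slots to bound memory. */
  prune(currentSlot: number, retainSlots: number): void {
    for (const slot of this.seenBySlot.keys()) {
      if (slot < currentSlot - retainSlots) this.seenBySlot.delete(slot);
    }
  }
}
```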
We should create a TreeBacked value for the gossip AggregateAndProof, since we need struct_hashTreeRoot in a couple of places (getAggregateAndProofSignatureSet, getIndexedAttestationSignatureSet and AggregateAndProofRepository.add).
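A minimal sketch of the underlying idea: compute the root once per gossip object and let later callers reuse it. This is a generic `WeakMap` memo, not the @chainsafe/ssz TreeBacked API. `memoizeRoot` and its `computeRoot` callback are hypothetical, and the callback stands in for whatever SSZ hashing function is actually used. Caching by object identity is only safe if the object is not mutated after its root is first computed.

```typescript
// Hypothetical helper: caches one root per object identity so that repeated
// callers (signature-set builders, DB insert) reuse the first computation.
function memoizeRoot<T extends object>(
  computeRoot: (value: T) => Uint8Array,
): (value: T) => Uint8Array {
  // WeakMap keys do not keep gossip objects alive once they are dropped.
  const cache = new WeakMap<T, Uint8Array>();
  return (value: T): Uint8Array => {
    let root = cache.get(value);
    if (root === undefined) {
      root = computeRoot(value); // expensive hashing happens at most once per object
      cache.set(value, root);
    }
    return root;
  };
}
```

A TreeBacked value goes further than this memo, since it also caches the hashes of subtrees, so the nested `aggregate` root can be reused too.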
There are no other heavy operations besides signature verification; there are simply too many aggregate and proof messages to validate.
Already addressed with https://github.com/ChainSafe/lodestar/pull/2760 and https://github.com/ChainSafe/lodestar/pull/2801
Metrics from Prater on one of our Contabo VPS S-size nodes show that, when the node is synced, 80% of CPU time is spent validating aggregate and proof gossip messages. To keep up, we also drop 35% of all received messages.
The average job duration is 20-30 ms. Under stable Prater conditions, all attestation target states should be in the cache and cost nothing to retrieve. The only significant cost left is then signature verification, which covers 3 signatures (selection proof + aggregator signature + aggregate attestation signature). A single BLS signature verification costs 1-2 ms, and since the 3 signatures are verified in a batch there should be a discount of ~50%. So the total job time should be 3 × (1-2 ms) × 0.5 = 1.5-3 ms.
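The estimate above as a small calculation, using only the figures from this issue (3 signatures, 1-2 ms each, ~50% batch discount, 20-30 ms observed). The function name and the linear discount model are assumptions for illustration. Even in the most favourable comparison, the observed job time is about 6.7x the expected one.

```typescript
// Back-of-the-envelope model of aggregate validation cost (hypothetical helper).
interface MsRange {
  min: number;
  max: number;
}

/** Expected job time: sigCount signatures at perSigMs each, reduced by a batch discount. */
function expectedJobTimeMs(sigCount: number, perSigMs: MsRange, batchDiscount: number): MsRange {
  const factor = sigCount * (1 - batchDiscount);
  return { min: perSigMs.min * factor, max: perSigMs.max * factor };
}

// selection proof + aggregator signature + aggregate attestation signature
const expected = expectedJobTimeMs(3, { min: 1, max: 2 }, 0.5); // { min: 1.5, max: 3 }
const observed: MsRange = { min: 20, max: 30 };

// Slowdown factor: best case compares the fastest observed job to the slowest
// expected one (20 / 3 ≈ 6.7x), worst case the reverse (30 / 1.5 = 20x).
const gap: MsRange = { min: observed.min / expected.max, max: observed.max / expected.min };
```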
We should investigate the performance of that validation since there is significant room for improvement.