Background
The current interface of the pubkey cache is
interface PubkeyCache {
  /** When processing signed objects we must resolve validator indexes into a representation of the pubkey optimized for fast aggregation */
  getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey;
  /** `process_deposit` requires an efficient lookup of whether a pubkey is already known in the state */
  getPubkeyIndex(pubkey: bytes): ValidatorIndex | null;
  /** Register a new pubkey to the cache. May be done when processing a deposit, or later */
  addPubkey(pubkey: bytes, index: ValidatorIndex): void;
}
That API is currently implemented with two data structures that are globally shared across all states:
type Index2PubkeyCache = PublicKey[];
type PubkeyIndexMap = Map<PubkeyHex, ValidatorIndex>;
Split cache
As @g11tech mentions, we must have a cache that's fork aware. We should use the persistent-ts structurally shared data structures to make cloning cheap. So attach to every state these two data structures, which are cloned on each state clone and can be safely mutated during block processing:
unfinalizedPubkeyCache = immutable.List<PublicKey | null>
unfinalizedPubkeyIndexMap = immutable.Map<PubkeyHex, ValidatorIndex>
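This is cheap because persistent structures share structure on write. A minimal sketch, assuming Immutable.js as the persistent library (CachedBeaconState and the type aliases here are simplified stand-ins, not the real Lodestar types):

import {List, Map as ImmutableMap} from "immutable";

type PubkeyHex = string;
type ValidatorIndex = number;
type PublicKey = object; // stand-in for bls.PublicKey

interface CachedBeaconState {
  validators: {pubkey: Uint8Array}[];
  unfinalizedPubkeyCache: List<PublicKey | null>;
  unfinalizedPubkeyIndexMap: ImmutableMap<PubkeyHex, ValidatorIndex>;
}

function cloneCaches(state: CachedBeaconState): Pick<CachedBeaconState, "unfinalizedPubkeyCache" | "unfinalizedPubkeyIndexMap"> {
  // Persistent structures are never mutated in place, so "cloning" is just
  // copying two references: O(1) instead of O(validators)
  return {
    unfinalizedPubkeyCache: state.unfinalizedPubkeyCache,
    unfinalizedPubkeyIndexMap: state.unfinalizedPubkeyIndexMap,
  };
}

A later .set() on either copy returns a new List/Map sharing almost all internal nodes with its parent, so forks do not duplicate memory.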
Then multiplex on both caches:
function getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey {
  if (index < latestFinalizedValidatorsLen) {
    // Lookup global cache
    // - Should cover +99% of cases. Random access on a fast Array
    return globalPubkeyCache[index];
  } else {
    // Lookup state's local cache
    // - Unlikely unless long non-finality. Slower access on immutable.List
    if (unfinalizedPubkeyCache.get(index) == null) {
      // Prime the cache by reading state.validators[index].pubkey
      // To prevent deserializing a pubkey twice, keep a global `unfinalizedPubkeyMap = Map<PubkeyHex, PublicKey>`
      // Deserialization may be done proactively when unfinalized validators become active
      primeUnfinalizedPubkeyCache(index);
    }
    return unfinalizedPubkeyCache.get(index);
  }
}
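A minimal sketch of the priming step described in the comments above, continuing the pseudocode's free variables (toHex is a hypothetical bytes-to-hex helper, and bls.PublicKey.fromBytes stands in for whatever deserializer is used):

const unfinalizedPubkeyMap = new Map<PubkeyHex, PublicKey>(); // global: hex -> deserialized key, shared across forks

function primeUnfinalizedPubkeyCache(index: ValidatorIndex): void {
  const pubkeyBytes = state.validators[index].pubkey;
  const hex = toHex(pubkeyBytes);
  let key = unfinalizedPubkeyMap.get(hex);
  if (key === undefined) {
    key = bls.PublicKey.fromBytes(pubkeyBytes); // deserialize exactly once per pubkey
    unfinalizedPubkeyMap.set(hex, key);
  }
  // Persistent .set() returns a new List, so re-assign the state's cache
  unfinalizedPubkeyCache = unfinalizedPubkeyCache.set(index, key);
}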
function getPubkeyIndex(pubkey: bytes): ValidatorIndex | null {
  // Lookup global cache first, then the state's local cache
  // - Note that for new deposits both caches will be looked up
  // - Use ?? instead of || so that validator index 0 is not treated as a miss
  return globalPubkeyIndexMap.get(pubkey) ?? unfinalizedPubkeyIndexMap.get(pubkey) ?? null;
}
Regarding cache insertion, we must ensure that each pubkey is only deserialized once, since those representations are memory heavy. getPubkeyForAggregation should be lazy for the unfinalized portion, since it's unlikely to be called in normal cases. The local cache of the finalized state has to be rotated into the global cache.
function addPubkey(pubkey: bytes, index: ValidatorIndex): void {
  // Persistent structures return new instances on .set(), so re-assign
  unfinalizedPubkeyCache = unfinalizedPubkeyCache.set(index, null); // lazy cache, prime later
  unfinalizedPubkeyIndexMap = unfinalizedPubkeyIndexMap.set(pubkey, index);
}
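A rough sketch of that rotation when finalization advances (function and variable names hypothetical, reusing the pseudocode above):

function rotateIntoGlobalCache(finalizedState: CachedBeaconState, newFinalizedValidatorsLen: number): void {
  for (let i = latestFinalizedValidatorsLen; i < newFinalizedValidatorsLen; i++) {
    const pubkeyBytes = finalizedState.validators[i].pubkey;
    const hex = toHex(pubkeyBytes);
    // Re-use the already-deserialized key if the lazy cache was primed
    globalPubkeyCache[i] = unfinalizedPubkeyMap.get(hex) ?? bls.PublicKey.fromBytes(pubkeyBytes);
    globalPubkeyIndexMap.set(hex, i);
    unfinalizedPubkeyMap.delete(hex); // the key is finalized now
  }
  latestFinalizedValidatorsLen = newFinalizedValidatorsLen;
  // Per-state entries below latestFinalizedValidatorsLen are now dead weight and can be pruned
}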
I think we can make unfinalizedPubkeyCache = immutable.List<PubkeyHex | null>, and have a global hexToPublicKey: Map<PubkeyHex, PublicKey> in the epoch context itself
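A rough sketch of that variant (names hypothetical): the fork-aware state then holds only small hex strings, while each heavy key object lives once in the epoch context:

const hexToPublicKey = new Map<PubkeyHex, PublicKey>(); // global, in EpochContext

function getUnfinalizedPubkey(cache: List<PubkeyHex | null>, index: ValidatorIndex): PublicKey {
  const hex = cache.get(index);
  if (hex == null) throw Error(`unknown unfinalized validator ${index}`);
  let key = hexToPublicKey.get(hex);
  if (key === undefined) {
    key = bls.PublicKey.fromHex(hex); // hypothetical deserializer, runs once per pubkey
    hexToPublicKey.set(hex, key);
  }
  return key;
}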
Re-use indexes
- Important to think about how re-using indexes impacts a fork-aware cache: EIP-6914 - Reuse indexes with full sweep (ethereum/consensus-specs#3307)
I have dug into the current use of index2pubkey in the codebase. It is used in block, attestation, slashing and sync committee related cases, all of which require active validators.
There is also a case where process_voluntary_exit() uses index2pubkey, but it is guarded by an is_active_validator() check right before we validate the signature.
I believe there is no need to have the multiplex in getPubkeyForAggregation(), and we don't need to have unfinalizedPubkeyCache attached to the states
aha very nice analysis @naviechan

> I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don't need to have unfinalizedPubkeyCache attached to the state

what about

1. very long cases of non finality that we do see on the devnets if not testnets (and hopefully never on mainnet). i think the suggested getPubkeyForAggregation also talks about this.
2. serving beacon apis querying for status of unfinalized validators

i suggest to keep the original approach proposed, where we prime them in a lazy fashion.
Having a fork aware auto-pruned index2pubkey for non-finalized pubkeys is a complexity we should avoid if not necessary. To verify signatures the fallback can be:
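For example, a minimal sketch of such a fallback (a hedged illustration, not the exact snippet: just deserialize straight from the state on demand):

function getPubkeyFallback(state: CachedBeaconState, index: ValidatorIndex): PublicKey {
  // No unfinalized cache at all: read the serialized pubkey from the
  // tree-backed state and deserialize it on every call
  return bls.PublicKey.fromBytes(state.validators[index].pubkey);
}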
This is awfully slow, but it covers the case. We should have WARN logs if that codepath happens
For pubkey2index we 100% need the un-finalized cache
makes sense, so we can encapsulate this away in getPubkeyForAggregation and obviously pass state to it to make it fork aware (or put it in the cache as well depending upon optional flags to cache, but all that can be done later on based on usage requirements)
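A minimal sketch of that encapsulation (the signature is hypothetical):

function getPubkeyForAggregation(state: CachedBeaconState, index: ValidatorIndex): PublicKey {
  return index < latestFinalizedValidatorsLen
    ? globalPubkeyCache[index] // fast path: finalized, shared across all forks
    : bls.PublicKey.fromBytes(state.validators[index].pubkey); // fork-aware slow path, WARN-logged
}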
> I have dug into the current use of index2pubkey in the codebase. It is used in block, attestation, slashing and sync committee related cases, all of which require active validators. There is also a case where process_voluntary_exit() uses index2pubkey, but it is guarded by an is_active_validator() check right before we validate the signature. I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don't need to have unfinalizedPubkeyCache attached to the states

I realized this is a bold claim without any supporting evidence.
I have written up a more thorough analysis as a proof of the needlessness of unfinalized index2pubkey: https://hackmd.io/@adrninistrator1/SkHmz972n
Many thanks to @dapplion for suggestions and comments
> aha very nice analysis @naviechan
> I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don't need to have unfinalizedPubkeyCache attached to the state
> what about
> - very long cases of non finality that we do see on the devnets if not testnets (and hopefully never on mainnet). i think the suggested getPubkeyForAggregation also talks about this.
> - serving beacon apis querying for status of unfinalized validators
> i suggest to keep the original approach proposed, where we prime them in a lazy fashion.
In the case of long non-finality:

- finalized_checkpoint.epoch only affects new validators entering the activation queue. Validators that are already in the queue will dequeue just fine.
- is_active_validator() is not affected by long non-finality, so any call to index2Pubkey will stay the same (attestation, sync committee etc.), and hence the conclusion of the needlessness of unfinalizedIndex2Pubkey remains unchanged.

unfinalizedPubkey2Index gets populated through process_deposit(), where the function already has the public key object, so it makes sense to populate it directly because the validator index is just validators.length - 1. If we decide to do this lazily, we will need to search for the validator index later, which will be quite slow. Something like [i for i, v in enumerate(state.validators) if v.pubkey == pubkey].

Lazy priming made sense for unfinalizedIndex2Pubkey as Lion initially proposed, because 1) we can delay storing the pubkey and 2) we can just check state.validators[i].pubkey with index i. But the idea of unfinalizedIndex2Pubkey is scrapped.

I also believe there is no need to have a map Map<PubkeyHex, PublicKey>. Can someone confirm that?
yes no need i think :+1:, also no need for unfinalizedPubkey2Index, which is state dependent and can be computed on the fly for serving the apis via a utility fn which takes state and pubkey as inputs
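A sketch of such a utility (names hypothetical): only the unfinalized tail of the validator set needs scanning, since finalized pubkeys are already covered by the global map:

function getValidatorIndexFromState(state: CachedBeaconState, pubkey: Uint8Array): ValidatorIndex | null {
  const known = globalPubkeyIndexMap.get(toHex(pubkey));
  if (known !== undefined) return known;
  // Linear scan of the unfinalized tail; fine for occasional API calls
  for (let i = latestFinalizedValidatorsLen; i < state.validators.length; i++) {
    if (byteArrayEquals(state.validators[i].pubkey, pubkey)) return i; // byteArrayEquals: hypothetical byte comparator
  }
  return null;
}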
Closing since #6042 is merged. Some minor follow-ups will be tracked along with other Electra EIPs in #6341
The EIP aims to bring the deposit mechanism into the beacon chain. Currently, deposits are fetched and processed with a sufficient lag (12 hrs) that effectively rules out any chance of a re-org affecting deposit processing. With the proposed deposit receipt now being a first order citizen of the beacon network, a beacon node bundles and packs receipts into a block to add them to the canonical chain (like other messages such as exits, bls changes etc)
The following would be the tracker items:
However, apart from these routine tasks, lodestar would need to refactor and handle its pubkey <> index map, which is currently part of EpochContext, a global object. Currently, since deposit logs from the EL are processed with such a big lag, the order in which deposits are processed and packed into blocks is fully deterministic, and hence indexes get assigned in EpochContext without any worries.
But with deposit receipt processing moving into the beacon chain, this determinism goes away: the indexes will now depend on how receipts are packed into blocks, and more precisely will vary between forks. This leads to the precursor refactoring task.
- tip nodes in the forkchoice protoarray (i.e. nodes with no children) have an additional forkMap, which is basically obtained by applying the receipts in order of the unfinalized blocks added in the fork.
- On adding a new tip, the map is updated and assigned to the new tip.

PS: still wip and pending discussions
cc @dapplion @wemeetagain
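To make the wip idea above concrete, one possible shape (all names hypothetical), reusing the same persistent structures so that many tips stay cheap:

import {Map as ImmutableMap} from "immutable";

type ForkMap = ImmutableMap<PubkeyHex, ValidatorIndex>;

// Applying one block's deposit receipts to the parent tip's map yields the
// child tip's map; structural sharing means each fork only pays for its diff
function forkMapForNewTip(parentMap: ForkMap, depositReceipts: {pubkeyHex: PubkeyHex; index: ValidatorIndex}[]): ForkMap {
  let map = parentMap;
  for (const receipt of depositReceipts) {
    map = map.set(receipt.pubkeyHex, receipt.index);
  }
  return map; // assign to the new tip node in the protoarray
}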