Closed hwwhww closed 6 years ago
Working on this in https://github.com/prysmaticlabs/prysm/pull/440 at Prysm
So just to recap, we're not storing active state details because they can be computed from blocks anyways. Since the crystallized state transition only occurs on each new cycle and involves more computation than determining an active state, I don't think it's a big problem to persist most of it. At the very least, should we be persisting the validator set with a lookup key?
I'm leaning towards limiting dependencies in this poc repo and not going in the direction of bringing in a db if possible. Instead just having clear data structures and pure functions.
That said, if the forkchoice becomes to messy to implement, I could be convinced, but would maybe rather just start working on a fork of pyevm
We could copy the db interface from PyEVM and just use their MemoryDB
. Then we would not have any additional dependencies and integrating it at some point with PyEVM or switching to a proper db would be more or less trivial.
That said, it probably wouldn't make much of a difference compared to just using a dict
directly.
@rauljordan
Ah, I saw your point, AcitveState
is not hard to be rebuilt by CrystallizedState
and Block
s. (I thought PryLabs stores ActiveState
when I read this line two days ago: https://github.com/prysmaticlabs/prysm/blob/366e5168ba03832a6b423d300524b76268af8242/beacon-chain/blockchain/core.go#L168, but you only stored the head ActiveState
, and will change to only in-memory?)
EthereumJ stores all data structures in key/value DB. It has an interface that abstracts storage from specific KV engines. For instance, RocksDB
is used to store real data on disk. But in tests EthJ uses in-memory DB which implementation is pretty similar to MemoryDB
.
I agree and support @jannikluhn https://github.com/ethereum/beacon_chain/issues/81#issuecomment-417226082: some abstract interface + in-memory DB is a good thing to start from.
Schema. This is a database schema definition that EthereumJ has so far: https://github.com/ethereum/ethereumj/blob/research/sharding/ethereumj-core/src/main/java/org/ethereum/sharding/config/BeaconConfig.java#L149-L204.
Keys. This is either a hash of encoded object or some index, it depends on the way the data are accessed. Sometimes, e.g. for blocks, there should be an opportunity to access data by both hash and index, depending on the case. It's achieving by introducing another one kv-storage that maps object index on object hash: [index: hash]
+ [hash: object]
.
Encoding. EthereumJ uses RLP. But it can be easily replaced with any other encoder when we will be agree on some.
Tries. Each trie is backed by own kv-storage that is a source of trie nodes: [node_hash: node]
.
P.S. the above came from eth1.0 and works well for the beacon chain and shards.
prysmatic labs has implemented saving attestation and attestationHashes to local DB in this PR. Feel free to check it out
Hey, I have a few questions about this suggestion!
The attesters create AttestationRecords and then broadcast them. The proposer collects the AttestationRecord messages, creates a block, and broadcasts the Block.
Is there an intermediate step here, where some entity (possibly the proposer) collects individual attestations and bundles them into the aggregated attestations, aka AttestationRecord
?
if the AttestationRecord have already been stored in the local DB.
Will this necessarily be true? If proposers are responsible for aggregating attestations and including them in the block, a node that's processing the block won't have those attestations in the DB.
Finally, is this motivation for this proposed schema change to save on storage space, specifically to avoid duplication of AttestationRecord
in the block and the active state?
In general, I'm wondering if the difference between this schema and the naive schema (where attestations are duplicated in the block and active state), is about trade-offs in storage size and lookup time. What I mean is that your schema would prevent duplication of AttestationRecord
in the block and active state, but with the additional cost where scanning the attestations in a block or active state will cost N+1 lookups, where N is the # of attestations.
The attestation hash itself doesn't seem useful. When looking for duplicate attestations, the hash isn't sufficient because 2 attestations with the same slot
and shard
but different oblique_parent_hashes
are still a duplicate, and 2 attestations with the same slot
, shard
, oblique_parent_hashes
but overlapping yet different attester_bitfield
values have to be handled somehow too. Correct me if I'm wrong here!
The naive schema would allow the attestation to be stored within the block/active state, and would only require one lookup. Of course this requires extra storage space in the database since the entirety of the attestation is stored in both the block and the active state.
I'm still a little confused about the distinction between the individual attestations that are signed by attesters, and the aggregated attestations that are included in blocks. From my understanding, it makes sense to store the individual attestations by its hash! But this suggestion seems to be about storing the aggregated attestations by its hash, which I'm having a hard time finding a strong justification for.
Thanks!
Feedback appreciated! I'm so sorry for that I missed @terenc3t and @rawfalafel ’s tag in https://github.com/prysmaticlabs/prysm/pull/472!!!
Is there an intermediate step here, where some entity (possibly the proposer) collects individual attestations and bundles them into the aggregated attestations, aka AttestationRecord?
By AttestationRecord
here do you mean the in wire protocol format? I think it’s possible, maybe should be named it AttestationRecords
in wire protocol, and it could be >=1 attestation(s).
Will this necessarily be true? If proposers are responsible for aggregating attestations and including them in the block, a node that's processing the block won't have those attestations in the DB.
oblique_parent_hashes
Agree with the significant flaw of this scheme as you pointed out! There may be a dirty method by replacing “attestation hash” with sig as the query key and only having oblique_parent_hashes
in Block, but it's still bad since the I/O overhead of re-wrapping block is TOO HUGE as you mentioned, especially while syncing nodes. Treating attestations like transaction pool probably will be a better way to handle the attestations. What do you think about that?
Finally, is this motivation for this proposed schema change to save on storage space, specifically to avoid duplication of AttestationRecord in the block and the active state?
Yes, if as PryLabs’ suggestion, just keeping the ActiveState
in memory, then storing attestations separately is meaningless. The possibility of recalculating ActiveState
depends on the forking rate, but it should be fine if comparing it to the query times of re-wrapping a block.
Thank you for looking into this!
By AttestationRecord here do you mean the in wire protocol format?
Yeah, I'm pointing to a distinction between an "individual attestation", which is what a single attester signs, and contains:
slot
attesterIndex
parent_hashes
shard_id
shard_block_hash
sig
and an "aggregated attestation" aka AttestationRecord
, which is the wire protocol format defined in the spec. I keep on coming back to this distinction because I feel like the two definitions are being incorrectly conflated.
Treating attestations like transaction pool probably will be a better way to handle the attestations. What do you think about that?
Hmm are you talking about a pool for individual attestations or aggregated attestations? I think a pool for individual attestations make sense.
Yes, if as PryLabs’ suggestion, just keeping the ActiveState in memory, then storing attestations separately is meaningless.
Gotcha! thanks for the clarification.
One simple suggestion is to sort block.attestations
and active_state.pending_attestations
by slot
and shard_id
. This way we can do a binary search instead of a scan when searching for existing attestations.
individual attestation
Got it. Yes, another data class + name makes sense.
Hmm are you talking about a pool for individual attestations or aggregated attestations? I think a pool for individual attestations make sense.
Yes, for the individual attestations.
One simple suggestion is to sort block.attestations and active_state.pending_attestations by slot and shard_id. This way we can do a binary search instead of a scan when searching for existing attestations.
The current spec says: Extend the list of AttestationRecord objects in the active_state, ordering the new additions in the same order as they came in the block.
(Should be pending_attestations
) And it's natually the order of making cross-link updated crosslinks: https://github.com/ethereum/beacon_chain/blob/dddee7781386b1c553ea72ad556af4bde09b10c7/beacon_chain/state/state_transition.py#L195
Do you have an idea of what's the use case that we have to search for existing attestations except for a blockchain explorer?
Thanks!
Do you have an idea of what's the use case that we have to search for existing attestations except for a blockchain explorer?
Say the beacon chain processes slot N
which contains attestation X
. Next, the beacon chain attempts to process slot N+1
which contains the exact same attestation X
. Should the beacon chain detect that there's a duplicate attestation?
As another example, what if slot N+1
contains an attestation that is similar to attestation X
except for a different value for shard_block_hash
. Should that trigger a slashing condition?
If those two scenarios need to be considered, attestations can be queried more quickly if block.attestations
and activeState.pending_attestations
were ordered by slot
and shard
.
@rawfalafel
Talked with V and Danny, recapping from Danny a little: the current spec doesn't prevent the proposer from accepting and including redundant (partial or full overlaps) across from different blocks of the same chain. It may happen but the proposer won't get the reward from including redundant attestation. (← it's not defined in the spec yet.) And in the state transition, we don't double count for FFG or rewards, so there is no incentive for the proposer to include full overlaps attestations. But we probably can prevent it with other mechanisms like a two-stage protocol that worthy to think about more. Right now we think ordering the attestations for binary search is too complicated for this purpose.
deactivate this discussion
I was thinking about how to implement and test the fork choice rule in the Python PoC codebase, hmm, it would be nicer if we have a database to rebuild/rollback the state instead of just having some dicts. 😬
The first thing occurred to me is
AttestationRecord
; the story I think is:AttestationRecord
s and then broadcast them.AttestationRecord
messages, creates a block, and broadcasts theBlock
.Block
, apply the state transition, and update localActiveState
. (Maybe alsoCrystallizedState
, up to the slot number and other conditions in the spec)post_active_state.pending_attestations = prev_active_state.pending_attestations + block.attestations
1 - How to store
AttestationRecord
andActiveState
?It makes sense to change
active_state.pending_attestations: [AttestationRecord]
toactive_state.pending_attestation_hashes: [Hash32]
if theAttestationRecord
have already been stored in the local DB.2 - How to store
Block
?For the DB storage design, we can consider treating the
AttestationRecord
like the PoW main chainTransaction
for avoiding double-storing from bothBlock
andActiveState
. That said, theBlock
consists ofBlockHeader
andAttestations
(i.e., BlockBody). There are those k-v pairs:attestation_hash
->AttestationRecord
LOOKUP_BLOCK_HEADER_<block_hash>
->BlockHeader
: the current block + potential proposer sig +attestation_root
-attestations
fieldLOOKUP_ATTESTATION_HASHES_<block_hash>
->[Hash32]
: get theattestation_hash
byblock_hash
active_state_root
-> newActiveState
(active_state.pending_attestation_hashes: [Hash32]
). When in the crystallized state transition that needspending_attestations
to updated crosslinks, just query theAttestationRecord
from the local DB.Haven’t dug into
CrystallizedState
yet.