ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.53k stars 953 forks source link

on_block(b) and (store) progress #1891

Open franck44 opened 4 years ago

franck44 commented 4 years ago

Context

This commit adds a condition to guarantee that in state_transition(state,block), state.slot < block.slot. It is related to the spec update v0.11.3.

It is supposed to fix a problem "process without progress". It is unclear to me what it actually fixes.

Progress with state_transition

As far as I understand, the fix guarantees that, for a sequence of state_transitions, the slot number progresses. In other words it prevents executions like (I write s - b -> s' for s' = state_transition(s,b))

s0 - b0 -> s1 - b1 -> s2 ...

with b0.slot == b1.slot == b2.slot.

This seems to assume that progress is made by executing state_transition successively, and each time using the previously computed state.

Progress with on_block

However this does not seem to align with how state_transition is used in the Beacon chain. As far as I can tell, state_transition is only called in on_block(b) with b a block. In on_block(b), the pre_state that b claims to attach to is computed. And later state_transition(pre_state,b) is called.

So if I am correct, nothing prevents me from calling on_block(b) many times with the same block. This will trigger a computation with state_transition to compute the successor state of pre_state with b , state_transition(pre_state,b). Now assume the store is S, and b has already been added to the store (it is in S.blocks). There seems to be a possibility of an infinite sequence of on_block(b) without the store being modified (because b is already in):

S - on_block(b) -> S - on_block(b) -> S .....

Security vulnerability

An exploit may be a DoS attack by sending the same block b many many times.

How the issues can be mitigated

First the change in the commit that adds a condition to guarantee that in state_transition(state,block), state.slot < block.slot may be pushed upwards to on_block by requesting that pre_state.slot < b.slot in the pre-conditions of this function. This avoids the needs for all the asserts on the slot number that are below on_block (a formal proof can be established using our Dafny specification of the eth2,0 specs).

Second, to avoid the possibility of processing the same block again and again in on_block without progress on the store, we may require that the block b (or its hash) is not already in the store's keys, and add the pre-condition

on_block(b)
   requires hash_tree_root(b) !in S

to on_block(b).

How the issues was uncovered

Issue was uncovered while specifying and proving some invariants on the Dafny specification of the Eth2.0 specs.

ericsson49 commented 4 years ago

state_transition is also called by compute_new_state_root which specifies how a validator should compute state_root. So, it also should be updated, if pre_state.slot < b.slot moved to on_blocks.

ericsson49 commented 4 years ago

As for preventing DoS attacks, I think it's better done on the implementation level.

There are three kind of outcomes of on_block (and on_attestation): valid, invalid or 'early'. The latter happens when a block arrived before its slot (e.g. because of clock disparities) or before its parent (can be missed for some reason, but can arrive later via a different route or requested).

So, hash_tree_root(b) !in store.blocks check would prevent DoS-attacks with blocks recognized as valid or invalid, however, it's a more tricky with 'early' blocks. E.g. one can send lots of blocks with some random parent_root. However, they cannot be always immediately classified as valid/invalid, since a block, referenced by the parent_root may arrive in future. Thus, an implementer should maintain a buffer of blocks (and attestations), awaiting some other event to happen (a particular slot or a block with a particular root to arrive). It also would be dangerous to mark 'early' blocks/attestation as invalid and be persistent in doing so (i.e. remember their status in some store), as message reordering is possible, and 'early' blocks/attestations which later become valid are quite possible.

So, handling of duplicate blocks (and attestations) is somewhat non-trivial in practice and may differ from implementation to implementation (e.g. 'early' blocks/attestation queues can have different sizes or a policy to choose a block/attestation to drop when such a queue becomes to big can differ).

Therefore, I believe adding DoS preventing checks would just complicate the specs and perhaps may restrict implementers in how they implement DoS counter-measures. However, it would be definitely useful to clarify such risks and aspects of block/attestation processing in a 'recommendation' section (like p2p-interface.md uses MAY to suggest a reasonable behavior).

In theory, it would be great to specify exactly how to handle various kinds of blocks and attestations in various circumstances, however, I expect it's extremely difficult to do and hardly ever possible to test/verify.

ericsson49 commented 4 years ago

There is also one more tricky case. The fork choice allows a situation, when a block's root is in store.blocks, but it's not in store.block_states, e.g. if a block has an invalid signature. This is an inconsistent state of Store (I don't believe it's dangerous, though I think an attestation with target.root referencing such invalid block may be accepted by fork choice in some cases). In practice, implementers are likely to implement on_block as a transaction, with 'Store' changes rolled back, if a block is invalid. So, the inconsistency won't be possible, in such case. This is another reason I do not believe the problem is really dangerous (though I've opened an issue about that).

However, such transactional semantic of on_block conflicts with the hash_tree_root(block) in store.blocks check, if the block under consideration is invalid, as store updates should be rolled back.

adiasg commented 4 years ago

@franck44: It is supposed to fix a problem "process without progress". It is unclear to me what it actually fixes.

Prior to v0.11.3, a malicious proposer for slot X could make two blocks b1 and b2, with b2.parent_root as the root of b1 and b2.state_root as the state root of state_transition(pre_state_slot_X, b1). This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2. These two clients will now differ in their chain heads, leading to a chain split in the network.

v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.


@franck44: So if I am correct, nothing prevents me from calling on_block(b) many times with the same block.

As far as the beacon chain fork choice is concerned, that is perfectly fine behavior. The purpose of the specification is to define the behavior of clients, and not exactly to provide an optimized version of a well-engineered client. This is in the interest of maintaining the simplicity of the specification.

Specifically in this case, since on_block(b) is an idempotent operation, an optimized client can choose to not re-run on_block(b). Note that this still maintains the exact same behavior (w.r.t. the state and store) as the specification!

franck44 commented 4 years ago

Prior to v0.11.3, a malicious proposer for slot X could make two blocks b1 and b2, with

b2.parent_root as the root of b1 and b2.state_root as the state root of state_transition(pre_state_slot_X, b1). This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2. These two clients will now differ in their chain heads, leading to a chain split in the network.

v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.

Ok thanks @adiasg. This is an example of a scenario that is classified as bad, and the v0.11.3 is supposed to fix this particular problem. It would be nice to have a more logical specification of the properties the beacon chain should satisfy. The result of this fix is that the scenario you exemplified (with b1 ad b2) is not possible any more. But what property (or invariant) should be maintain? Is it that the beacon chain is always ... a chain [of blocks]? And if yes, we may try to formulate a logical statement that captures this property.

Specifically in this case, since on_block(b) is an idempotent operation, an optimized client can choose to not re-run on_block(b). Note that this still maintains the exact same behavior (w.r.t. the state and store) as the specification!

This seems a bit optimistic to me to delegate choices to clients: it is not obvious that playing the same block will result in the store not being modified. So clients should not rely on this, and probably not have infer that at all. The fact that it maintains the same behaviours is also not trivial at all (at least for me).

The specs are sometimes soft and would probably benefit from more rigour and formal treatment.

franck44 commented 4 years ago

@adiasg

Prior to v0.11.3, a malicious proposer for slot X could make two blocks b1 and b2, with b2.parent_root as the root of b1 and b2.state_root as the state root of state_transition(pre_state_slot_X, b1). This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2. These two clients will now differ in their chain heads, leading to a chain split in the network.

v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.

If I understand well, a desirable property of the chain is that, the slot numbers of the ancestors of any block, should form a total strict order (no two blocks with the same slot number).

Our work on a formal specification of the Eth2.0 specs may provide some insight into this problem. We have formally proved this property here.

This goes beyond the informal argument that such a situation may not occur after the fix of v0.11.3, as it provides a formal, logical, machine-checkable proof of a well-specified property (invariant).

adiasg commented 4 years ago

@franck44 Thanks for the Dafny proof!

Formal verification of the spec is definitely important, and Runtime Verification has been helping us with this. In fact, the bug that v0.11.3 fixes was found through their verification efforts! The Coq and K models that have been produced will be made public when they are ready to be shared.

While it is desirable to have each and every modification to the spec to be incorporated and verified through a formal model, this introduces a large overhead on the spec writers. For this reason, the spec repository will usually be upstream from the formal models.

franck44 commented 4 years ago

Thanks @adiasg.

Note that our proof does not only resolve the problem and scenario uncovered in the testing work by RV, but provide a more comprehensive statement (invariant) of what should hold, and a formal proof that it holds.

While it is desirable to have each and every modification to the spec to be incorporated and verified through a formal model, this introduces a large overhead on the spec writers. For this reason, the spec repository will usually be upstream from the formal models.

The purpose of our work is to discharge the spec writers of the hard part of the verification, which is figuring out the invariants, and building the proofs. We (as formal methods people) write the proofs, there is no need for the spec writers to write them.