Open franck44 opened 4 years ago
state_transition
is also called by compute_new_state_root
which specifies how a validator should compute state_root. So, it also should be updated, if pre_state.slot < b.slot
moved to on_blocks
.
As for preventing DoS attacks, I think it's better done on the implementation level.
There are three kind of outcomes of on_block
(and on_attestation
): valid, invalid or 'early'. The latter happens when a block arrived before its slot (e.g. because of clock disparities) or before its parent (can be missed for some reason, but can arrive later via a different route or requested).
So, hash_tree_root(b) !in store.blocks
check would prevent DoS-attacks with blocks recognized as valid or invalid, however, it's a more tricky with 'early' blocks. E.g. one can send lots of blocks with some random parent_root
. However, they cannot be always immediately classified as valid/invalid, since a block, referenced by the parent_root
may arrive in future.
Thus, an implementer should maintain a buffer of blocks (and attestations), awaiting some other event to happen (a particular slot or a block with a particular root to arrive).
It also would be dangerous to mark 'early' blocks/attestation as invalid and be persistent in doing so (i.e. remember their status in some store), as message reordering is possible, and 'early' blocks/attestations which later become valid are quite possible.
So, handling of duplicate blocks (and attestations) is somewhat non-trivial in practice and may differ from implementation to implementation (e.g. 'early' blocks/attestation queues can have different sizes or a policy to choose a block/attestation to drop when such a queue becomes to big can differ).
Therefore, I believe adding DoS preventing checks would just complicate the specs and perhaps may restrict implementers in how they implement DoS counter-measures. However, it would be definitely useful to clarify such risks and aspects of block/attestation processing in a 'recommendation' section (like p2p-interface.md uses MAY to suggest a reasonable behavior).
In theory, it would be great to specify exactly how to handle various kinds of blocks and attestations in various circumstances, however, I expect it's extremely difficult to do and hardly ever possible to test/verify.
There is also one more tricky case. The fork choice allows a situation, when a block's root is in store.blocks
, but it's not in store.block_states
, e.g. if a block has an invalid signature.
This is an inconsistent state of Store
(I don't believe it's dangerous, though I think an attestation with target.root
referencing such invalid block may be accepted by fork choice in some cases).
In practice, implementers are likely to implement on_block
as a transaction, with 'Store' changes rolled back, if a block is invalid. So, the inconsistency won't be possible, in such case.
This is another reason I do not believe the problem is really dangerous (though I've opened an issue about that).
However, such transactional semantic of on_block
conflicts with the hash_tree_root(block) in store.blocks
check, if the block under consideration is invalid, as store updates should be rolled back.
@franck44: It is supposed to fix a problem "process without progress". It is unclear to me what it actually fixes.
Prior to v0.11.3, a malicious proposer for slot X
could make two blocks b1
and b2
, with b2.parent_root
as the root of b1
and b2.state_root
as the state root of state_transition(pre_state_slot_X, b1)
. This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2
. These two clients will now differ in their chain heads, leading to a chain split in the network.
v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.
@franck44: So if I am correct, nothing prevents me from calling on_block(b) many times with the same block.
As far as the beacon chain fork choice is concerned, that is perfectly fine behavior. The purpose of the specification is to define the behavior of clients, and not exactly to provide an optimized version of a well-engineered client. This is in the interest of maintaining the simplicity of the specification.
Specifically in this case, since on_block(b)
is an idempotent operation, an optimized client can choose to not re-run on_block(b)
. Note that this still maintains the exact same behavior (w.r.t. the state
and store
) as the specification!
Prior to v0.11.3, a malicious proposer for slot X could make two blocks b1 and b2, with
b2.parent_root as the root of b1 and b2.state_root as the state root of state_transition(pre_state_slot_X, b1). This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2. These two clients will now differ in their chain heads, leading to a chain split in the network.
v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.
Ok thanks @adiasg. This is an example of a scenario that is classified as bad, and the v0.11.3 is supposed to fix this particular problem. It would be nice to have a more logical specification of the properties the beacon chain should satisfy. The result of this fix is that the scenario you exemplified (with b1 ad b2) is not possible any more. But what property (or invariant) should be maintain? Is it that the beacon chain is always ... a chain [of blocks]? And if yes, we may try to formulate a logical statement that captures this property.
Specifically in this case, since on_block(b) is an idempotent operation, an optimized client can choose to not re-run on_block(b). Note that this still maintains the exact same behavior (w.r.t. the state and store) as the specification!
This seems a bit optimistic to me to delegate choices to clients: it is not obvious that playing the same block will result in the store not being modified. So clients should not rely on this, and probably not have infer that at all. The fact that it maintains the same behaviours is also not trivial at all (at least for me).
The specs are sometimes soft and would probably benefit from more rigour and formal treatment.
@adiasg
Prior to v0.11.3, a malicious proposer for slot X could make two blocks b1 and b2, with b2.parent_root as the root of b1 and b2.state_root as the state root of state_transition(pre_state_slot_X, b1). This can cause a chain split between clients if they handle this situation in different ways. For example, one client may not detect this situation and apply both blocks in its chain, while another client may detect that the slot number is replayed and reject b2. These two clients will now differ in their chain heads, leading to a chain split in the network.
v0.11.3 explicitly disallows this situation by forcing all clients to reject the second block with the same slot, preventing it from being applied in the same chain.
If I understand well, a desirable property of the chain is that, the slot numbers of the ancestors of any block, should form a total strict order (no two blocks with the same slot number).
Our work on a formal specification of the Eth2.0 specs may provide some insight into this problem. We have formally proved this property here.
This goes beyond the informal argument that such a situation may not occur after the fix of v0.11.3, as it provides a formal, logical, machine-checkable proof of a well-specified property (invariant).
@franck44 Thanks for the Dafny proof!
Formal verification of the spec is definitely important, and Runtime Verification has been helping us with this. In fact, the bug that v0.11.3 fixes was found through their verification efforts! The Coq and K models that have been produced will be made public when they are ready to be shared.
While it is desirable to have each and every modification to the spec to be incorporated and verified through a formal model, this introduces a large overhead on the spec writers. For this reason, the spec repository will usually be upstream from the formal models.
Thanks @adiasg.
Note that our proof does not only resolve the problem and scenario uncovered in the testing work by RV, but provide a more comprehensive statement (invariant) of what should hold, and a formal proof that it holds.
While it is desirable to have each and every modification to the spec to be incorporated and verified through a formal model, this introduces a large overhead on the spec writers. For this reason, the spec repository will usually be upstream from the formal models.
The purpose of our work is to discharge the spec writers of the hard part of the verification, which is figuring out the invariants, and building the proofs. We (as formal methods people) write the proofs, there is no need for the spec writers to write them.
Context
This commit adds a condition to guarantee that in
state_transition(state,block)
,state.slot < block.slot
. It is related to the spec update v0.11.3.It is supposed to fix a problem "process without progress". It is unclear to me what it actually fixes.
Progress with
state_transition
As far as I understand, the fix guarantees that, for a sequence of state_transitions, the slot number progresses. In other words it prevents executions like (I write
s - b -> s'
fors' = state_transition(s,b)
)s0 - b0 -> s1 - b1 -> s2 ...
with
b0.slot == b1.slot == b2.slot
.This seems to assume that progress is made by executing
state_transition
successively, and each time using the previously computed state.Progress with
on_block
However this does not seem to align with how
state_transition
is used in the Beacon chain. As far as I can tell,state_transition
is only called inon_block(b)
withb
a block. Inon_block(b)
, thepre_state
thatb
claims to attach to is computed. And laterstate_transition(pre_state,b)
is called.So if I am correct, nothing prevents me from calling
on_block(b)
many times with the same block. This will trigger a computation withstate_transition
to compute the successor state ofpre_state
withb
,state_transition(pre_state,b)
. Now assume the store isS
, andb
has already been added to the store (it is inS.blocks
). There seems to be a possibility of an infinite sequence ofon_block(b)
without the store being modified (becauseb
is already in):S - on_block(b) -> S - on_block(b) -> S .....
Security vulnerability
An exploit may be a DoS attack by sending the same block
b
many many times.How the issues can be mitigated
First the change in the commit that adds a condition to guarantee that in
state_transition(state,block)
,state.slot < block.slot
may be pushed upwards toon_block
by requesting thatpre_state.slot < b.slot
in the pre-conditions of this function. This avoids the needs for all theasserts
on theslot
number that are belowon_block
(a formal proof can be established using our Dafny specification of the eth2,0 specs).Second, to avoid the possibility of processing the same block again and again in
on_block
without progress on the store, we may require that the blockb
(or its hash) is not already in the store's keys, and add the pre-conditionto
on_block(b)
.How the issues was uncovered
Issue was uncovered while specifying and proving some invariants on the Dafny specification of the Eth2.0 specs.