ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.57k stars 973 forks source link

Consider adding more details for the attestation deferring mechanism #3498

Open hcnam opened 1 year ago

hcnam commented 1 year ago

In consensus-spec, validate_on_attestation asks to delay consideration for future attestations until their slot is in the past. However, it seems that the current spec test on fork choice does not handle it, because the spec does not define the deferring mechanism itself.

# Attestations can only affect the fork choice of subsequent slots.
# Delay consideration in the fork choice until their slot is in the past.
    assert get_current_slot(store) >= attestation.data.slot + 1

In short, the current consensus-spec does not define the queueing order. Therefore, data structure discrepancies on deferred attestation have appeared; [Lighthouse, Nimbus, Lodestar] adopt queue, and [Prysm and Teku] use map for handling the future attestation.

This is a minor and non-urgent discrepancy. Because, as you know, in the real world this may not be a big problem due to there being about 900K validators and a lot of slashers are watching it. Also, the order of arrival of the attestations will be unpredictable (except when packet orders are manipulated by a larger-scale network attacker). But, I want to suggest that the possibility exists that such a minor discrepancy like this may lead to real problems in the future.

However, there are problems to make this test impossible in current spec test implementations.

Stateless onAttestation on Prysm

First problem is that onAttestation from Prysm is stateless. That is, Prysm handles future attestation scheduling outside of the onAttestation function. This is acknowledged by the Prysm developers team in the sense that “As a stateless function (onAttestation), this does not hold nor delay attestation based on the spec descriptions.” Therefore, current testing tools on Prysm can not handle future attestations; They just reject it on current fork choice spec test tool. See scenario A below.

Scenario A Assumption: No attestation except Attestation X and Y, or B and C has same vote. Steps:

 N      N+1     N+2
[A]  –  [B]  –  [ ] 
     \  [ ]  –  [C] 

Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+3 (attesting block B)
Slot N+3 : Check head

Expect: head is B due to votes Result: Prysm decide with tie-break rule

Insufficient future attestation handling in Teku spec test

The issue stems from incomplete future attestation handling in Teku spec test. In Teku, it has an attestation scheduling so the future slot attestations can be deferred until their slot in the past. However, Teku's spec test environment does not fully support future attestations, i.e., it cannot defer attestations more than one slot. Teku assigns one of the two states for incoming future slot attestation. The first state is DEFER_FOR_FORK_CHOICE, indicating that the attestation should be handled in the next slot. The other state is SAVED_FOR_FUTURE, which is used to save attestations that should be handled after several slots in the future. Teku's fork-choice testing environment supports DEFER_FOR_FORK_CHOICE state attestations but not SAVED_FOR_FUTURE state attestations. This results in test cases with attestations attesting the future after the one slot failing in Teku tester. See scenario B below.

Scenario B Assumption: No attestation except Attestation X and Y, or B and C has same vote. Steps:

 N      N+1     N+2     N+3  
[A]  –  [B]  –  [ ]  –  [ ] 
     \  [ ]  –  [C]  –  [ ] 

Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+4 (attesting block B)
Slot N+4 : Check head

Expect: head is B due to Teku can handle deferred attestation (later ignored) Result: Teku decide with tie-break rule

Different data structure for deferred attestation

This problem is data structure discrepancy between multiple clients. We’ve checked the several implementations, but they adopt different data structure for deferring future attestations: Lighthouse and Nimbus are using queue, and Prysm and Teku are using map to save early received attestations. Moreover, implementation of map data structure also has discrepancy; Prysm uses non-deterministic map, but Teku uses deterministic map structure. Due to the above discrepancy, if a slashable pair of attestation (like double voting two different block that shares the same parent block) enter onAttestation, Prysm makes a non-deterministic fork choice result. See scenario C below.

Scenario C Assumption: No attestation except Attestation X and Y, or B and C has same vote. attester_slashing was not received yet. Steps:

 N      N+1     N+2  
[A]  –  [B]  –  [ ] 
     \  [ ]  –  [C] 

Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+3 (attesting block B)
           Receive Attestation_1 from validator X of slot N+3 (attesting block C)
Slot N+3 : Check head

Expect: head is B due to other use queue for deferred attestation (later ignored) Result: Teku deterministically makes the same decisions while Prysm nondeterministically makes different decisions (Note that the results from Teku and Prysm may be different from Lighthouse and Nimbus).

References: Lighthouse: queued_attestations Teku: deferredAttestations, futureAttestations Prysm: processAttestations, AttCaches Nimbus: queuedAttestations Lodestar: queuedAttestations

And related discussions: Teku#7805 - Data structure discrepancy on deferred attestation Teku#7804 - fork choice spec test seems not fully handling future attestation Prysm#12884 - Time moves during fork choice spec test

potuz commented 1 year ago

This is again the same situation as https://github.com/prysmaticlabs/prysm/issues/12884 and if this were raised in our repo, I would be inclined to close for the same reason. Leaking implementation details is one of the worse problems of an executable spec, which leads to a fake client diversity if Prysm (resp. Lighthouse) have Go (resp. Rust) interpretations of the same Python architecture. The current code in prysm deals with attestations pools to retry attestation validation and it's easily tested in unit tests.

However, in this particular scenario, we could handle test vectors where the attestations arrive with a sligh time skew, but most probably would require moving the granularity of the testvectors from seconds to milliseconds and it would force us to move from a localized forkchoice codepath testing to a more abrangent sync package and gossip paths testing. I would be against such a move: forkchoice spectests have been among the most useful tests the spec repo has given us, catching many consensus splitting bugs in our codebase, I feel like changing the scope of these tests would also render them less reliable.