hcnam opened this issue 1 year ago
This is again the same situation as https://github.com/prysmaticlabs/prysm/issues/12884, and if this were raised in our repo, I would be inclined to close it for the same reason. Leaking implementation details is one of the worst problems of an executable spec: it leads to fake client diversity if Prysm (resp. Lighthouse) has Go (resp. Rust) interpretations of the same Python architecture. The current code in Prysm uses attestation pools to retry attestation validation, and this is easily tested in unit tests.
However, in this particular scenario, we could handle test vectors where the attestations arrive with a slight time skew. That would most probably require moving the granularity of the test vectors from seconds to milliseconds, and it would force us to move from testing a localized fork choice code path to testing the more encompassing sync package and gossip paths. I would be against such a move: the fork choice spec tests have been among the most useful tests the spec repo has given us, catching many consensus-splitting bugs in our codebase, and I feel that changing the scope of these tests would also make them less reliable.
In consensus-spec, `validate_on_attestation` requires that consideration of attestations from future slots be delayed until their slot is in the past. However, the current fork choice spec tests do not seem to cover this, because the spec does not define the deferral mechanism itself. In short, the current consensus-spec does not define the order in which deferred attestations are processed. As a result, the data structures clients use for deferred attestations have diverged: [Lighthouse, Nimbus, Lodestar] adopt a queue, and [Prysm, Teku] use a map to hold future attestations.

This is a minor, non-urgent discrepancy. In the real world it may not be a big problem, since there are about 900K validators and many slashers are watching the network. Also, the arrival order of attestations is unpredictable (unless packet ordering is manipulated by a larger-scale network attacker). Still, I want to point out that a minor discrepancy like this could lead to real problems in the future.
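To make the deferral rule concrete, here is a minimal Python sketch, in the language of the executable spec. It assumes a simplified `Attestation` with only a validator index, a slot, and a block root. `is_ready_for_fork_choice` restates the final check of `validate_on_attestation` (`assert get_current_slot(store) >= attestation.data.slot + 1`). `DeferredQueue` is a hypothetical FIFO holder in the spirit of the queue-based clients, not any client's actual code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    # Simplified stand-in for the spec's Attestation (assumption): only the
    # fields this example needs.
    validator_index: int
    slot: int
    beacon_block_root: str


def is_ready_for_fork_choice(current_slot: int, attestation: Attestation) -> bool:
    # Restates the last check in validate_on_attestation:
    #   assert get_current_slot(store) >= attestation.data.slot + 1
    # An attestation may only affect fork choice once its slot is in the past.
    return current_slot >= attestation.slot + 1


class DeferredQueue:
    """Hypothetical FIFO holder for attestations that arrived too early."""

    def __init__(self) -> None:
        self._pending: deque[Attestation] = deque()

    def push(self, attestation: Attestation) -> None:
        self._pending.append(attestation)

    def drain_ready(self, current_slot: int) -> list[Attestation]:
        # Release every attestation whose slot is now in the past, in arrival
        # order; attestations that are still too early stay queued.
        ready: list[Attestation] = []
        still_pending: deque[Attestation] = deque()
        for att in self._pending:
            if is_ready_for_fork_choice(current_slot, att):
                ready.append(att)
            else:
                still_pending.append(att)
        self._pending = still_pending
        return ready
```

For example, after pushing a slot-5 and a slot-7 attestation, `drain_ready(6)` releases only the slot-5 one. The slot-7 attestation stays queued until `drain_ready(8)`. The spec says *that* such attestations must wait, but not *in what order* they are released, which is exactly the gap described above.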
However, several problems make a spec test for this behavior impossible with the current spec test implementations.
Stateless `onAttestation` on Prysm
The first problem is that `onAttestation` in Prysm is stateless. That is, Prysm handles future attestation scheduling outside the `onAttestation` function. The Prysm team acknowledges this: “As a stateless function (onAttestation), this does not hold nor delay attestation based on the spec descriptions.” Therefore, Prysm's current testing tools cannot handle future attestations; the current fork choice spec test tool simply rejects them. See scenario A below.
Scenario A
Assumption: No attestations other than Attestations X and Y, or B and C otherwise have the same votes.
Steps:
Expected: head is B, due to the votes.
Result: Prysm decides using the tie-break rule.
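The sketch below illustrates why dropping an early vote, rather than deferring it, can change the outcome of scenario A. The roots `0xaa` and `0xcc` and the single vote for B are hypothetical. `pick_head` applies the same ordering as the spec's `get_head`: highest weight wins, and ties are broken by the lexicographically greater root. `defer=False` loosely models a stateless handler driven directly by a test harness that rejects the early vote.

```python
from __future__ import annotations

# Hypothetical block roots for the sibling blocks B and C; only their
# lexicographic order matters here.
ROOT_B = "0xaa"
ROOT_C = "0xcc"


def pick_head(children: list[str], weights: dict[str, int]) -> str:
    # Same ordering as the spec's get_head: highest weight first, ties broken
    # by the lexicographically greater root.
    return max(children, key=lambda root: (weights.get(root, 0), root))


def head_after_vote(arrival_slot: int, vote_slot: int, defer: bool) -> str:
    """Head between B and C after one vote for B, once the clock has passed vote_slot.

    defer=True models a client that holds early votes; defer=False models a
    stateless handler, called directly by a test harness, that drops them.
    """
    weights: dict[str, int] = {}
    pending: list[str] = []
    if arrival_slot >= vote_slot + 1:
        # The vote's slot is already in the past: count it immediately.
        weights[ROOT_B] = weights.get(ROOT_B, 0) + 1
    elif defer:
        # Too early: hold the vote instead of rejecting it.
        pending.append(ROOT_B)
    # A later tick moves the clock past vote_slot, so deferred votes now count.
    for root in pending:
        weights[root] = weights.get(root, 0) + 1
    return pick_head([ROOT_B, ROOT_C], weights)
```

With these roots, a dropped early vote leaves B and C tied at zero weight, so the tie-break picks C (`0xcc`). Deferring the same vote yields B.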
Insufficient future attestation handling in Teku spec test
The issue stems from incomplete future attestation handling in Teku's spec test. Teku has attestation scheduling, so attestations for future slots can be deferred until their slot is in the past. However, Teku's spec test environment does not fully support future attestations, i.e., it cannot defer attestations by more than one slot. Teku assigns each incoming future-slot attestation one of two states. The first is `DEFER_FOR_FORK_CHOICE`, indicating that the attestation should be handled in the next slot. The other is `SAVED_FOR_FUTURE`, used for attestations that should be handled several slots in the future. Teku's fork choice testing environment supports `DEFER_FOR_FORK_CHOICE` attestations but not `SAVED_FOR_FUTURE` attestations. As a result, test cases containing attestations for slots more than one slot in the future fail in the Teku tester. See scenario B below.
Scenario B
Assumption: No attestations other than Attestations X and Y, or B and C otherwise have the same votes.
Steps:
Expected: head is B, since Teku can handle deferred attestations (the later one is ignored).
Result: Teku decides using the tie-break rule.
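The following is a hypothetical model of the two-bucket behavior described above, showing how a harness that replays only one bucket loses votes. The names `DEFER_FOR_FORK_CHOICE` and `SAVED_FOR_FUTURE` come from the text. `APPLY_NOW`, the slot rule in `classify`, and the `supports_saved` flag are assumptions for illustration, not Teku's code.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    # DEFER_FOR_FORK_CHOICE and SAVED_FOR_FUTURE are the names used in the issue;
    # APPLY_NOW and the rule in classify() are hypothetical.
    APPLY_NOW = "apply_now"
    DEFER_FOR_FORK_CHOICE = "defer_for_fork_choice"
    SAVED_FOR_FUTURE = "saved_for_future"


def classify(current_slot: int, att_slot: int) -> Status:
    if current_slot >= att_slot + 1:
        return Status.APPLY_NOW  # slot already in the past
    if att_slot == current_slot:
        return Status.DEFER_FOR_FORK_CHOICE  # usable from the next slot
    return Status.SAVED_FOR_FUTURE  # more than one slot ahead


def counted_by(current_slot: int, att_slot: int, final_slot: int, supports_saved: bool) -> bool:
    """Whether a vote is counted by the time a test harness reaches final_slot."""
    status = classify(current_slot, att_slot)
    if status is Status.APPLY_NOW:
        return True
    # Once the clock is past the attestation's slot it could be applied, but only
    # if the harness actually replays the bucket it was placed in.
    slot_in_past = final_slot >= att_slot + 1
    if status is Status.DEFER_FOR_FORK_CHOICE:
        return slot_in_past
    return supports_saved and slot_in_past
```

Under these assumptions, a vote for the current slot is counted once the test advances one slot. A vote two slots ahead is never counted unless `supports_saved=True`, which matches the failure mode described for scenario B.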
Different data structure for deferred attestation
This problem is a data structure discrepancy between clients. We checked several implementations, and they use different data structures for deferring future attestations: Lighthouse and Nimbus use a queue, while Prysm and Teku use a map to store early-received attestations. Moreover, the map implementations also differ: Prysm uses a non-deterministic map, while Teku uses a deterministic map structure. Because of this discrepancy, if a slashable pair of attestations (such as a double vote for two different blocks that share the same parent block) enters `onAttestation`, Prysm produces a non-deterministic fork choice result. See scenario C below.
Scenario C
Assumption: No attestations other than Attestations X and Y, or B and C otherwise have the same votes. The `attester_slashing` has not been received yet.
Steps:
Expected: head is B, since the other clients use a queue for deferred attestations (the later one is ignored).
Result: Teku deterministically makes the same decision every time, while Prysm non-deterministically makes different decisions (note that the results from Teku and Prysm may differ from those of Lighthouse and Nimbus).
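The sketch below shows why the drain order of deferred votes matters for a same-epoch double vote. `apply_votes` follows the spec's `update_latest_messages` rule, which replaces a stored latest message only when the new target epoch is strictly greater. It ignores `equivocating_indices`, matching the assumption that no `attester_slashing` has arrived yet. `drain_unordered` is a hypothetical stand-in for ranging over a Go map, whose iteration order the Go runtime randomizes. The `Vote` type is likewise a simplification.

```python
from __future__ import annotations

import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    # Simplified latest-message input (assumption): validator, target epoch, head vote.
    validator_index: int
    target_epoch: int
    block_root: str


def apply_votes(votes: list[Vote]) -> dict[int, Vote]:
    # Follows the spec's update_latest_messages rule: a vote only replaces the
    # stored one if its target epoch is strictly greater. For a same-epoch
    # double vote, whichever vote is processed first wins.
    latest: dict[int, Vote] = {}
    for vote in votes:
        prev = latest.get(vote.validator_index)
        if prev is None or vote.target_epoch > prev.target_epoch:
            latest[vote.validator_index] = vote
    return latest


def drain_queue(queue: deque[Vote]) -> list[Vote]:
    # Queue-based clients: deferred votes come out in arrival order.
    drained: list[Vote] = []
    while queue:
        drained.append(queue.popleft())
    return drained


def drain_unordered(pending: list[Vote], rng: random.Random) -> list[Vote]:
    # Hypothetical stand-in for iterating a Go map, whose iteration order is
    # randomized by the runtime; modeled here with a shuffle.
    order = list(pending)
    rng.shuffle(order)
    return order
```

If validator 7 votes for B and then for C in the same epoch, draining the queue always keeps the vote for B, because it arrived first. Shuffled drains keep B on some runs and C on others. A map with any fixed iteration order would be deterministic, but could still disagree with arrival order.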
References:
- Lighthouse: `queued_attestations`
- Teku: `deferredAttestations`, `futureAttestations`
- Prysm: `processAttestations`, `AttCaches`
- Nimbus: `queuedAttestations`
- Lodestar: `queuedAttestations`
Related discussions:
- Teku#7805 - Data structure discrepancy on deferred attestation
- Teku#7804 - fork choice spec test seems not fully handling future attestation
- Prysm#12884 - Time moves during fork choice spec test