ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.53k stars 959 forks source link

Do not lock on execution when importing blocks #3577

Open potuz opened 8 months ago

potuz commented 8 months ago

When currently syncing blocks, the CL notifies the EL about a new payload, after checking the validity it puts the block in forkchoice and after that it notifies the EL again of a new head. This workflow made a lot of sense at the time of the merge, where increased forking and possible attacks from miners could lead to a situation where an optimistically syncing node would deadlock.

In the current merged network the above workflow is unnecessarily inefficient, locking both the CL and the EL in what should be independent checks. This proposal is about discussing mechanisms that will allow the CL and EL to perform these checks independently. One possible path would be the following path for block validation

  1. Validate the gossip rules for block propagation
  2. Notify the engine of the new payload (do not lock on this notification)
  3. Perform BLOCKHASH verification of the block within the CL.
  4. Insert the block directly into forkchoice and evaluate new head (allow forkchoice to keep track of which blocks have been fully validated vs those that were just notified, this can simply repurpose the current "optimistic" status of nodes).
  5. Notify the EL in case of a new head (again not locking).

The EL can immediately reply that a block is invalid if its parent is known to be invalid. The CL can retroactively change head to a valid one if either the call in 2 or 5 eventually return with invalid.

Obviously we need to change the way that duties are performed: proposals are on top of fully validated blocks, attestations the same. So perhaps the simplest form is to have get_head return the last fully validated block in the branch of the current head (or compute head filtering out all nodes not fully validated, the former seems simpler).

cc @mkalinin @gakonst @fradamt

adiasg commented 8 months ago

At first glance, it seems this proposal does not change the behavior of how the CL optimistically tracks blocks. That is, even with this change a CL with a functioning EL attached would return the same head block as today.

Can you explain the difference?

potuz commented 8 months ago

Thanks for the question, I'll add more context on the kinds of changes this would entail. There are some design decisions still to be made, and I am not strongly advocating for some of those decisions that I'll include below.

Currently the CL would not even considered for forkchoice a block unless the EL has returned from notify_NewPayload with either VALID, ACCEPTED or SYNCING. If the EL returns any of these values, then the CL imports the block for forkchoice and that block is treated, for the purposes of computing the current head, as any other block. That is, at any given time, the head returned from forkchoice could be a block that was imported optimistically (ACCEPTED or SYNCING replies above). In the even that a call to get_head returns an optimistic block, the node is said to be in optimistic mode and it cannot serve duties, that is, such a beacon will not provide a block to propose nor an attestation to sign.

On the other hand, if the EL does not return anything (for example if the EL is taking a long time to execute the block) or even errors out on an internal bug, the beacon does not import the block, and the current forkchoice is not changed. This node does not become optimistic, a call to get_head will return the previous head without changes and the beacon will happily provide blocks to propose on top of that previous head, or attestations to it.

This issue (or the accompanying PR if I get some traction) proposes to change the behavior so that even in the event that the EL is taking longer and does not reply, the block can be imported. It can be imported "optimistically" or can be imported with some extra annotation to show that the block was not validated by the EL. But the block will still count for forkchoice and therefore it will affect the result of get_head even in the second case.

The reason for this change is multi-purpose:

Therefore I would tentatively favour that validators that are syncing this way do perform duties on top of their last fully validated block, perhaps with some reasonable conditions like being that this block is a descendant of the current finalized or justified checkpoint.

djrtwo commented 8 months ago

the block can be imported. It can be imported "optimistically" or can be imported with some extra annotation to show that the block was not validated by the EL. But the block will still count for forkchoice and therefore it will affect the result of get_head even in the second case.

I don't think it's a problem to do imports in some limited/annotated fashion, but I do think it's potentially bad for it to be the return value of get_head -- what contexts would this surface in? Duties -- bad. Users? also seems bad -- the head is not actually a representation of what the has/sees so the information can not actually be utilized (safely or at all dpeending on what they are doing). This also seems like a bad thing to surface in relation to any sort of fast confirmation rule.

I certainly think that it's safe to parallelize maybe more than Prysm is doing so today, but with the caveat that it doesn't surface in the fork choice. But maybe that's the primary gain you are looking for.

Curious @paulhauner's take as he might have some of the optimistic/etc dangers more available in close recall.

potuz commented 8 months ago

what contexts would this surface in? Duties -- bad.

It's not clear to me what the actual danger here there is: for attestations, this shifts the semantics of the head vote from "this is the valid head of the chain" to "this is the head of the CL chain and it's a direct child of the valid head of the EL chain". For proposals, there's zero change, you can only propose on top of valid blocks.

There's also a direct incentive into proposing valid blocks, as your block is directly reorged if your execution turns out to be invalid.

Users? also seems bad

I don't think there's practical changes here as well: a call to the execution RPC will return data from the current EL head which is validated. Nothing has changed from the EL perspective at all. In the CL case, a call to the head block will return the current CL block at the tip, which will be guaranteed to have a validated parent and current valid consensus transition. It may also have an annotation to check if its payload is fully validated yet or not. I think nothing changes here except for this extra annotation.

The main differences here would be:

  1. The node is no longer considered optimistic if it's head is not validated, but rather if the parent is optimistic.
  2. There may be FFG problems if the self-reorging block justifies a checkpoint and turns out to be invalid.

I think we have solved all the problems with 2 with the new filtering rules (also note that the depth of attestations to the invalid block is bounded to 1 committee). And 1 I think it's not any danger because of the above considerations