Optimistic Sync tests proposal

mkalinin commented 2 years ago

Optimistic sync is an extension of consensus specs, but it seems valuable to feature consensus-spec-tests with a few test cases checking CL client's behaviour in complicated scenarios. Hive is the main testing tool that currently does this job. Debugging cross-layer (CL + EL) Hive test is difficult, writing Hive tests employing sophisticated re-org scenarios is even harder while both should be much easier if handled by consensus-spec-tests suite.

As optimistic sync is an opt-in backwards compatible extension, test format should follow the same principle. A subset of tests separated from the others and utilizing fork_choice test format seems like a reasonable approach. Clients that are not supporting optimistic sync (tho there is no such clients to date) may not support this subset at all.

It is proposed to extend fork choice tests format with PayloadStatusV1 responses which EL would return when newPayload or forkchoiceUpdated method is called for a corresponding block. CL clients that are implementing opti sync tests will have to mock EL clients accordingly.

Format extension for `on_block` handler

{
  block: block_{block_root},
  valid: true/false,
  payload_block_hash: block.body.execution_payload.block_hash, # payload hash is convenient for the EL mock
  payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}
}

Should the PayloadStatus be ever updated in a backwards incompatible way these tests will have to be updated accordingly. Though, this is unlikely to happen outside of a hard fork on CL side, thus, tests may be aligned as well supporting the old status structure for previous hard forks.

Example scenario

B0 <- B1 <- B2 <- B3 
  \
    <- B1' <- B2' <- B3' <- B4'

1. Client imports [B0 ... B3], payload_status: {status: "VALID", ...} for each block in this branch
2. Client reorgs to and optimistically imports [B1' ... B3'], payload_status: {status: "SYNCING", ...} for each of this blocks
3. Client imports B4' with payload_status: {status: "INVALID", latestValidHash: B0.body.execution_payload.block_hash, ...}
4. head == B3

It is also proposed that valid flag for each of B1' ... B4' blocks is set to false indicating that these blocks are invalid allowing this test to result in the same outcome on CI and when run by a client without optimistic sync support. CL clients that do support optimistic sync should omit this flag and rely on payload_status instead.

cc @hwwhww @djrtwo @potuz @ajsutton @tersec @paulhauner

potuz commented 2 years ago

I strongly support the creation of these tests. I think though the format may be forced to be a little more complicated. When importing block A with a payload P, the proposed format is to include the execution status of P. There are typically two calls when importing a block. A call to new_Payload(P) that would be covered by this format. But also there's a call to notify_FockhoiceUpdated. This latter call may be triggered for a payload different than P. It can be for a previously imported payload that has since changed its execution status.

In fact there may be arbitrarily many calls to FCU, since if we are following a branch in a tree with N forks, that turns out to have N-1 branches with INVALID tip and only one VALID, if the LMD weights of the branches is unfortunate, we would have to call FCU N times until we find a viable head.

I thus propose that the format has a map {payloadHash: payload_status} where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.

g11tech commented 2 years ago

+1 from lodestar

tersec commented 2 years ago

This is reasonable, and useful, as long as the tests are strongly distinguished from current consensus-spec tests which are are all

100% deterministic
100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions)
do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL
depend only one the CL's code, no other software

The CL tests that exist are partly useful because they essentially never (as close to 0% as feasible -- sometimes, there are CI runner infrastructure failures, etc) create false-positive failures. Therefore, when I look at why a commit might have failed CI, I look at the consensus-spec-tests and if they're failing, something is almost certainly wrong with that commit. If a flakier test fails, well, time to look at what happened in more detail. As long as this change doesn't undermine that reasoning, I'm okay with it.

tersec commented 2 years ago

By "strongly distinguished": I mean, not just one more superficially similar-looking test in a folder of otherwise-self-contained-deterministic/etc test which has some new flag to treat it differently. These should be put elsewhere in the repo organization so that they can be reasonably run separately without continuously having to sort/filter which are the guaranteed reliable tests and which are the potentially flakier tests.

ajsutton commented 2 years ago

100% deterministic 100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions) do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL depend only one the CL's code, no other software

I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

You wouldn't actually be running an EL for these tests.

tersec commented 1 year ago

100% deterministic 100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions) do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL depend only one the CL's code, no other software

I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

You wouldn't actually be running an EL for these tests.

Sure, seems reasonable and useful.

potuz commented 1 year ago

The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

I've been wondering if it's not worth also to allow "timeout" to be a response. That's borderline implementation dependent more than spec-domain, but this is also a source of bugs if the CL acts on an optimistic head depending on the status when the EL timesout

hwwhww commented 1 year ago

Thank you @mkalinin for the proposal!

It also bothers me that there are no tests cover optimistic sync specs. 😅

payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}

Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as VALID? That is, is it okay if I start by adding this to all existing Bellatrix tests:

payload_block_hash: block.body.execution_payload.block_hash,
payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}

And then write new tests of the invalid/syncing cases?

mkalinin commented 1 year ago

Thank you @mkalinin for the proposal!

It also bothers me that there are no tests cover optimistic sync specs. 😅

payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}

Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as VALID? That is, is it okay if I start by adding this to all existing Bellatrix tests:
payload_block_hash: block.body.execution_payload.block_hash,
payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}
And then write new tests of the invalid/syncing cases?

I think we should rather leave fork_choice tests unaffected and create separate optimistic_sync test suite allowing for existing tests to run with no changes from the client side

mkalinin commented 1 year ago

I thus propose that the format has a map {payloadHash: payload_status} where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.

I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.

Having a specification of method calls and responses for each Engine API method might be valuable, currently we have newPayload and fcU but these set of methods may be updated in the future. Something like the following:

{
  payload_block_hash: block.body.execution_payload.block_hash,
  engine_api_responses: {
    "engine_newPayloadV1": {status: "VALID", latestValidHash: blockHash, validationError: null},
    "engine_forkchoiceUpdatedV1": {payloadStatus: {status: "VALID", latestValidHash: blockHash, validationError: null}, payloadId: null}
  }
}

This format has two properties: i) we can have different payload status returned by newPayload and fcU, e.g. ACCEPTED and VALID respectively ii) allows for simplified EL mocking leveraging full name of methods and fully specified responses

potuz commented 1 year ago

I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.

The kind of situation that worries me about optimistic sync tests is if clients agree on head/justified information after a block is deemed INVALID. If we are syncing optimistically a tree like this, blocks arrived in order and they are all SYNCING

1 <-- 2 <--- 3 <--- 4 <---- 5 <--- 6
        \------7 <----8 <----9
         \      \------10
          \------------11 <--- 12 <---13

The last block is 13, and at this point the EL catches up and replies INVALID, LVH pointing to block 2. What is our Head? With the information in the current proposal the ONLY thing the CL can do is remove the branch 11 <-- 12 <-- 13. and choose between 6, 9, and 10 to be the new head by normal forkchoice rules.

But the issue here is that in runtime this is not all we do when we import block 13. The right answer to the question "what is our head?" depends on the optimistic status of the remaining blocks. If they are still optimistic the answer would be just as the previous package, but the EL has just catched up, so it could well be that the forkchoice head is 6 but the EL already knows it's INVALID. In fact all descending blocks from two may be INVALID except 10, which should be our head, regardless if it's optimistic or not. The way nodes find is out is by sending repeated FCUs until they find a viable or optimistic branch to follow.

I see several possible bugs here, among them a) a node does not call repeated FCUs, takes 6 as a head and tries to import a child of that block, but the EL already discarded 6 as INVALID and can't connect. b) The node does call repeated FCUs but does not update the optimistic status of the new head correctly. Works both ways it finds 10 as head and treats it as SYNCING still (since it was its status before) and therefore does not act on it because it thinks is optimistic, or the worse possible bug is that it treats it as VALID and acts on it when it may be optimistic. c) The node's algorithm to discard branches leaves the LMD weights in an incorrect status, leading to wrong heads upon multiple removals.

This is why I consider valuable to make sure we agree on behavior in these more complicated situations.

mkalinin commented 1 year ago

We may achieve the necessary level of flexibility by adding artificial on_payload_status step:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}

This step will need to precede the corresponding on_block step to let EL mock be ready for handling newPayload and fcU:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}

thoughts? cc @potuz @hwwhww

potuz commented 1 year ago

We may achieve the necessary level of flexibility by adding artificial on_payload_status step

This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when on_block step arrives, the EL returns those values. This makes it equivalent to the map in my post I think

mkalinin commented 1 year ago

We may achieve the necessary level of flexibility by adding artificial on_payload_status step

This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when on_block step arrives, the EL returns those values. This makes it equivalent to the map in my post I think

Yes, exactly this way. And it makes it easier to implement I guess and keep the format backwards compatible to the fork choice tests format. So, any changes to that format may be easily applied to optimistic sync tests as well

mkalinin commented 1 year ago

@hwwhww does the following format make sense to you:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}

hwwhww commented 1 year ago

@mkalinin To avoid payload_status.status, I'd suggest renaming payload_status to payload_info or so. Otherwise, it looks good to me.

mkalinin commented 1 year ago

@mkalinin To avoid payload_status.status, I'd suggest renaming payload_status to payload_info or so. Otherwise, it looks good to me.

I like payload_info more 👍 As we may easily extend it with more data in the future

mkalinin commented 1 year ago

Implemented in https://github.com/ethereum/consensus-specs/pull/2965 and https://github.com/ethereum/consensus-specs/pull/2982

ethereum / consensus-spec-tests