Closed mkalinin closed 1 year ago
I strongly support the creation of these tests. I think though the format may be forced to be a little more complicated. When importing block A
with a payload P
, the proposed format is to include the execution status of P
. There are typically two calls when importing a block. A call to new_Payload(P)
that would be covered by this format. But also there's a call to notify_FockhoiceUpdated
. This latter call may be triggered for a payload different than P
. It can be for a previously imported payload that has since changed its execution status.
In fact there may be arbitrarily many calls to FCU, since if we are following a branch in a tree with N
forks, that turns out to have N-1
branches with INVALID
tip and only one VALID
, if the LMD weights of the branches is unfortunate, we would have to call FCU N
times until we find a viable head.
I thus propose that the format has a map {payloadHash: payload_status}
where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.
+1 from lodestar
This is reasonable, and useful, as long as the tests are strongly distinguished from current consensus-spec tests which are are all
The CL tests that exist are partly useful because they essentially never (as close to 0% as feasible -- sometimes, there are CI runner infrastructure failures, etc) create false-positive failures. Therefore, when I look at why a commit might have failed CI, I look at the consensus-spec-tests
and if they're failing, something is almost certainly wrong with that commit. If a flakier test fails, well, time to look at what happened in more detail. As long as this change doesn't undermine that reasoning, I'm okay with it.
By "strongly distinguished": I mean, not just one more superficially similar-looking test in a folder of otherwise-self-contained-deterministic/etc test which has some new flag to treat it differently. These should be put elsewhere in the repo organization so that they can be reasonably run separately without continuously having to sort/filter which are the guaranteed reliable tests and which are the potentially flakier tests.
100% deterministic 100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions) do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL depend only one the CL's code, no other software
I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.
You wouldn't actually be running an EL for these tests.
100% deterministic 100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions) do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL depend only one the CL's code, no other software
I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.
You wouldn't actually be running an EL for these tests.
Sure, seems reasonable and useful.
The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.
I've been wondering if it's not worth also to allow "timeout" to be a response. That's borderline implementation dependent more than spec-domain, but this is also a source of bugs if the CL acts on an optimistic head depending on the status when the EL timesout
Thank you @mkalinin for the proposal!
It also bothers me that there are no tests cover optimistic sync specs. 😅
payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}
Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as VALID
?
That is, is it okay if I start by adding this to all existing Bellatrix tests:
payload_block_hash: block.body.execution_payload.block_hash,
payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}
And then write new tests of the invalid/syncing cases?
Thank you @mkalinin for the proposal!
It also bothers me that there are no tests cover optimistic sync specs. 😅
payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}
Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as
VALID
? That is, is it okay if I start by adding this to all existing Bellatrix tests:payload_block_hash: block.body.execution_payload.block_hash, payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}
And then write new tests of the invalid/syncing cases?
I think we should rather leave fork_choice
tests unaffected and create separate optimistic_sync
test suite allowing for existing tests to run with no changes from the client side
I thus propose that the format has a map {payloadHash: payload_status} where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.
I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.
Having a specification of method calls and responses for each Engine API method might be valuable, currently we have newPayload
and fcU
but these set of methods may be updated in the future. Something like the following:
{
payload_block_hash: block.body.execution_payload.block_hash,
engine_api_responses: {
"engine_newPayloadV1": {status: "VALID", latestValidHash: blockHash, validationError: null},
"engine_forkchoiceUpdatedV1": {payloadStatus: {status: "VALID", latestValidHash: blockHash, validationError: null}, payloadId: null}
}
}
This format has two properties: i) we can have different payload status returned by newPayload
and fcU
, e.g. ACCEPTED
and VALID
respectively ii) allows for simplified EL mocking leveraging full name of methods and fully specified responses
I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.
The kind of situation that worries me about optimistic sync tests is if clients agree on head/justified information after a block is deemed INVALID
. If we are syncing optimistically a tree like this, blocks arrived in order and they are all SYNCING
1 <-- 2 <--- 3 <--- 4 <---- 5 <--- 6
\------7 <----8 <----9
\ \------10
\------------11 <--- 12 <---13
The last block is 13, and at this point the EL catches up and replies INVALID, LVH pointing to block 2. What is our Head?
With the information in the current proposal the ONLY thing the CL can do is remove the branch 11 <-- 12 <-- 13
. and choose between 6
, 9
, and 10
to be the new head by normal forkchoice rules.
But the issue here is that in runtime this is not all we do when we import block 13. The right answer to the question "what is our head?" depends on the optimistic status of the remaining blocks. If they are still optimistic the answer would be just as the previous package, but the EL has just catched up, so it could well be that the forkchoice head is 6
but the EL already knows it's INVALID. In fact all descending blocks from two may be INVALID except 10
, which should be our head, regardless if it's optimistic or not. The way nodes find is out is by sending repeated FCUs until they find a viable or optimistic branch to follow.
I see several possible bugs here, among them a) a node does not call repeated FCUs, takes 6 as a head and tries to import a child of that block, but the EL already discarded 6 as INVALID and can't connect. b) The node does call repeated FCUs but does not update the optimistic status of the new head correctly. Works both ways it finds 10 as head and treats it as SYNCING still (since it was its status before) and therefore does not act on it because it thinks is optimistic, or the worse possible bug is that it treats it as VALID and acts on it when it may be optimistic. c) The node's algorithm to discard branches leaves the LMD weights in an incorrect status, leading to wrong heads upon multiple removals.
This is why I consider valuable to make sure we agree on behavior in these more complicated situations.
We may achieve the necessary level of flexibility by adding artificial on_payload_status
step:
- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
This step will need to precede the corresponding on_block
step to let EL mock be ready for handling newPayload
and fcU
:
- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}
thoughts? cc @potuz @hwwhww
We may achieve the necessary level of flexibility by adding artificial on_payload_status step
This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when on_block
step arrives, the EL returns those values. This makes it equivalent to the map in my post I think
We may achieve the necessary level of flexibility by adding artificial on_payload_status step
This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when
on_block
step arrives, the EL returns those values. This makes it equivalent to the map in my post I think
Yes, exactly this way. And it makes it easier to implement I guess and keep the format backwards compatible to the fork choice tests format. So, any changes to that format may be easily applied to optimistic sync tests as well
@hwwhww does the following format make sense to you:
- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}
@mkalinin
To avoid payload_status.status
, I'd suggest renaming payload_status
to payload_info
or so. Otherwise, it looks good to me.
@mkalinin To avoid
payload_status.status
, I'd suggest renamingpayload_status
topayload_info
or so. Otherwise, it looks good to me.
I like payload_info
more 👍 As we may easily extend it with more data in the future
Optimistic sync is an extension of consensus specs, but it seems valuable to feature consensus-spec-tests with a few test cases checking CL client's behaviour in complicated scenarios. Hive is the main testing tool that currently does this job. Debugging cross-layer (CL + EL) Hive test is difficult, writing Hive tests employing sophisticated re-org scenarios is even harder while both should be much easier if handled by consensus-spec-tests suite.
As optimistic sync is an opt-in backwards compatible extension, test format should follow the same principle. A subset of tests separated from the others and utilizing
fork_choice
test format seems like a reasonable approach. Clients that are not supporting optimistic sync (tho there is no such clients to date) may not support this subset at all.It is proposed to extend fork choice tests format with
PayloadStatusV1
responses which EL would return whennewPayload
orforkchoiceUpdated
method is called for a corresponding block. CL clients that are implementing opti sync tests will have to mock EL clients accordingly.Format extension for
on_block
handlerShould the
PayloadStatus
be ever updated in a backwards incompatible way these tests will have to be updated accordingly. Though, this is unlikely to happen outside of a hard fork on CL side, thus, tests may be aligned as well supporting the old status structure for previous hard forks.Example scenario
It is also proposed that
valid
flag for each ofB1' ... B4'
blocks is set tofalse
indicating that these blocks are invalid allowing this test to result in the same outcome on CI and when run by a client without optimistic sync support. CL clients that do support optimistic sync should omit this flag and rely onpayload_status
instead.cc @hwwhww @djrtwo @potuz @ajsutton @tersec @paulhauner