Fork-choice integration test format and example (WIP)

ericsson49 commented 4 years ago

The issue is opened to communicate the initial draft version of the proposed format for fork choice integration-style compatibility test suite. It's an early stage work in progress and will change. However, the main idea will stay the same. It is to be moved to a GitHub repo at some point, when it's mature enough.

There will be common files like spec constants and genesis states, but this is to be defined. There is a file for each test case (yaml format currently, but can be SSZ as well). Each test consists of test steps, each step being either an event (slot, block or attestation) or a condition to verify. There is also an initial state (should be spec constants plus genesis state, however the last one is lengthy, currently only amount of validators is specified in the file. Test file format:

---
# initial state currently generated from initial amount of deposits/validators
# and spec constants (external file)
validators: <InitialValidatorsCount>

# each test is a sequence of four step type: either one of three event kinds or a condition check
steps:
# new slot event
- !<slot>
  slot: <Slot>
# new block event ("standard" block serialization to yaml is used, as in the other tests)
- !<block>
  block: <Block>
# new attestation event ("standard" attestation serialization to yaml is used, as in the other tests)
- !<attestation>
  attestation: <Attestation>
# only head test condition currently, but this is the most important
# it means, that the root of the head block at the moment should be equal to the value specified
# other test conditions to be add in future
- !<check>
  checks:
    head: <Root>

A file with several examples attached Stale integration_tests_on_attestation.zip

New version fork_choice_integration_tests.zip

ericsson49 commented 4 years ago

The test scenario files are generated by augmenting our current Kotlin fork choice tests, the code is not part of a public repo yet. I'm currently splitting the tests into generator and executor parts, so that generator produce test cases and executor runs them. I'm also going to add invocation of the executable python spec code, so that "golden" answers are produced by the spec. And probably it makes sense to re-write the Kotlin generator code in Python as well. I will post a link to a repo, when the generator part is ready (working on it right now).

ericsson49 commented 4 years ago

I've pushed test executor and other test updates. https://github.com/harmony-dev/beacon-chain-java/tree/tests/fork-choice-integration-tests/qa/src/test/kotlin/qa

There are two Kotlin files with tests, one for block tests and another one for attestations. They can be run using IDEA.

When run, they will generate yaml test scenario files. Which can be run by test_executor.

Currently, we are using pseudo BLS signature implementation so that tests run faster. There are also other sources of incompatibilities (like file format extra fields), but I am working on fixing them.

ericsson49 commented 4 years ago

New test scenarios attached fork_choice_integration_tests.zip

ericsson49 commented 4 years ago

I've finally adapted fork choice tests so that their format is aligned with the existing eth2.0-spec-tests. Tests are for v0.10.1 tag - I have validated them with Teku, which currently supports v0.10.1. Tests are validated against v0.10.1 specification too. I've put tests here

There are problems to be resolved yet:

the main problem that I see with the format is that there are many files and directories created, which is bad for I/O
another problem is that there is a lot of duplication (e.g. genesis state files)
some files can be converted to SSZ, however, I believe, initially it's easier to work with yaml (easier to read). After everything is settled down, they can be converted

However, it closely matches the existing tests format, so, it will be easier to adopt.

In general, I would propose another format, where parts are assembled into one file. Another idea is to move common test parts to a common directory. Test format becomes more like a graph. However, it will occupy much less disk space and will require less I/O, which is important if many tests are to be generated.

protolambda commented 4 years ago

@ericsson49 Great, thanks for this work :).

Regarding the problems:

file size: agreed, I would like to change the spec tests in general to share parts of state, and reduce total size. Improving this for forkchoice tests would be good to keep it sustainable to add more forkchoice tests.
Duplication: same thing, we need to rethink the test format
Yaml to me is a legacy test format. Most clients use SSZ test inputs since it is more performant. I kept the YAML version of inputs around to keep tests readable. But once something like the viewer on simpleserialize.com works well it should be easy for everyone to just only have SSZ.

If we want to ship this fast I think it's best to provide states and blocks as .ssz as well, so that each client can enjoy the tests. But if you have more time to put into this, we could rethink the testing format to solve the size/duplication issues before releasing the tests.

ericsson49 commented 4 years ago

@protolambda Thanks for the comments!

I do want to put more time. I want to develop or generate more tests to improve coverage and disk space. Most tests do not require BLS and so can be run pretty fast. So the main problem is disk usage. I will convert tests to a new format then in near future.

ericsson49 commented 4 years ago

@protolambda I've converted tests to a new format and put in a new branch.

Two main ideas:

one file with test description per each test case. So a suite is a list of files, instead of a directory.
keep common tests parts in a common directory, which I call cache. A test description contains references to the parts instead of the parts themselves (a path to file currently, i.e. ../cache/block_0.yaml).

At the moment, parts are blocks, states and attestations. But they can be any top-level containers, defined by eth2.0 specs. Sub-parts like AttestationData will be more difficult to handle.

Test description is a dict. For example, for fork choice tests there are following keys:

meta - can contain bls_setting, for example
genesis
steps - a list of steps, which are applied sequentially:
- 'slot', 'block', 'attestation', 'checks'

An additional optimization is that cache directory can archived and compressed. It saves disc space considerably. Client impelementers can unpack it, if they like or load the whole archive in memory (should probably change archive format to tar.gz or zip).

The link above illustrates the ideas. The storage savings are significant:

previous format requires about 1.8 Mbytes data for 35 tests. Mostly, because genesis.yaml is duplicated
the new format requires 225Kbytes, where cached files occupies 174Kbytes and 51K tests themselves
one can compress cache, then it shrinks down to 23-25K

As a result, there is about 1.5K data per test on average, plus cached stuff, which is additionally 1K or 5K on average, depending on whether cache is compressed or not. Cache size is likely to grow sublinearly when new tests are added.

djrtwo commented 4 years ago

Are you hand specifying the tests to ensure reuse of the cache or do you have a method to deduplicate given a set of tests?

ericsson49 commented 4 years ago

I record events during a test run, and when converting the trace to a file, I'm using a de-dup cache. It's not an external procedure currently.

ericsson49 commented 4 years ago

In general, I'm thinking about three-four levels of de-duplication:

test parts level, where a part is a top-level container (BeaconState, SignedBeaconBlock, Attestation, etc)
in the case of fork choice tests (and network tests, in general), one can re-use test prefixes, i.e. several test cases may start with the same genesis state (I'm re-using genesis state right now) and apply a number of the same block, slot and attestation events in the same order. So, test cases form a forest, in general - a set of trees, encoding common prefixes of test events.
when generating test cases (which I'm working on currently), it's easy to get duplicate cases - or very similar ones (e.g. having the same structure)
test cases can be different, but they may be redundant since they testing the same execution paths, i.e. they do not improve target coverage metrics. One can model it with a set of predicates. So, each test case is assigned with a map, indicating which predicates hold for the test case and which not. So, the idea is ti recognize a test case as a duplicate, if we already have a test case covering the same combination of predicates being true/false.

The proposed test format solves the first issue. The second one may be interesting, since implementations can cache Store instances and thus, avoid, re-applying same sequences again and again.

ericsson49 commented 4 years ago

I've generated a version fork choice tests for v012x branch and put them here.

Code to generate and to run the tests against pyspec phase0 are put into the branch.

The tests currently do not support optional checks. Optional checks may be required, if some client do not buffer early blocks/attestations and just drop them. Currently, the tests assume such messages are buffered and re-inserted when appropriate (e.g. current slot incremented or a referenced block has arrived). Therefore, such checks can be made optional, to support variations in implementations, though I personally believe it's not a good idea.

My priority has been to agree on the fork choice test format and release some official fork choice integration tests first. Generating lots of tests has not been a priority, since it can be somewhat painful to run and debug fork choice tests. Therefore, it's better to keep the test suite size reasonably small for the beginning. I can generate more tests, after everything is settled with the initial version.

Test generation code also does not yet conform to beacon chain test generators rules and guidelines.

ericsson49 commented 4 years ago

I've created a version of fork choice test format, which allows optional checks, e.g.:

block_in_store:
  value: '0x654dd99167c3b155924aad2a19086a70f89daa2b0cb86effb07a3e0488c5669c'
  optional: false

instead of just

block_in_store: '0x654dd99167c3b155924aad2a19086a70f89daa2b0cb86effb07a3e0488c5669c'

Tests are put here and code here

ericsson49 commented 4 years ago

I've updated tests to match v0.12.2.

Tests are here and here (optional checks). Code to generate the tests is here and here (optional checks).

terencechain commented 4 years ago

I've updated tests to match v0.12.2.

Tests are here and here (optional checks). Code to generate the tests is here and here (optional checks).

Thanks! This is great work. Are we planning to merge these into the official eth2.0-spec-tests repo?

ericsson49 commented 4 years ago

Thanks! This is great work. Are we planning to merge these into the official eth2.0-spec-tests repo? Thanks! They were supposed to be merged, but I do not know current EF plans. @djrtwo could you please clarify?

ethereum / consensus-spec-tests

Fork-choice integration test format and example (WIP) #17