[SSZ] test vectors

mratsim commented 6 years ago

I think it would be useful to have language-agnostic test vectors. We can probably reuse YAML, as proposed in https://github.com/ethereum/beacon_chain/issues/58 and https://notes.ethereum.org/s/r11GVSBuQ

Example (note: this is completely untested and may not be valid YAML, it's just to provide a first direction)

---

title: Sample SSZ (De)serialization test
summary: Test (de)serialization
test_suite: Simple_Serialization

fork: Istanbul    # Versioning
schemas:          # Schemas are predefined in specs, allow clients to check for type consistency
  - type0: {field0: uint32}
  - type1: {
      field0: "uint8",
      field1: "uint32",
      field2: "address",
      field3: "hash32",
      field4: "bytes"
    }

deserialized:
  - type0: {field0: 260}
  - int8: 5
  - type1: {
      field0: 5,
      field1: 4294967293, # 2^32 - 3
      field2: "0x0101...0101",
      field3: "0x0202...0202",
      field4: "0x03030303"
    }

serialized:                # should we use hex strings instead of an array?
  - type0: [0, 0, 1, 4]    # 260'u32 as an array of bytes
  - int8: [5]              # 5'i8 as an array of bytes
  - type1: [
      5,                   # 5'u8 as an array of bytes
      255, 255, 255, 253,  # 4294967293'u32 as an array of bytes
      1, 1, ..., 1, 1,     # address "0x0101...0101"
      2, 2, ..., 2, 2,     # hash32 "0x0202...0202"
      4, 3, 3, 3, 3        # length prefix + bytes sequence [3, 3, 3, 3]
    ]
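
A minimal Python sketch of how a client might check the integer entries above. It assumes big-endian encoding, which is what the byte arrays in the example imply; `serialize_uint` and `deserialize_uint` are hypothetical helper names, not part of any SSZ library.

```python
from typing import List


def serialize_uint(value: int, bits: int) -> List[int]:
    """Encode an unsigned integer as a big-endian list of byte values."""
    if bits % 8 != 0:
        raise ValueError("bit width must be a multiple of 8")
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in uint{bits}")
    return list(value.to_bytes(bits // 8, byteorder="big"))


def deserialize_uint(data: List[int]) -> int:
    """Decode a big-endian list of byte values back into an integer."""
    return int.from_bytes(bytes(data), byteorder="big")


# (value, bit width, expected bytes), taken from the example above.
vectors = [
    (260, 32, [0, 0, 1, 4]),
    (5, 8, [5]),
    (4294967293, 32, [255, 255, 255, 253]),
]

for value, bits, expected in vectors:
    assert serialize_uint(value, bits) == expected
    assert deserialize_uint(expected) == value

# The same bytes written as a hex string, the alternative raised above.
print(bytes([0, 0, 1, 4]).hex())
```

Byte arrays are easy to read in a diff, while hex strings are more compact; either form converts to the other in one call, so the choice is mostly about readability of the YAML file.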

Those can be stored alongside the spec in the https://github.com/ethereum/eth2.0-specs repo.

mratsim commented 6 years ago

An alternative organization

---

title: Sample SSZ (De)serialization test
summary: Test (de)serialization
test_suite: Simple_Serialization

fork: Istanbul    # Versioning

test_cases:
  - type0:
    - fields: {field0: "uint32"}
    - deserialized: {field0: 260}
    - serialized: [0, 0, 1, 4]

  - int8:
    - deserialized: 5
    - serialized: [5]

  - type1:
    - fields: {
        field0: "uint8",
        field1: "uint32",
        field2: "address",
        field3: "hash32",
        field4: "bytes"
      }
    - deserialized: {
        field0: 5,
        field1: 4294967293, # 2^32 - 3
        field2: "0x0101...0101",
        field3: "0x0202...0202",
        field4: "0x03030303"
      }
    - serialized: [
        5,                   # 5'u8 as an array of bytes
        255, 255, 255, 253,  # 4294967293'u32 as an array of bytes
        1, 1, ..., 1, 1,     # address "0x0101...0101"
        2, 2, ..., 2, 2,     # hash32 "0x0202...0202"
        4, 3, 3, 3, 3        # length prefix + bytes sequence [3, 3, 3, 3]
      ]

Edit: assuming we want to test multiple values per type:

---

title: Sample SSZ (De)serialization test
summary: Test (de)serialization
test_suite: Simple_Serialization

fork: Istanbul    # Versioning

test_cases:
  - type0:
    - fields: {field0: "uint32"}
    - tests:
      - test1:
        deserialized: {field0: 260}
        serialized: [0, 0, 1, 4]
      - test2:
        deserialized: {field0: 0}
        serialized: [0, 0, 0, 0]

zah commented 6 years ago

The YAML spec and libraries provide support for custom type annotations, so the simplest possible format is the following:

test case 1:
  in:
    field0: !int32 10
    field1: !address 0x2234781...
    field2: "text"
  out: "0x243242..."

test case 2:
  in: 10
  out: "0x0A"

In Python, for example, these type annotations are handled with custom functions registered in the yaml module before loading the test case:

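import yaml

def yaml_address_constructor(loader, node):
    # parse_address is the client's own address parser (not defined here)
    return parse_address(node.value)

# Register the handler for the "!address" tag
yaml.add_constructor("!address", yaml_address_constructor)

# Recent PyYAML versions require an explicit Loader argument
test_case = yaml.load(..., Loader=yaml.FullLoader)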

djrtwo commented 6 years ago

Although we would forgo some built-in support, I personally prefer @mratsim's format due to the ability to define a multi-field type and test multiple values per type.

Also, adding test_suite and fork works for me. We can add chain_test or something to the other casper/forkchoice test format.

djrtwo commented 6 years ago

@mratsim I modified the chain test format to conform to your proposed format with fork, test_suite and test_cases.

I think it generally makes sense to support multiple test_cases per eth2.0 test file, regardless of the test_suite.

https://notes.ethereum.org/s/r11GVSBuQ

terencechain commented 6 years ago

It would be nice to keep all the test vectors for SSZ and fork choice in one shared repo.

djrtwo commented 6 years ago

That is the intention, @terenc3t. It will probably live at ethereum/eth2.0-tests, and you will be able to add it to your repo as a submodule (just like eth1.0 clients do with the eth1.0 unified testing repo).

djrtwo commented 6 years ago

Here's the repo, but we don't have anything to put in it until we decide on some testing formats :) https://github.com/ethereum/eth2.0-tests

paulhauner commented 6 years ago

This looks good to me.

I have been generating some tests for the shuffling algorithm:

Draft YAML file: https://notes.ethereum.org/n7fyPi4cR-Gg9Ypq7ylTrQ
Code that produces the YAML: https://github.com/sigp/shuffling_sandbox/blob/51592e023276b67912cb0d1d4f82fe900c8598a2/sandbox.py#L119-L160

The code is a bit scrappy; I didn't want to spend too much time on it until the spec is finalized. Apologies to the reader.

When the format is more final, I'll complete these shuffling vectors and submit them as a PR :)

I have one suggestion:

Add a "version" field

I imagine fork is used for the case where the specification changes. A version field would be useful in the scenario where we fail to provide enough test cases and divergent behavior appears between clients. We would then want to modify the YAML file to include more comprehensive tests for the same specification.

I'm guessing the identity of the spec file would be: (test_suite, fork, version)?
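
A small Python sketch of that identity, assuming each test file carries `test_suite`, `fork`, and `version` in its header. `VectorFileId`, `file_id`, and `register` are hypothetical names, and the version strings are made up for illustration.

```python
from typing import Dict, NamedTuple


class VectorFileId(NamedTuple):
    """Identity of a test file: (test_suite, fork, version)."""
    test_suite: str
    fork: str
    version: str


def file_id(header: dict) -> VectorFileId:
    """Build the identity tuple from a loaded file header."""
    return VectorFileId(
        header["test_suite"], header["fork"], str(header["version"])
    )


registry: Dict[VectorFileId, dict] = {}


def register(header: dict) -> VectorFileId:
    """Store a file under its identity, rejecting duplicates."""
    key = file_id(header)
    if key in registry:
        raise ValueError(f"duplicate test file: {key}")
    registry[key] = header
    return key


# Same suite and fork, but a later revision with more test cases.
first = register(
    {"test_suite": "Simple_Serialization", "fork": "Istanbul", "version": "1.0"}
)
second = register(
    {"test_suite": "Simple_Serialization", "fork": "Istanbul", "version": "1.1"}
)
assert first != second
```

With this keying, a file that only adds test cases for the same specification gets a new `version` while `fork` stays unchanged.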

djrtwo commented 6 years ago

@mratsim, Paul and I were discussing that, in light of the tests he has generated, maybe not every test case deserves a name. Instead, make the test cases a list and allow an optional "name" and "description" field for any test case in any of the test_suites. This ensures that we don't have to spend time writing names for self-explanatory tests, and it keeps the nesting a little more manageable.

This would change your example format to the following (with the option of leaving out the "name" field or adding a "description" field).

title: Sample SSZ (De)serialization test
summary: Test (de)serialization
test_suite: Simple_Serialization

fork: Istanbul    # Versioning

test_cases:
  - name: type0
    fields: {field0: "uint32"}
    tests:
      - test1:
        deserialized: {field0: 260}
        serialized: [0, 0, 1, 4]
      - test2:
        deserialized: {field0: 0}
        serialized: [0, 0, 0, 0]
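
A rough Python sketch of how a client might walk this list-based layout, with `name` treated as optional. The dictionary mirrors what a YAML loader could produce for the example above. `serialize_struct` and `run_suite` are hypothetical names, only `uintN` fields are handled, big-endian encoding is assumed from the byte arrays in the examples, and the unnamed second case is added purely for illustration.

```python
from typing import Any, Dict, List, Tuple

suite: Dict[str, Any] = {
    "test_suite": "Simple_Serialization",
    "fork": "Istanbul",
    "test_cases": [
        {
            "name": "type0",
            "fields": {"field0": "uint32"},
            "tests": [
                {"deserialized": {"field0": 260}, "serialized": [0, 0, 1, 4]},
                {"deserialized": {"field0": 0}, "serialized": [0, 0, 0, 0]},
            ],
        },
        {
            # No "name": the label falls back to the list position.
            "fields": {"field0": "uint32"},
            "tests": [
                {"deserialized": {"field0": 1}, "serialized": [0, 0, 0, 1]},
            ],
        },
    ],
}


def serialize_struct(fields: Dict[str, str], values: Dict[str, int]) -> List[int]:
    """Concatenate the big-endian encoding of each uintN field, in order."""
    out: List[int] = []
    for field_name, type_name in fields.items():
        if not type_name.startswith("uint"):
            raise NotImplementedError(type_name)
        size = int(type_name[len("uint"):]) // 8
        out.extend(values[field_name].to_bytes(size, byteorder="big"))
    return out


def run_suite(suite: Dict[str, Any]) -> List[Tuple[str, bool]]:
    """Return (label, passed) for every test in every test case."""
    results = []
    for case_index, case in enumerate(suite["test_cases"]):
        case_label = case.get("name", f"case{case_index}")
        for test_index, test in enumerate(case["tests"]):
            actual = serialize_struct(case["fields"], test["deserialized"])
            label = f"{case_label}[{test_index}]"
            results.append((label, actual == test["serialized"]))
    return results


for label, passed in run_suite(suite):
    print(label, "ok" if passed else "FAILED")
```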

Thoughts?

djrtwo commented 6 years ago

I described the general format in more detail in this PR: https://github.com/ethereum/eth2.0-specs/pull/39

ethereum / beacon_chain

[SSZ] test vectors #115
