
Transition SSZ out of Eth2 specs repo, introduce new SSZ specs repo. #1901

Open protolambda opened 4 years ago

protolambda commented 4 years ago

First of all: this does not change any existing SSZ functionality. This is a proposal to extend the information available, and to move it out of the Eth2 specs repository.

The current SSZ specification in this repository is very dense, prone to bugs because of that density, and most of all, incomplete with regard to the more advanced features.

There is a lot I want the SSZ spec to describe: principles, ideas, advanced topics, and more description for hardening certain decoding features. However, this seems like too much to fit into the Eth2 specs.

So instead, I have this draft up for an SSZ specs repository:

https://github.com/protolambda/eth2.0-ssz/ (yes, it will move out of a personal GitHub repo once we have a transition plan).

I also plan to include SSZ-specific testing in this new repository, to better cover invalid SSZ data and to help SSZ implementations embed just the tests they need, as opposed to the full Eth2 suite of tests (fastssz as a standalone Go library is a good example).
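
To give a flavor of the invalid-input coverage I mean, here is a minimal Python sketch (illustrative only, not the actual test format): variable-size fields in an SSZ container are prefixed by 4-byte little-endian offsets, and a hardened decoder has to reject malformed ones.

OFFSET_SIZE = 4

def decode_two_byte_lists(data: bytes) -> tuple:
    # Decode a hypothetical container of two variable-size byte lists.
    if len(data) < 2 * OFFSET_SIZE:
        raise ValueError("buffer too small for the fixed-size part")
    off0 = int.from_bytes(data[0:4], "little")
    off1 = int.from_bytes(data[4:8], "little")
    # Hardening checks: the first offset must point right past the fixed
    # part, and offsets must be monotonic and within the buffer.
    if off0 != 2 * OFFSET_SIZE:
        raise ValueError("first offset must equal the fixed-part size")
    if not off0 <= off1 <= len(data):
        raise ValueError("offsets must be monotonic and in bounds")
    return (data[off0:off1], data[off1:])

# Valid: fields "ab" and "c".
assert decode_two_byte_lists(b"\x08\x00\x00\x00\x0a\x00\x00\x00abc") == (b"ab", b"c")

# Invalid: the second offset points past the end of the buffer.
try:
    decode_two_byte_lists(b"\x08\x00\x00\x00\xff\x00\x00\x00abc")
except ValueError:
    pass  # a hardened decoder must reject this input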

Additionally, I am looking to do versioning differently, where features grow in stages, and feedback on style and wording can be merged into the specs more quickly for unstable stages.

Hopefully this also enables the formal verification team (cc @franck44, @booleanfunction) to express their findings better. I feel like any change in the current SSZ spec here confuses implementers more than it helps them: the functionality is there, but the way it is written is difficult to follow, and debates about whether something is a bug, or how and when something breaks, show how limiting the spec format is.

So this is an open call for others involved in SSZ to give the draft repo a look, help iterate on it while it's still an early draft and friction is lower, and to plan a transition to this separate SSZ repository.

CC @wemeetagain @zah - since you have both spent a lot of time on more advanced SSZ implementations, you may be interested.

hwwhww commented 4 years ago

My two cents

IMHO, these are the insufficient parts of the status-quo SSZ spec:

  1. The historical problem
    1. [Phase 0] simple-serialize.md: there was a pyspec-only SSZ implementation before the pyspec moved to remerkleable. Some Python functions that the spec references come from that implementation. py-ssz itself has some optimizations, so it doesn't 100% follow the spec presentation either.
    2. [Phase 1] merkle-proofs.md is untested: it's almost all Python code describing how it works. However, that code hasn't been tested in CI. And since we use remerkleable for the pyspec, we may be ignoring inconsistencies between merkle-proofs.md and the implementation.
  2. Specification process and presentation
    1. @franck44 I fully agree that a formal verification spec is valuable! But it's also important to have a spec that devs and users can understand quickly. It troubles me how difficult it is to have an SSZ spec that makes everyone happy, especially given its name: it should be simple.
    2. Agreed with giving the SSZ spec its own versions. 👍 More importantly, I'd appreciate having a stable SSZ v1.0 at phase 0 launch, with all future features compatible with v1.0. That's also why formal verification is important at this point. 🙂
    3. IMO the documentation of our old friend protobuf (hello Prysm) is pretty clear for devs and users: https://developers.google.com/protocol-buffers/docs/encoding (a rough contrast between the two encoding styles is sketched below)
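
For contrast, here is a rough Python sketch of the two encoding styles such a document has to explain (illustrative, not normative): protobuf's varint trades fixed sizes for compactness, while SSZ keeps every uint at a fixed width, which is part of what makes its offsets and merkleization simple to describe.

def protobuf_varint(n: int) -> bytes:
    # Protobuf-style base-128 varint: 7 bits per byte, MSB = "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def ssz_uint64(n: int) -> bytes:
    # SSZ uint64: always 8 bytes, little-endian.
    return n.to_bytes(8, "little")

assert protobuf_varint(300) == b"\xac\x02"                     # size depends on the value
assert ssz_uint64(300) == b"\x2c\x01\x00\x00\x00\x00\x00\x00"  # size is fixed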

Documentation vs. formal verification spec

Please correct me if I'm wrong @franck44 @booleanfunction: the Dafny work has formally verified simple-serialize.md but not merkle-proofs.md?

And @protolambda, I think the encoding part of https://github.com/protolambda/eth2.0-ssz is not too different from simple-serialize.md?

As far as I can see, the trickiest part now is how to polish and describe the multi-proofs algorithm properly. Luckily we still have time to iterate on it until phase 1. IMHO that's also an argument for rewriting merkle-proofs.md in a more understandable way.
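
To show where the difficulty lies: the single-proof case is easy enough to sketch, as below in Python, using generalized indices where node i has children 2i and 2i+1 (an illustration, not the spec's code). The part that is hard to describe well is the multiproof machinery that shares sibling nodes between many leaves at once.

from hashlib import sha256

def hash_pair(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

def verify_single_proof(leaf: bytes, branch: list, gindex: int, root: bytes) -> bool:
    # Walk from the leaf up to the root; `branch` holds the sibling of each
    # node on the path, deepest first. The low bit of gindex says on which
    # side the sibling sits.
    value = leaf
    for sibling in branch:
        if gindex % 2:
            value = hash_pair(sibling, value)
        else:
            value = hash_pair(value, sibling)
        gindex //= 2
    return value == root

# Tiny 4-leaf tree: node 1 is the root, node i has children 2i and 2i+1.
nodes = {4 + i: bytes([i]) * 32 for i in range(4)}
for i in range(3, 0, -1):
    nodes[i] = hash_pair(nodes[2 * i], nodes[2 * i + 1])
# Prove the leaf at generalized index 6 with siblings 7 and 2.
assert verify_single_proof(nodes[6], [nodes[7], nodes[2]], 6, nodes[1])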

franck44 commented 4 years ago

Thanks for your proposals @protolambda and @hwwhww.

I think the important thing is to focus on what a specification should look like rather than where it is located. Judging by comments in other issues, I am not confident we all agree on what a specification is. I can give an example here of what it is and how it can be used, from a formal methods perspective.

Specs vs implementation, and formal verification

A simple example of a spec is the Fibonacci function, say fib. The specification of what fib(n) should be is defined by:

function fib (n: nat) : nat {
    if n == 0 then 0
    else if n == 1 then 1
    else fib (n - 1) + fib (n - 2)
}

This is the spec. It is a mathematical function, devoid of ambiguities.

A recursive implementation of how to compute fib(n) follows directly from the definition. However, it is inefficient, and we may prefer a smaller memory footprint and a faster computation. For instance:

method ComputeFib (n: nat) returns (b: nat) {
    var i := 1;
    var a := 0;
    b := 1;
    if (n == 0) { return 0; }
    while (i < n) {
        a, b := b, a + b;
        i := i + 1;
    }
    return b;
}

This is an implementation. It is not clear at all (at first sight) that this method computes fib(n). There may be many other implementations, for instance one that caches, say, the first 50 values of fib(n) in a pre-computed array:

method ComputeFib2 (n: nat) returns (b: nat) {
    // an array caching the first 50 fib(n)
    var knownFib := [0, 1, 1, 2, 3, 5 ...   ];
    if (n < 50) { return knownFib[n]; }
    var i := 1;
    var a := 0;
    b := 1;
    while (i < n) {
        a, b := b, a + b;
        i := i + 1;
    }
    return b;
}

Note that the specification does not prescribe how to compute the result; the developers are free to propose a way to compute it, as long as they are correct.

Are the implementations correct? Yes, if every time we compute ComputeFib(n) the result is equal to fib(n); otherwise, no. This is the (formal) correctness criterion. Note that hidden in the criterion is the fact that the algorithms terminate, i.e. never end up in an infinite loop. If an implementation computes the correct result all the time according to a spec, we say that it satisfies the spec.
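
For contrast, checking this criterion by testing rather than by proof looks like the following (a Python sketch added for illustration). Note that it only covers the sampled inputs, whereas a proof covers every n.

def fib(n: int) -> int:
    # The specification, transcribed directly.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def compute_fib(n: int) -> int:
    # The iterative implementation.
    if n == 0:
        return 0
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

# Testing checks finitely many inputs; a proof covers all of them.
assert all(compute_fib(n) == fib(n) for n in range(25))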

What is formal verification about? Mathematically proving, using mathematical logic and inference rules, that an implementation satisfies a specification. What is automated verification? Using an algorithm and possibly reasoning engines (e.g. SAT solvers) to check (or assist you in checking) that an implementation satisfies a specification.

To apply this to the Eth2 "specs", we had to work somewhat as archaeologists to figure out what a logical and functional specification would be, in other words, what the intent was; only then could we prove that an implementation (the Python-like current spec) satisfies a spec. We have some good examples of how to do that for SSZ, Merkleisation, and recently for the Beacon Chain (on_block). Our spec is formal, we have also provided implementations (that we can execute), and we have proved that our implementations satisfy their specifications. We can also establish non-trivial properties, e.g. deserialise(serialise(x)) == x for all x (no testing, but a formal proof and a 100% guarantee).
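
To make that last contrast concrete: a testing-based check of the round-trip property, here for a hand-rolled uint64 codec in Python (an illustrative sketch, not the pyspec's code), can only ever sample the input space.

import random

def serialise_uint64(x: int) -> bytes:
    # SSZ uint64 is serialised as 8 bytes, little-endian.
    return x.to_bytes(8, "little")

def deserialise_uint64(data: bytes) -> int:
    if len(data) != 8:
        raise ValueError("a uint64 must be exactly 8 bytes")
    return int.from_bytes(data, "little")

# 10,000 samples out of 2**64 possible inputs; a formal proof covers all of them.
for _ in range(10_000):
    x = random.getrandbits(64)
    assert deserialise_uint64(serialise_uint64(x)) == x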

About the readability by developers

When I read that developers don't want to read formal specs, I am a bit surprised. The non-formal specs are 1) imprecise and confusing, and require a lot of work from the developers, and 2) error-prone, because different interpretations can arise, so they'll have to fix bugs.

A good historical example is undefined behaviour in C. Every undefined behaviour (e.g. division by zero) can be implemented in a C compiler as you wish. This has been creating problems for years in software written in C: two compilers can produce different behaviours for the same program.

An advantage of a formal and proved specification is that developers can understand it; it is not ambiguous. It seems to me a much cleaner approach than Python code, which I don't think was intended as a specification language (at least we do not use it for that purpose in the formal methods world).

The current Eth2 specs are a good starting point, and certainly a good way to start describing all those clever ideas. But at some point, we may use the tools of the trade, languages, tools and techniques, to improve the readability of the specs (and disambiguate them).

We have shown our specs in Dafny to many colleagues who are not experts in formal specification and verification, and they understand them. The ability to annotate the specs with proofs is invaluable and provides a degree of reliability that cannot be obtained by testing.

Moving forward

So rather than writing yet another Python spec that is ambiguous and not proved, why not foster a formal version, with clear definitions and correctness proofs? And if this is a possible avenue, we have a candidate here.

It can easily accommodate future extensions like the merkle-proofs (@booleanfunction is already investigating a formal treatment of the Merkle proofs). It contains specifications, implementations and proofs, all in the same language, Dafny. Implementations in other mainstream languages can be obtained by compiling Dafny into C#, Go, and soon Java.

protolambda commented 4 years ago

Wall of text, sorry for that; I tried to go into detail about my concerns with the various approaches, and why it makes sense to plan a move to the new SSZ draft.


So here are the weaknesses I see of each approach:

- Current py-spec
- Dafny
- More text, like the new SSZ spec draft


And here are the benefits I see with each approach:

- Current py-spec
- Dafny
- More text, like the new SSZ spec draft


On pure functions and the spec<>implementation trade-off:

I agree we could do a better job at separating the validation and mutation parts, possibly writing every function as a pure function that returns a new, modified copy of the state (see the sketch below).
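
As a minimal Python sketch of the difference, with a hypothetical one-field state standing in for the real BeaconState:

from dataclasses import dataclass, replace

@dataclass
class State:
    # Hypothetical, tiny stand-in for BeaconState.
    slot: int

def process_slot_mutating(state: State) -> None:
    # Today's pyspec style: mutate the state in place.
    state.slot += 1

def process_slot_pure(state: State) -> State:
    # Pure style: validation and mutation separated; the caller gets a
    # new copy and the pre-state stays untouched (useful e.g. for reorgs).
    return replace(state, slot=state.slot + 1)

pre = State(slot=10)
post = process_slot_pure(pre)
assert (pre.slot, post.slot) == (10, 11)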

However, as with some other parts currently in the spec, that gets somewhat ignored by client implementers. The way the spec is implemented defines the behavior well, but it is not capable of reaching mainnet-scale state sizes and transition speeds without modifications.

Not to say we should not have more pure functions, but what we should do is listen to client implementers and keep the spec aware of the actual implementation. I have raised this issue before, and I'm not completely content with the current compromise, but I see the value of the current "spec" (or organization medium, really, at this stage) quickly diminishing if we have to jump through extra hoops to compare and test a client implementation.

The spec is not where the network value will be; the clients are. The end goal is not to show that the spec satisfies correctness; the goal is that a client does, and a correct spec is a means to do that. If we only care about the spec and stop right there, while making it more difficult for a client to verify against it and keep in sync with changes, we have not added any value.


All in all, I don't think we have to choose between these; we should aim to have these approaches complement each other.

The ability to annotate the specs with proofs is invaluable and provides a degree of reliability that cannot be obtained by testing.

Agree, but annotation is a keyword for me here. If you were to add the proofs to the other works, the result should be something that provides guidance and explanation, can handle the remaining churn of a spec, and offers a formal proof of the behavior, to resolve any possible ambiguity.

And given that SSZ functionality has not changed for a long time, the pyspec benefits are diminishing, and we can focus on the explanation-with-proof part, with little to no churn from functionality changes in the candidate/finished stages of the spec. Hence the proposal to move to the new SSZ draft, which solves half of the problem; Dafny could then start to complement the spec to solve the remainder.

For the remainder of the eth2-spec it is currently more complicated, as:

franck44 commented 4 years ago

Thanks @protolambda for your time and for sharing your thoughts. I respectfully disagree with most of the conclusions, and I think the reasoning you presented has some inconsistencies.

Among them:

Python was never intended as a specification language. It serves us well for various other things, but ultimately it is not the right language once the spec is completely frozen.

I don't understand this argument. If Python is not the right language for this purpose, how can we express the spec in this language first, freeze it, and then move on? I don't think arguing about each and every single point is a very productive use of our time. For now, I have presented my arguments in detail, with the motivations and some scientific and logical reasoning, and I don't think I have anything to add.

All the best on your endeavour with the new SSZ spec.

protolambda commented 4 years ago

@franck44 :

It seems to me a much cleaner approach than Python code, which I don't think was intended as a specification language (at least we do not use it for that purpose in the formal methods world).

And me, in my summary of all the weaknesses/benefits, trying to get the full picture:

Python was never intended as a specification language

I don't understand this argument.

I am trying to work through your arguments, agree that Python is not optimal, and figure out a way forward. If you are arguing the opposite, and do not want me to remove any Python in favor of more explanation plus future formal definitions, then make that clear.

how can we express the spec in this language first, freeze it, and then move on?

What I'm trying to say is that ultimately, when there is no functional difference in expressing an idea that has been iterated on endlessly, adopting something more formal should be smooth. And we can do so before it is 100% frozen. Once things are frozen I'd like the spec to be better, but that does not mean we cannot iterate and work towards that before everything, format included, is fully frozen. If, before phase 0 launches (and is 100% frozen), we can have at least a paper or other resource going in depth through the spec, with proofs for the important parts, then I'd rather have that than the Python for spec purposes.

Right now, though, it is not the spec language that is the bottleneck for phase 0 stabilization, but the feedback loop to get through a mix of usability work and the less theoretical parts of the spec (networking in particular). SSZ, on the other hand, has not changed much, and I'm making the argument that there is room to transition it away from this specs repository and include the in-depth explanation that is missing, with room for formal definitions to complement it where possible.

on your endeavour

Please, I am not looking for drama, but for a concrete plan to improve on the status quo, and some freedom to try things as part of the transition planning. If you dislike the Python for a spec, but also don't see value in being less Python-centric while making room for more formal definitions of the types, then I don't understand.

If anything, the SSZ part of the spec is the perfect experimentation ground to get the format right, without affecting ongoing spec work. That part of the spec is as stable as can be, many different formats may fit (I'm also thinking of using more encoding-specific languages to describe the format), and there has been a draft since forever that can be torn apart and polished before we actually finalize any decision.

Now I wanted to give this a try, and instead we are arguing over the internet. If nobody is interested in a change of format, then I feel like I'm wasting my time here, and I will just go back to non-SSZ issues and continue with the pyspec format that every once in a while gets a complaint about a Python-ism, but at least ships spec improvements while we need them.

zah commented 4 years ago

I would welcome an independently versioned SSZ spec with an isolated test suite. SSZ as a format has many useful properties and finding applications outside of the Ethereum spec is certainly possible.

I do have one significant concern for SSZ though. It would be quite difficult to coordinate an upgrade of the SSZ spec once Ethereum consensus objects start to appear in third-party databases, so we should do everything in our power to perfect the SSZ spec before the 1.0 release. To this end, I'm creating the following issue where some possible improvements can be discussed:

https://github.com/ethereum/eth2.0-specs/issues/1916

Otherwise, if I have to contribute to the spec-format discussion, my opinion is that nothing prevents a correct spec from being described in prose, in an accessible language (e.g. Python), and in a formally verifiable language (e.g. Dafny) at the same time.

If the spec is correct and unambiguous, all definitions will agree on every input and output. The new ssz-spec can even host all of these descriptions. @protolambda's argument is that the different descriptions have different strengths and weaknesses at different times in the development process of the spec (ideas start as prose, may be developed and tested out in Python, and later formally verified in Dafny).

franck44 commented 4 years ago

@protolambda @zah @CarlBeek

Otherwise, if I have to contribute to the spec-format discussion, my opinion is that nothing prevents a correct spec from being described in prose, in an accessible language (e.g. Python), and in a formally verifiable language (e.g. Dafny) at the same time.

Amazing. Formal methods people may run away and trade their ETH for Cardano when they read that :-) I hope jokes are allowed here ... Another point is that we should argue based on scientific and informed arguments rather than opinions.

I would encourage you to have a look at this.

I'll leave you to it (the new SSZ informal spec), but I really think it is not the best use of our time.

protolambda commented 4 years ago

You are skipping over the second part of the sentence, where a language like Dafny is being considered. If we finally get to a point where we have multiple complete descriptions, I would always prefer the most formal one for correctness (especially if it has proven properties). But in the process of getting there, and to help readers understand the intention and background while they read the Python, or any future Dafny code, I think it's meaningful to have more than one format. Then link them together, and warn the reader that prose or Python may not be as good as the formal description. Calling spec prose "correct" at any point is dangerous, yes, but it's completely valid to aim for (even though by definition it can't be perfect).

And what I want to avoid is a spec process where new features or improvements take prohibitively high overhead to develop. And preferably, I want to allow as many readers as possible to understand why something was written the way it is. I don't see a single-format spec achieving that. Maybe a brilliantly clear, readable formal specification could, for a completely stable spec, but we're not there yet.

ericsson49 commented 4 years ago

Otherwise, if I have to contribute to the spec-format discussion, my opinion is that nothing prevents a correct spec from being described in prose, in an accessible language (e.g. Python), and in a formally verifiable language (e.g. Dafny) at the same time.

I believe the (prose + spec in some accessible language + formal spec) approach is inevitable. However, I would add important refinements, which I believe are crucial in the sense that they shape long-term efforts.

If the spec is correct and unambiguous, all definitions will agree on every input and output.

There is a fundamental difference between prose and a formal spec. Somewhat simplifying, one can formally prove that an implementation agrees with the formal spec on every input and output (under certain assumptions). However, proving that prose agrees with the formal spec, or with the spec in an accessible language, is much more difficult. First, I can see no way to give formal semantics to prose, so it will actually be an audit, a manual inspection, and humans make mistakes. Second, one can start adapting the prose to make it more formal, but then we lose accessibility for regular programmers.

So, while a human audit (manual 'proof checking') of the agreement between prose + Python spec + formal spec is possible in theory, I believe it is way too expensive.

Modern formal methods suggest another approach: minimizing (or optimizing) the Trusted Code Base. Basically, some parts have to be trusted and/or inspected manually, but we try to keep those as small as reasonable. Everything else should be checked by computers.

So, I would formulate the prose + python spec + formal spec approach in a somewhat different way:

  1. prose is (very) important as a way to communicate to humans, as was pointed out before

  2. but prose is super informal, so an implementation in some accessible language is (very) important too, as a more accurate way to communicate to programmers. Actually, I would say prose + py-spec is the beacon chain spec for implementers (in my humble opinion, based on my experience of working with several beacon chain implementations). Reference implementation is perhaps a better term for the py-spec; I will use that term hereinafter.

  3. Eventually, a formal beacon chain spec is expected to appear, and I believe that several teams are working in that direction.

One can formally prove that the formal spec agrees with the reference implementation on every input and output, though there are some important prerequisites for that. First, one needs formal semantics for the language the reference implementation is written in. The problem with Python is that it is one of the worst languages in that respect, though I don't think the situation is really that bad. Basically, the EF team writes the py-spec in a statically-typed subset of Python, with minor exceptions. Second, it uses a rather minimal set of Python features (as @protolambda has outlined before). I personally don't think it is a big problem to formalize the subset actually used by py-spec developers, especially assuming that the EF is open to suggestions. I would rather say that a downside of the approach is that the EF team may become more reluctant to port the py-spec to a language that is better suited to beacon chain spec development :). Though, as lots of effort has been invested into the py-spec and tools (including tests), I believe it's quite a reasonable way. There is a formal tool for a subset of statically-typed Python 3, which is another reason why I believe the py-spec can be kept as a reference implementation that can be formally verified against a formal spec with a reasonable amount of effort.
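
For reference, the statically-typed style in question looks roughly like this (an illustrative helper written in the pyspec's style; the plain uint64 = int alias is my simplification, the real pyspec uses its own SSZ integer types):

uint64 = int  # simplification; the pyspec has a real SSZ uint64 type

def integer_squareroot(n: uint64) -> uint64:
    # Largest x such that x * x <= n, via Newton's method: annotated
    # signatures, simple loops, no dynamic Python tricks.
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

assert integer_squareroot(uint64(26)) == uint64(5)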

Another possible way, in my opinion, would be to translate the existing py-spec into a language more suitable for spec development at some point (I assume development of a reference implementation, which continues to be the spec for implementers). By more suitable I mean a statically typed, presumably impure functional programming language (Rust, OCaml, Standard ML, F#, Scala, Kotlin, etc.), as in my opinion languages from this category best fit the current state of the spec. For example, a manual translation of phase0/spec.py to Kotlin took about 7 hours (without further testing, cleanup, etc.). I do not want to say this is the only possible way; it is rather my personal opinion, based on my experience of working with the pyspec and multiple programming languages. Some of these languages have pretty expressive type systems (e.g. Scala and OCaml allow the use of GADTs), so many bugs can be caught by a compiler. While I believe the spec translated into such a language can be pretty readable and quite close to the py-spec (for example, phase0 in Kotlin), I believe it is true that it will be more difficult to contribute to the spec development.

So, it's probably not that bad a variant, if one keeps prose + py-spec as the main beacon chain 'spec' for implementers, while the formal methods people make implementations, along with the necessary formalization of requirements, lemmas, invariants, post-conditions, pre-conditions, etc., in their favorite language suitable for formal work. There will be inevitable duplication of efforts, though as one can see, the duplication serves different purposes (e.g. communication to humans vs. 'communication' to computers). However, some effort to reduce the duplication between formal methods teams is necessary. I will elaborate on this somewhere else, as I have some ideas but they are not yet really mature.