Open Grant Proposal: Yatima

Name of Project: Yatima

Proposal Category: devtools-libraries

Proposer: johnchandlerburnham

Technical Sponsor: porcuquine

Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT and APACHE2 licenses?: Yes

Project Description

Yatima is a dependently typed functional programming language for the decentralized web, featuring content-addressing via IPLD theorem-proving via the Lean Prover frontend[^radiya], and an on-chain backend via the Substrate[^substrate] blockchain framework. The mission of the project is to make mainstream the "proofs-as-programs" correspondence between formal mathematics and practical computation. We want to use proofs to make software safer, and software to make math more accessible. Formal proofs of software correctness are particulary relevant to the decentralized web, where it can be difficult or costly to correct errors after the fact.

Zero-knowledge proofs are are an important emerging technology of the decentralized web, which allow parties to prove to one another the result of a computation, without revealing anything other than that result. However, while current implementations of zero-knowledge proofs (such as zk-SNARKs) are in principle capable of verifying arbitrary computation, existing systems and tools are cumbersome, and difficult to use. Protocol Labs Research's Lurk language aims to solve this problem by using recursive SNARKs to create an expressive functional programming language that can efficiently compile to a zk-SNARK backend.

Our intention is to integrate Yatima with Lurk in order to allow for zero-knowledge verification of arbitrary compile-time computations, such as the typechecker of a theorem prover such as Lean. Formal proofs are expressive, but can be expensive to verify[^mathlib-verification]. With Yatima we can compile the verifier of these formal proofs to Lurk, and therefore to zk-SNARKs, enabling constant-time and constant-space verification of formal proofs. For example, we will, in principle, be able to generate a single zk-SNARK proof which verifies a specific commit of Lean's formalized mathematics library, mathlib[^constant-mathlib] Additionally, we intend to adapt existing work on functional WebAssembly (WASM) interpretation into Lean to allow Lurk to verify the results of computation expressed in WebAssembly. A WASM interpreter in Lean will enable the future development of formal proofs for the correctness of that interpreter, which will be useful for e.g. WASM integration on the Substrate runtime or Filecoin virtual machines.

Value

This project is an important step to enabling Filecoin to natively store and verify proofs and programs in the Lean Prover. Getting this right means that Filecoin has a shot at becoming the "chain-of-record" for the ongoing formalization of mathematics. Filecoin will also gain a powerful set of tools for enabling its smart contract developers to publish formal proofs on the correctness of their software.

This is an ambitious vision, and there are certainly some technical risks, however, we believe that these risks are largely centered around performance characteristics, rather than semantics. Fundamentally, our proposal combines existing systems which in principle should work together. In practice, we have some questions around how the stack of "Lean-on-WASM-on-Lurk-on-SNARKs" will behave. We have built some optionality into our design to address what we think will be the likely issues, and will describe this further in the milestones below.

That said, even with most pessimal possible outcome to those concerns, there are substantial secondary benefits to our deliverables. For example, our proposed IPLD and Multiformats libraries implemented in the Lean Prover will be valuable to Filecoin independently of the performance of our Lean-on-SNARKs stack. This is true of many of our other deliverables, such as our planned contributions to Lurk's Rust implementation.

Deliverables

The final deliverable for this proposal is to be able to verify substantial portions of the Lean standard library and the mathlib4 library in our our Lean kernel running on Lurk, content-addressed via IPLD. For example, we'd like to be able to verify, in a SNARK, that no natural number has a successor equal to itself:

lemma succ_ne_self : ∀ n : ℕ, succ n ≠ n
| 0,   h => absurd h (Nat.succ_ne_zero 0)
| n+1, h => succ_ne_self n (Nat.noConfusion h id)

Ideally we'd like to be able to do this for the entire standard library and all of mathlib4, but we want to hedge a little, since it's possible that some of Lean 4 kernel's optimizations (such as linking to the GNU Multiprecision Library for computations with large naturals) might cause some issues.

Development Roadmap

Milestone 1: IPLD AST Embedding

Duration: 10 weeks, 4 people

The Yatima language currently embeds the Abstract Syntax Trees (AST) of its expressions as IPLD (InterPlanetary Linked Data) objects. We can adapt this technique to both Lean and Lurk, enabling IPLD-compatible content-addressing and tree-based sharing of programs over IPFS and Filecoin. This is particularly important for sharing the specific inputs (e.g. source code, metadata, etc.) which are relevant to a given zk-SNARK proof.

We will implement an IPLD library in the Lean theorem prover. This will include modules for:
- unsigned-varint
- multihash (partial, not all hash functions will be supported)
- multibase
- CID
- IPLD's DAG-CBOR codec
We will embed the AST of the Lean kernel (specifically a representation called the Lean "export format") in IPLD, with implementations in both Lean and Rust.
We will implement Lean bindings to the Neptune library and the BLAKE3 library
Where possible, we will engage with the maintainers of the Rust IPLD and multiformats implementations to upstream the work in our Substrate-compatible sp-ipld library (and its dependencies).
We will write documentation and testing for the above.

Milestone 2: Lurk-rs tooling and improvments.

Duration: 10 weeks, 4 people

The Lurk-rs Rust implementation of the Lurk language can benefit from Yatima's programming language implementation work in several areas. Additionally, aligning dependencies between Yatima and Lurk for common programming language

We will implement a parser for Lurk-rs using the nom library
We will implement property testing for Lurk-rs using the quickcheck library
We will implement a CLI for Lurk-rs using the struct-opt library
We will implement a REPL for Lurk-rs using the rustyline library.
We will use our existing sp-ipld library to embed the Lurk AST in IPLD.
We will engage with the maintainers of the IPLD and Multiformats specifications to determine how best to incorporate zk-SNARK compatible hashes like Poseidon.
We will adapt Yatima's IPFS API client to work with the Lurk IPLD embedding from Milestone 2.5 so that Lurk programs can be stored and shared via go-ipfs
We will assist the Lurk team with general language development, with a specific focus on improving the efficiency of content-addressing during evaluation and witness-generation.
If possible, we will update Lurk-rs to build with no_std so that Lurk-rs can be compiled to deterministic WASM and run inside the browser (and eventually on-chain).
If 9 is possible without extensive dependency forking, we will adapt Yatima's web REPL for Lurk-rs, to allow Lurk programs to be executed and verified inside the browser.
We will write documentation and testing for the above.

Milestone 3: Radiya, A Lean kernel implemented in Rust with a Lurk Backend

Duration: 10 weeks, 4 people

Yatima's Radiya project is a Lean kernel (a typechecker for Lean programs) implemented in Rust, which is intended to be the new language core and type-theory for the Yatima language. Radiya is designed specifically for the upcoming Lean 4 version.

We will complete our implementation of Radiya, an independent type-checker for the Lean 4 kernel in Rust, such that Radiya can typecheck the Lean 4 standard library and mathlib.
We will implement embedding of Radiya into IPLD (overlaps with Milestone 1.2)
We will integrate Radiya into the primary Yatima language repository, integrating with current CLI, REPL and API tooling, particularly regarding IPFS support. We will publish the Radiya/Lean-core Yatima as Yatima 1.0.
We will adapt the Yatima language syntax into a legible textual format for Lean kernel programs (distinct from Lean source programs due to elaboration and inference).
We will adapt our previous work on a Scheme backend for Yatima into a compiler from Yatima 1.0 to Lurk.
We will write documentation and testing for the above.

Milestone 4: WASM.lean

Duration: 10 weeks, 4 people

In order to get verification of Lean programs in Lurk, we need an implementation of the Lean kernel in Lurk. Milestone 3 will allow us to compile Lean programs to Lurk, but unfortunately Lean 4 is not fully self-hosted yet. There are several ways around this problem, such as implementing our own Lean kernel in Lean or Lurk, but our favored approach is to simply implement a WASM interpreter in Lean, and compile this interpreter to Lurk. Since Radiya is implemented so as to compile to deterministic WASM, we will therefore get our Lean kernel implementation in Lurk "for free", as well as gaining the ability to verify any WASM program in Lurk, and formally prove the correctness of our WASM interpreter in Lean.

We will implement a Lean library for interpreting WASM programs, based loosely on haskell-wasm.
We will integrate this library with our IPLD libraries from Milestone 1.1 to enable content-addressing for the WASM AST (.wast).
We will compile this Wasm.lean library to Lurk, possibly with some hand-tuned modifications for efficiency (although this will potentially impact correctness of Lean proofs about Wasm.lean).
We will compile Radiya to WASM and run on Lurk via WASM.lurk

Should this approach prove impractical, we can either implement WASM.lurk by hand entirely (forgoing the possibility of Lean proofs of interpreter correctness) or we can implement our Lean kernel in Lean (but that is likely a substantial effort).

Total Budget Requested

Communicated to the team via Google Doc.

Maintenance and Upgrade Plans

We plan to use the libraries and tools we develop for this proposal as a starting point for many other exciting areas of future work:

Multisyntax

We hope to generalize our IPLD embedding of the Lean, Lurk and Yatima ASTs into a specification for embedding the syntax trees of any language into IPLD. One possibility is to add IPLD as a backend to the github/semantic project, however this raises some questions around what objects should be serialized directly and what objects should be linked via CIDs. It may well be that github/semantic's AST representation (which seem to rely heavily on the representation used by tree-sitter) is not ideal in this context.

This would allow FIL to act as a backend for tree-based DVCS or content-addressed build systems like Nix.

Deterministic WASM or dWASM

The WASM specification is non-deterministic in a small number of well-defined cases, including the specific bit-representation of NaN floating point values, as well as certain resource-exhaustion cases. This should be addressed in a new "deterministic WASM" or "dWASM" specification, which is compatible with, but stricter than ordinary WASM.

The wasmx/fizzy project lists the following goals in its README:

First class support for determistic applications (blockchain) Support canonical handling of floating point (i.e. NaNs stricter than in the spec) Support an efficient big integer API (256-bit and perhaps 384-bit) Support optional runtime metering in the interpreter Support enforcing a call depth bound Further restrictions of complexity (e.g. number of locals, number of function parameters, number of labels, etc.)

This is an excellent starting point, and likely could form the basis of a standard which could be used by FVM, eWASM, Substrate, etc.

A more exotic idea would be to replace or supersede IEEE-754 floats entirely with a naturally deterministic floating point implementation such as posits, where a 32-bit posit carries the same precision as a 64-bit IEEE-754 float. A Rust implementation of posits can be found here: https://gitlab.com/cerlane/SoftPosit

This would allow FIL smart contracts to do floating point math.

Electrolysis: Functional Purification of Rust into Lean

The kha/electrolysis project demonstrated how to formally verify Rust programs by transpiling the Rust MIR (Medium Intermediate Representation) into Lean. This project was implemented for Lean 2 and should be updated to Lean 4.

This would extend the Yatima/Lurk stack, and by extension FIL, to verify Rust programs.

Team

Team Members

Team Member LinkedIn Profiles

CVs provided via email

Team Website

https://github.com/yatima-inc

Relevant Experience

John and Gabriel have been working on designing functional programming languages together for over 2 years and previously coauthored the Formality language, which was supported by an Ethereum Foundation Grant (which John presented on at Devcon5)

Prior to that, John worked on Tezos where he created and led initial development of the Morley Framework, as well as creating the TZIP standards process. Before that, John cofounded tlon.io. He was a Thiel Fellow in 2011 and dropped out of Dartmouth College.

Gabriel previously worked at the Ethereum Foundation and received his Masters in Math at the Federal University of Bahia (Salvador, Brazil), where his dissertation was titled "On the Category of Deductive Systems".

Anders is a full stack and machine learning engineer by background, and received his Masters in Applied Physics and Mathematics from the Norwegian Institute of Science and Technology.

Sam recently graduated from the Rensselaer Polytechnic Institute with a degree in Computer Engineering. He has previously worked on embedded systems in Rust and C including industrial HVAC controllers at enverid.com.

Collectively, we recently completed a grant from the Web 3.0 Foundation to implement IPLD libraries and the Yatima language on the Substrate blockchain framework.

Team code repositories

[^radiya]: The current pre-alpha implementation of the Yatima language is implemented with a type-theory based on the Formality and Cedille theorem provers, but we are currently in the process of implementing a new kernel for Yatima called Radiya which is compatible with the Lean Prover kernel, and will be able to use the inference of Lean's ecosystem, such as libraries, type-inference and tactics.

[^substrate]: This backend was the subject of Yatima's recent Web 3.0 Foundation grant

[^mathlib-verification]: Reverifying all of Lean's mathlib takes a substantial amount of compute. For example, mathlib's continuous integration can take several hours with several hundred megabytes of memory (albeit on non-dedicated Github servers). Dedicated hardware with multiple threads can bring that down to several minutes

[^constant-mathlib]: In practice, for formal proofs we actually do want the intermediate states of programs, such as a theorem's specific type representation and the expression which proves it. This information can be attached to any zk-SNARK as metadata, for example via Filecoin/IPFS Content-ID (CID) hashes.

filecoin-project / devgrants