OpenZeppelin / merkle-tree

A JavaScript library to generate merkle trees and merkle proofs.
MIT License

Merkle tree/proof for data exchange #48

Closed lukepuplett closed 3 months ago

lukepuplett commented 3 months ago

Hello

I've been chatting on the Ethereum Attestation Service's Telegram channel about my desire to design a JSON structure which I can give to users as the actual data behind their attestation (of the root hash, on chain).

This JSON would be useless to a non-technical user but I'd hope that as momentum builds around blockchain-aware apps, it would be used by other apps as proof of some action or fact.

Developers will have to build that functionality into all the other apps, and the wider developer community, who may be only dipping their toes into this rather arcane world and perhaps even resistant to this work, would need all the help they can get.

I'm interested in designing a Merkle tree structure in JSON which can be

  1. Used as a format for data exchange; it is both the data and the proof and nothing else is needed.
  2. Very easily understood, so that a developer can look at it, read up on how Merkle trees generally work, and write code in their own language to read the data, recompute the root hash and compare it against a signed (on-chain) copy. Ideally this needs no specialised libraries except perhaps a hasher, and as few new esoteric concepts as possible (well-known web or JS standards rather than strange ABI encoding). A bonus would be that an LLM could also understand the JSON and write this code.
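As a minimal sketch of goal (2), the following recomputes a root hash from a JSON document using only a JSON parser and a hasher. Everything specific here is an assumption made for illustration: the field names (`hashAlgorithm`, `leaves`, `name`, `value`), SHA-256, the `0x00`/`0x01` prefixes, and the rule that an unpaired node is carried up unchanged. It is not the leaf encoding or pairing rule used by this library.

```python
import hashlib
import json

# Hypothetical document layout; the field names are illustrative, not a standard.
DOCUMENT = json.dumps({
    "hashAlgorithm": "sha256",
    "leaves": [
        {"name": "fullName", "value": "Alice Example"},
        {"name": "nationality", "value": "GB"},
        {"name": "dateOfBirth", "value": "1990-01-01"},
    ],
})


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(leaf: dict) -> bytes:
    # Assumed leaf encoding: a 0x00 prefix followed by UTF-8 "name=value".
    return sha256(b"\x00" + f"{leaf['name']}={leaf['value']}".encode("utf-8"))


def merkle_root(hashes: list) -> bytes:
    # Assumed pairing rule: hash adjacent pairs left to right with a 0x01
    # prefix; an unpaired last node is carried up to the next level unchanged.
    if not hashes:
        raise ValueError("cannot compute the root of an empty tree")
    level = hashes
    while len(level) > 1:
        nxt = [sha256(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def recompute_root(document_json: str) -> str:
    doc = json.loads(document_json)
    if doc["hashAlgorithm"] != "sha256":
        raise ValueError("this sketch only handles sha256")
    return merkle_root([leaf_hash(leaf) for leaf in doc["leaves"]]).hex()


# Stand-in for the root that would be read from the on-chain attestation.
attested_root = recompute_root(DOCUMENT)

tampered = DOCUMENT.replace('"GB"', '"FR"')
print(recompute_root(DOCUMENT) == attested_root)  # True
print(recompute_root(tampered) == attested_root)  # False
```

The distinct prefixes for leaves and inner nodes keep a leaf from being mistaken for an inner node; a real format would have to pin down every one of these choices in writing.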

The goal (1) for data exchange would require that byte arrays be paired with a content-type field. For example, if the bytes were a selfie photo, then the MIME type of that encoding would be needed for it to be useful. "This photo is a likeness of the person I video conferenced, and that data is hashed and signed along with their passport details." And if the bytes were plain text, then the encoding would be needed, e.g. UTF-8 in Base-64.
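To illustrate the pairing of bytes with a content type, here is a small sketch of a typed leaf. The field names `contentType`, `encoding` and `data` are hypothetical, the JPEG bytes are a placeholder, and the decoder only understands Base64 and UTF-8 text.

```python
import base64

# Hypothetical leaf entries: "contentType" is a MIME type describing the raw
# bytes, "encoding" says how those bytes are carried inside the JSON string.
leaves = [
    {
        "name": "fullName",
        "contentType": "text/plain; charset=utf-8",
        "encoding": "base64",
        "data": base64.b64encode("Zoë Example".encode("utf-8")).decode("ascii"),
    },
    {
        "name": "photo",
        "contentType": "image/jpeg",
        "encoding": "base64",
        # Placeholder bytes, not a real image.
        "data": base64.b64encode(b"\xff\xd8\xff\xe0 placeholder").decode("ascii"),
    },
]


def decode_leaf(leaf: dict):
    """Return (raw_bytes, text_or_None) for one leaf."""
    if leaf["encoding"] != "base64":
        raise ValueError(f"unsupported encoding: {leaf['encoding']}")
    raw = base64.b64decode(leaf["data"], validate=True)
    # Only the raw bytes would be hashed; the content type tells the reader
    # how to interpret them afterwards.
    if leaf["contentType"] == "text/plain; charset=utf-8":
        return raw, raw.decode("utf-8")
    return raw, None


for leaf in leaves:
    raw, text = decode_leaf(leaf)
    print(leaf["name"], leaf["contentType"], len(raw), text)
```

Whether the content type itself should be covered by the leaf hash is an open design question that this sketch does not settle.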

The goal (2) I think would entail a structure that trades compactness for screaming structure, "oh, it actually resembles a Merkle tree", where the leaf nodes and all intermediate nodes with hashes would be included, hierarchically. The hashing algorithm and salt bytes would also need to be included so that the JSON has everything needed for the developer to "fall into the pit of success" when writing code to process it.
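One possible shape for such a hierarchical document is sketched below, with every node carrying its own hash so a reader can check it level by level. The field names, SHA-256, the prefixes and the fixed demo salts are all assumptions; in real use each salt would be random, since a salt is what stops a low-entropy value such as a date of birth from being guessed from its hash.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def expected_hash(node: dict) -> str:
    # Inner node: 0x01 prefix, then the children's hashes in order.
    if "children" in node:
        joined = b"".join(bytes.fromhex(c["hash"]) for c in node["children"])
        return sha256_hex(b"\x01" + joined)
    # Leaf: 0x00 prefix, then the salt, then UTF-8 "name=value".
    payload = f"{node['name']}={node['value']}".encode("utf-8")
    return sha256_hex(b"\x00" + bytes.fromhex(node["salt"]) + payload)


def leaf(name: str, value: str, salt_hex: str) -> dict:
    node = {"name": name, "value": value, "salt": salt_hex}
    node["hash"] = expected_hash(node)
    return node


def branch(*children: dict) -> dict:
    node = {"children": list(children)}
    node["hash"] = expected_hash(node)
    return node


def verify(node: dict) -> bool:
    # A node is valid if its stored hash matches the recomputed one and
    # every child is valid in turn.
    if node["hash"] != expected_hash(node):
        return False
    return all(verify(child) for child in node.get("children", []))


# Fixed salts keep the demo reproducible; real salts would be random bytes.
document = {
    "hashAlgorithm": "sha256",
    "root": branch(
        branch(
            leaf("fullName", "Alice Example", "00" * 16),
            leaf("nationality", "GB", "11" * 16),
        ),
        leaf("dateOfBirth", "1990-01-01", "22" * 16),
    ),
}

print(json.dumps(document, indent=2))
print(verify(document["root"]))
```

The value at `document["root"]["hash"]` is what would be compared against the attested root.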

What are your thoughts?

Amxx commented 3 months ago

Hello @lukepuplett

First of all, I don't think this is the right place for this discussion. This repo is for feature requests and bug tracking specific to this library. You should probably discuss this in places like Stack Overflow or the OpenZeppelin forum.

About your question:

Merkle trees are designed so that a subset of the data can easily be proven to be part of the tree. The idea is that by committing to a single hash (a reference), you can prove that something is in the data without showing the rest of the data. If you don't need that property, then Merkle trees are probably not the right structure for you.
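The property described above can be sketched as an inclusion proof: the verifier sees one value, a few sibling hashes and the committed root, and nothing else. This is a generic illustration with SHA-256, ordered pairs and made-up helper names, not this library's API or hashing scheme.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(value: str) -> bytes:
    return sha256(b"\x00" + value.encode("utf-8"))


def node_hash(left: bytes, right: bytes) -> bytes:
    return sha256(b"\x01" + left + right)


def build_levels(leaf_hashes: list) -> list:
    # levels[0] holds the leaf hashes, levels[-1] holds only the root.
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])  # unpaired node is carried up unchanged
        levels.append(nxt)
    return levels


def make_proof(levels: list, index: int) -> list:
    # Collect the sibling hash at each level, noting which side it sits on.
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            side = "left" if sibling < index else "right"
            proof.append((side, level[sibling]))
        index //= 2
    return proof


def verify_proof(value: str, proof: list, root: bytes) -> bool:
    current = leaf_hash(value)
    for side, sibling in proof:
        if side == "left":
            current = node_hash(sibling, current)
        else:
            current = node_hash(current, sibling)
    return current == root


values = ["name=Alice", "nationality=GB", "dob=1990-01-01", "passport=X123", "photo=..."]
levels = build_levels([leaf_hash(v) for v in values])
root = levels[-1][0]

proof = make_proof(levels, 2)
print(verify_proof("dob=1990-01-01", proof, root))  # True
print(verify_proof("dob=2010-01-01", proof, root))  # False
```

The proof for one leaf contains only hashes of the other entries, so the other values stay hidden from the verifier.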

Think of how software packages (or .iso files) are distributed. They don't use Merkle trees. They use normal hashes. As a user, you retrieve all the data/package, you hash it, and you check that the checksum is correct.
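For contrast, the whole-file check described here needs no tree at all. A minimal sketch, assuming SHA-256 and a checksum published as a hex string (the sample bytes and function names are illustrative):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def matches_checksum(data: bytes, published_hex: str) -> bool:
    # Hash everything that was downloaded and compare with the published value.
    return sha256_hex(data) == published_hex.strip().lower()


package = b"example package contents"
published = sha256_hex(package)  # stand-in for the checksum on the download page

print(matches_checksum(package, published))         # True
print(matches_checksum(package + b"!", published))  # False
```

The trade-off is that the verifier must hold every byte of the data, so nothing can be withheld or redacted.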

The first question you should ask yourself is "who needs to prove what? in what context? with what trust assumption?". Only once the issue you want to solve is clear can you hope to get the right technical solution for it.

lukepuplett commented 3 months ago

Thanks for your prompt response. StackOverflow is the wrong place for any kind of discussion, but perhaps I'll move this to the forum.

The odd thing about posting this kind of discussion is that everyone wants to bat it away somewhere else. Where does anyone go to establish a new data standard?

Regarding the use case, suppose I have someone's passport and I've personally verified that some wallet owner is the holder of the passport and I want to attest to this fact. I have a set of data such as name, nationality, passport number and a JPEG of their passport photo page.

I produce a Merkle tree of this data and I attest the root hash on-chain using the passport holder's wallet, then my app/service attests to that attestation to say that this data has been checked. My app/service is itself attested to by a government agency. We now have a strong chain of attestation of some important privately-held data.

The passport holder will need to download a single file containing the actual data and all the hashes, which roll up to a root hash that matches what's on-chain and is attested to by my KYC company.

They can now present this file which holds their digital passport to some other service who can use that data, decode and look at the photo, and also recompute the hashes and check it has been attested.

But I could also produce a redacted version (or the user could edit the file and delete the data field(s)) so only e.g. their date-of-birth is in the clear, and now they could use this evidence to get access to adult content without revealing who they are.
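A redacted copy of this kind could work roughly as follows: a removed field keeps only its leaf hash, so the root still recomputes to the same value. This is a sketch under assumptions (flat leaf list, SHA-256, per-leaf salts, hypothetical field names, fixed demo salts), not a worked-out format.

```python
import copy
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(leaf: dict) -> bytes:
    # A redacted leaf carries only its hash; a disclosed leaf is rehashed.
    if "value" not in leaf:
        return bytes.fromhex(leaf["hash"])
    payload = f"{leaf['name']}={leaf['value']}".encode("utf-8")
    return sha256(b"\x00" + bytes.fromhex(leaf["salt"]) + payload)


def merkle_root(document: dict) -> str:
    level = [leaf_hash(leaf) for leaf in document["leaves"]]
    while len(level) > 1:
        nxt = [sha256(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is carried up unchanged
        level = nxt
    return level[0].hex()


def redact(document: dict, keep: set) -> dict:
    # Replace every leaf not named in `keep` by its hash alone.
    out = copy.deepcopy(document)
    for leaf in out["leaves"]:
        if leaf["name"] not in keep:
            leaf["hash"] = leaf_hash(leaf).hex()
            leaf.pop("value", None)
            leaf.pop("salt", None)
    return out


# Fixed salts keep the demo reproducible; real salts would be random bytes.
full = {
    "hashAlgorithm": "sha256",
    "leaves": [
        {"name": "fullName", "value": "Alice Example", "salt": "00" * 16},
        {"name": "nationality", "value": "GB", "salt": "11" * 16},
        {"name": "passportNumber", "value": "X1234567", "salt": "22" * 16},
        {"name": "dateOfBirth", "value": "1990-01-01", "salt": "33" * 16},
    ],
}

redacted = redact(full, keep={"dateOfBirth"})
print(redacted["leaves"][0])
print(merkle_root(redacted) == merkle_root(full))  # True
```

The salts matter here: without them, a short value such as a nationality could be recovered from its hash by trying every candidate. This sketch also leaves the field names of redacted leaves visible, which a real format might prefer to hide.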

In this example, the JSON document is both the actual data and the proof of it having been checked. That's the goal, to create a file format for this purpose.

It could potentially be an important new data format.

Amxx commented 3 months ago

> Where does anyone go to establish a new data standard?

Not in the repo of a library that implements a specific data structure.

lukepuplett commented 3 months ago

Right. Thanks.

frangio commented 3 months ago

> Where does anyone go to establish a new data standard?

An ERC could be a good fit.

lukepuplett commented 3 months ago

Thanks frangio, I appreciate any tips.

The problem is that this sits between the on-chain and off-chain world, which is why it's important that it doesn't adhere to a particular ecosystem and doesn't introduce too many new concepts at once.

It's not specific to Ethereum per se, e.g. Solana could come up with its own attestation service and this JSON would need to communicate how to locate the attestation on Solana's chain somehow.

I wrote it up, if you're curious. In fact, I call out the problem with leaking the Solidity type system and ABI encoding.

https://gist.github.com/lukepuplett/a0793be0b7765539173df8cabc6b2c06