ava-labs / hypersdk

Opinionated Framework for Building Hyper-Scalable Blockchains on Avalanche

Canonical Protobuf Codec #1604

Open aaronbuchwald opened 1 day ago

aaronbuchwald commented 1 day ago

Implement a protobuf-compatible codec that writes canonical output (for hash compatibility) but remains read-compatible with existing protobuf implementations.

containerman17 commented 1 day ago

Initial situation

Initially, HyperSDK used custom marshalling everywhere, and VM developers had to hand-write marshalling code for their own types following the same conventions.

Standardizing with ABI

To support universal encoding for all actions, we implemented an abi package that describes struct fields using JSON. Developers no longer need to write custom encoders/decoders for each action. More importantly, we added support for third-party browser and mobile wallets, starting with MetaMask Snap.

The ABI itself is just a field description, but the actual encoding/decoding is done by avalanchego's codec/linearcodec.

We didn’t go with Google’s protobuf because its encoding is not deterministic: the same data can be serialized to different bytes, especially with maps. Another issue is that data structures have to be defined in protobuf and the Golang structs generated from them, which isn’t great for developer experience.
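To make the determinism problem concrete, here is a minimal Go sketch. Go randomizes map iteration order, so any encoder that simply ranges over a map can emit different bytes for the same data. Sorting the keys first fixes that. The function name `encodeSorted` and the byte layout (big-endian, uint32 entry count, uint16 key length) are assumptions for illustration only, not the exact format of linearcodec or protobuf.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeSorted serializes a map deterministically (hypothetical layout):
// a uint32 entry count, then for each key in sorted order a uint16 key
// length, the key bytes, and the value as a fixed-width uint64.
func encodeSorted(m map[string]uint64) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// Sorting removes the dependency on Go's randomized map iteration order.
	sort.Strings(keys)

	out := binary.BigEndian.AppendUint32(nil, uint32(len(keys)))
	for _, k := range keys {
		out = binary.BigEndian.AppendUint16(out, uint16(len(k)))
		out = append(out, k...)
		out = binary.BigEndian.AppendUint64(out, m[k])
	}
	return out
}

func main() {
	balances := map[string]uint64{"bob": 7, "alice": 3}
	// The output is identical on every run, so it is safe to hash.
	fmt.Printf("%x\n", encodeSorted(balances))
}
```

Because the bytes no longer depend on iteration order, two nodes hashing the same map always agree.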

The Holy Grail of Marshalers

During the ABI + linearcodec implementation, it became clear that our approach is... less than perfect. Every language needs its own tooling built and supported from scratch. Here are the requirements for the perfect codec, mostly outlined by @StephenButtolph:

Protobuf

At first glance, protobuf seems like the perfect base as it's the most widely supported codec. However, for encoding in TypeScript, Golang, and other languages, we'd need to rewrite our implementations to ensure canonical encoding, so that the same data in a struct/object/map always gets encoded the same way. At least the reading part (for explorers, I assume) remains unchanged. I’m starting to question the "existing tooling" argument here. Also, I've heard that protobuf is slow, but I haven't benchmarked it myself.
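As a rough idea of what "canonical but read-compatible" could look like, here is a hand-rolled Go sketch that emits standard protobuf wire format using only the standard library. The `Transfer` message, its field numbers, and the canonicalization rules chosen here (fields in ascending field-number order, zero values omitted, map entries sorted by key bytes) are all assumptions for illustration, not a decided design.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// Protobuf wire types used in this sketch.
const (
	wireVarint = 0
	wireBytes  = 2
)

// appendTag writes the field key: (field number << 3) | wire type.
func appendTag(b []byte, field, wireType uint64) []byte {
	return binary.AppendUvarint(b, field<<3|wireType)
}

// appendLenPrefixed writes a length-delimited field (strings, bytes, nested messages).
func appendLenPrefixed(b []byte, field uint64, payload []byte) []byte {
	b = appendTag(b, field, wireBytes)
	b = binary.AppendUvarint(b, uint64(len(payload)))
	return append(b, payload...)
}

// appendVarintField writes a varint field.
func appendVarintField(b []byte, field, v uint64) []byte {
	b = appendTag(b, field, wireVarint)
	return binary.AppendUvarint(b, v)
}

// Transfer mirrors a hypothetical schema:
//
//	message Transfer { string to = 1; uint64 value = 2; map<string, uint64> tags = 3; }
type Transfer struct {
	To    string
	Value uint64
	Tags  map[string]uint64
}

// MarshalCanonical always produces the same bytes for the same value.
func (t Transfer) MarshalCanonical() []byte {
	var b []byte
	// Rule (assumed): ascending field order, zero values omitted.
	if t.To != "" {
		b = appendLenPrefixed(b, 1, []byte(t.To))
	}
	if t.Value != 0 {
		b = appendVarintField(b, 2, t.Value)
	}
	// Rule (assumed): map entries sorted by key bytes.
	keys := make([]string, 0, len(t.Tags))
	for k := range t.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		// A map entry is a nested message with key = field 1, value = field 2.
		entry := appendLenPrefixed(nil, 1, []byte(k))
		entry = appendVarintField(entry, 2, t.Tags[k])
		b = appendLenPrefixed(b, 3, entry)
	}
	return b
}

func main() {
	t := Transfer{To: "bob", Value: 150, Tags: map[string]uint64{"b": 2, "a": 1}}
	fmt.Printf("%x\n", t.MarshalCanonical())
}
```

Any stock protobuf decoder should be able to read this output, since field order and map entry order are not significant to readers. The cost is the one noted above: every language that writes (rather than only reads) needs its own encoder like this.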

Borsh

Another great option is the Borsh codec. The main advantage is that it’s simpler and faster. The downside is that it doesn’t support variable-length integers and isn't very popular, so tooling is limited to Golang, Rust, and an almost complete TypeScript implementation.
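For comparison, here is a small Go sketch of Borsh-style encoding as I understand the spec: integers are fixed-width little-endian, strings are a uint32 length followed by UTF-8 bytes, and struct fields are written in declaration order. The `Transfer` struct and method name are hypothetical, and this is hand-written rather than taken from any existing Borsh library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Transfer is a hypothetical action used only for this example.
type Transfer struct {
	To    string
	Value uint64
}

// MarshalBorsh writes fields in declaration order, Borsh-style:
// string = uint32 little-endian length + bytes, u64 = 8 little-endian bytes.
func (t Transfer) MarshalBorsh() []byte {
	b := binary.LittleEndian.AppendUint32(nil, uint32(len(t.To)))
	b = append(b, t.To...)
	return binary.LittleEndian.AppendUint64(b, t.Value)
}

func main() {
	t := Transfer{To: "bob", Value: 1}
	encoded := t.MarshalBorsh()
	fmt.Printf("borsh: %x (%d bytes)\n", encoded, len(encoded))

	// The same value as a protobuf-style varint, to show the size trade-off.
	varint := binary.AppendUvarint(nil, t.Value)
	fmt.Printf("value as varint: %d byte(s), as borsh u64: 8 bytes\n", len(varint))
}
```

There is exactly one valid encoding per value, which gives determinism for free, but small integers always pay for the full fixed width.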

Keep improving Avalanchego’s Codec

This is also a valid option. The pro is that we wouldn't need to rely on a custom file format, and fewer changes would be required. The main con is the lack of existing tooling support.

Note: It only makes sense to implement this after the 100k TPS benchmark is up to date with the main branch since we’ll need to make major changes and want to ensure we’re still on track to meet the 100k TPS goal.