farcasterxyz / hub-monorepo

Implementation of the Farcaster Hub specification and supporting libraries for building applications on Farcaster
https://www.thehubble.xyz
MIT License
700 stars 392 forks source link

bug(hubble): hashes from ts-proto don't match those from other libraries causing merge failures #757

Closed varunsrin closed 11 months ago

varunsrin commented 1 year ago

(thanks to @CassOnMars for catching this issue)

It turns out that ts-proto serializes protobufs in a slightly different way than other libraries.

If a protobuf has an empty integer array, ts-proto handles serialization differently from other libraries like proto-es. This isn’t normally an issue since protobufs and grpc don’t require deterministic byte outputs to work correctly.

Farcaster has a problem here because it uses the serialized bytes to calculate a hash and expects them to be deterministic. Unfortunately, CastAdd messages have integer arrays that can be empty (mentions, mentionPositions ) and messages generated by ts-proto will have a different hash than messages generated by other libraries. Hubble uses ts-proto to check the hashes and so any message generated by other libs are rejected.

Users of other libraries could patch their serialization to work in the same way as ts-proto's serialization. But this isn't always trivial depending on the language and the library.

The long term fix is to havea canonical serialization process independent of protobufs that produces deterministic bytes for Farcaster objects. Messages are sent over the wire using the non-deterministic protobuf formats, are deserialized into objects, then serialized again using canonical serialization before being hashes. Libraries like hub-nodejs can handle this under the hood, hiding this complexity from most devs.

What’s less clear is what we should do today to solve this problem. There are two options:

Option 1: The Safe Path

Patch Hubble’s ts-proto to use the “correct” serialization step for integer arrays and re-sign all messages. This amounts to another hash migration since we have to update the hash of every cast that has no mentions, along with every cast or reaction that points to it. Later, we can migrate to canonicalized hashing.

Pros:

Cons:

Option 2: The Fast Path

Launch with ts-proto and make its hash format the “correct form”. No other libraries will be able to generate valid messages without extra code written to change the serialization. Later, we can migrate to canonicalized hashing.

Pros:

Cons:

varunsrin commented 1 year ago

After much consideration and discussion with developers we're leaning towards Option 2. The rough plan going forward is:

Developers can freely read and write messages but hashing and signing messages is only straightforward in Node.js. In any other language, you must write custom serialization logic to convert the message to bytes before hashing and signing it, which matches the ts-proto logic.

Farcaster will not add any new message types or fields at this point, since ts-proto may have other unexpected behaviors. The team will develop a new canonical serialization and hashing scheme once hubs are stable. It's hard to know how long this will be exactly, but a reasonable guess is 1 to 3 months.

Once hubs are stabilized:

sanjayprabhu commented 1 year ago

FIP on plan to address this https://github.com/farcasterxyz/protocol/discussions/87

sanjayprabhu commented 11 months ago

@adityapk00 has an elegant solution in https://github.com/farcasterxyz/hub-monorepo/pull/1508