BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0
5.02k stars 341 forks source link

Hash should serialize as byte string #412

Open casey opened 2 months ago

casey commented 2 months ago

Currently, Hash serializes as a sequence of integers, which is inefficient if the underlying codec supports a more compact byte string type. For example, serializing Hash to CBOR will produce an up-to 66 byte integer sequence (the precise size of the sequence will depend on the value of the hash) instead of a fixed size 34 byte byte string.

Serializing to and deserializing from a byte string can be accomplished with serde_bytes:

pub struct Hash(#[serde(with = "serde_bytes") [u8; OUT_LEN]);
oconnor663 commented 2 months ago

Agreed. What do you think of https://github.com/BLAKE3-team/BLAKE3/commit/dd0afd640ad97b5ebcf887107162009a23ffdca0? (https://github.com/BLAKE3-team/BLAKE3/compare/serde_bytes)

casey commented 2 months ago

Looks awesome! Nice test too, 0x58 (decimal 88) indicates a CBOR byte string with a byte length, and 32 is the length. And great that serde_bytes can still deserialize the old representation.

BurningEnlightenment commented 2 months ago

Two notes from someone who wrote a CBOR serialization library in the past:

oconnor663 commented 2 months ago

https://github.com/BLAKE3-team/BLAKE3/issues/414 pointed out that this change was backwards-incompatible for non-self-describing serialization formats. I've published v1.5.3 to revert it.

Does anyone have ideas for a change along these lines that wouldn't break formats like bincode?

burdges commented 1 month ago

It's a nice example of serde's shortcomings. Afaik, crypto crates typically either avoid serde entirely, thus serde to serde users only care about one or two formats, or else define their serialization as bytes, preferably fixed length, and provide optional serde wrappers which do bytes, as fixed length if possible.

You'll need some new type here, with impl Froms both ways, but the default matters, as the other becomes niche. I'd guess the default should favor CBOR, like you first did here, but maybe ask if current behavior actually preferable for any serde target format?