Domain separators for key/value hashes in LeafOp

hdevalence commented 2 years ago

Hi, we're working on adding ICS23 support for a variant of the Jellyfish Merkle Tree from Diem, modified slightly to be a generic, byte-oriented k/v store rather than a strongly typed object store (as in the original Diem code).

One issue we're running into is expressing the leaf node hashing using a LeafOp. Our leaf nodes are hashed as

SHA256( b"JMT::LeafNode" || SHA256(b"JMT::Key" || key) || SHA256(b"JMT::Value" || value) )

i.e., both key and value bytes are prehashed, with different domain separators.

We could almost express this as a LeafOp with hash = SHA256, prefix = b"JMT::LeafNode", prehash_key, prehash_value = SHA256, length = NO_PREFIX, but that will compute

SHA256( b"JMT::LeafNode" || SHA256(key) || SHA256(value) )

without the domain separators on the key/value pairs.

It seems like there are two possible resolutions:

Remove the domain separators from the key and value hashes (this is less preferred, since domain separators are a good practice);

Change the LeafOp proto to add new prefix fields:

message LeafOp {
HashOp hash = 1;
HashOp prehash_key = 2;
HashOp prehash_value = 3;
LengthOp length = 4;
bytes prefix = 5;
// NEW: fixed bytes that may optionally be included as a domain separator for prehash_key
bytes prehash_key_prefix = 6;
// NEW: fixed bytes that may optionally be included as a domain separator for prehash_value
bytes prehash_value_prefix = 7;
}

The computation would then become

hkey = prehashKey(prehash_key_prefix || key)
hvalue = prehashValue(prehash_value_prefix || value)
output = hash(prefix || length(hkey) || hkey || length(hvalue) || hvalue)

This change would be backwards-compatible, because protos allow missing fields, and if the prehash_*_prefix fields are missing, the behavior is exactly the same as the current implementation.

Does (2) seem sensible, or is there something we're missing?

ethanfrey commented 2 years ago

It seems like there are two possible resolutions:

Remove the domain separators from the key and value hashes (this is less preferred, since domain separators are a good practice);

One option is not to remove, them, but replace them with a length prefix. Varint, like protobuf does

Also, I don't see the use in prefixing something before hashing it. It doesn't separate anything. After hashing, they are all the same 32 byte length, so we know how to separate.

How does SHA256( b"JMT::LeafNode" || SHA256(b"JMT::Key" || key) || SHA256(b"JMT::Value" || value) ) Avoid some collision or confusion that exists in SHA256( b"JMT::LeafNode" || SHA256(key) || SHA256(value) )

Does (2) seem sensible, or is there something we're missing?

That you need every ibc enabled blockchain to update to some newer version with this support before you can use this proof format.

ethanfrey commented 2 years ago

As a side note, I have realised that while every Merkle tree or trie I have seen could create proofs in some ics23 format with the same security and no significant performance hit, everyone likes to make their custom Merkle hashing algorithm. I mean Parity uses a bitmap to avoid storing all 16 child hashes in memory before hashing them. 🤯

I am close to giving up on the possibility of finding a universal format, as it feels like herding cats. I would propose we allow some Turing complete VM (like Wasm?) where people can upload their own ics23 verifiers.

hdevalence commented 2 years ago

One option is not to remove, them, but replace them with a length prefix. Varint, like protobuf does

Also, I don't see the use in prefixing something before hashing it. It doesn't separate anything. After hashing, they are all the same 32 byte length, so we know how to separate.

How does SHA256( b"JMT::LeafNode" || SHA256(b"JMT::Key" || key) || SHA256(b"JMT::Value" || value) ) Avoid some collision or confusion that exists in SHA256( b"JMT::LeafNode" || SHA256(key) || SHA256(value) )

A length prefix doesn't help here, because the point is to do domain separation, to ensure that keys are hashed with a different hash function than values, so that it is impossible to confuse a key hash with a value hash (or any other hash of any other data from any other protocol).

Does (2) seem sensible, or is there something we're missing?

That you need every ibc enabled blockchain to update to some newer version with this support before you can use this proof format.

Yes, that's correct, but it's a drop-in upgrade, because it's backwards-compatible with all existing proof specifications.

I have realised that while every Merkle tree or trie I have seen could create proofs in some ics23 format with the same security and no significant performance hit, everyone likes to make their custom Merkle hashing algorithm.

To be clear, we have a working implementation of ICS23 proofs for our Merkle tree with these changes. We just need to be able to express domain separators for keys and values.

ethanfrey commented 2 years ago

Yes, that's correct, but it's a drop-in upgrade, because it's backwards-compatible with all existing proof specifications.

Yes, but it will take time to roll out. I just cut 0.7 with support for SMT, which should roll out with ibc-go v3

You can check with the ibc-go team as to estimates when they would roll out newer versions.

cosmos / ics23

Domain separators for key/value hashes in LeafOp #76