LFDT-Lockness / generic-ec

Generic elliptic curve cryptography in Rust
Apache License 2.0

generic_ec_zkp::hash_commitment::Builder::mix_many should accept an iterator, not slice #2

Closed. maurges closed this issue 1 year ago

maurges commented 1 year ago

Same for mix_many_bytes. I want to be able to write .mix_many_bytes(vec_of_bignum.iter().map(|x| x.to_bytes()))

maurges commented 1 year ago

And it should also accept AsRef<[u8]>
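For illustration, a minimal sketch of the requested shape of the API (the builder here is a stand-in, not the crate's actual type): an `IntoIterator` whose items are `AsRef<[u8]>` covers both the `map(|x| x.to_bytes())` case and plain byte slices.

```rust
/// Stand-in builder, for illustration only; the real type is
/// generic_ec_zkp::hash_commitment::Builder.
struct Builder {
    bytes: Vec<u8>,
}

impl Builder {
    /// Requested shape: accept any iterator of byte-slice-like items
    /// instead of a slice.
    fn mix_many_bytes<I>(mut self, items: I) -> Self
    where
        I: IntoIterator,
        I::Item: AsRef<[u8]>,
    {
        for item in items {
            self.bytes.extend_from_slice(item.as_ref());
        }
        self
    }
}
```

With such a bound, the call from the issue description, `.mix_many_bytes(vec_of_bignum.iter().map(|x| x.to_bytes()))`, type-checks as long as `to_bytes()` returns something like `Vec<u8>`.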

survived commented 1 year ago

It has to be a slice since we prepend the length of the slice before hashing it (an iterator doesn't have a known length, given that the TrustedLen trait is nightly-only). It could be an iterator if we chose to append the list length right after hashing its content, but I'm not sure whether that is exploitable for collision attacks.
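For context, a rough sketch of the constraint described here, with illustrative names and digest choice (the real builder's internals may differ): the element count is fed to the digest before the elements, so it has to be known before iteration begins.

```rust
use sha2::{Digest, Sha256};

/// Sketch of the slice-based layout described above: the number of
/// items is hashed first, so it must be known up front, which a
/// general iterator cannot guarantee on stable Rust (TrustedLen is
/// nightly-only).
fn mix_many_bytes(hasher: &mut Sha256, items: &[&[u8]]) {
    hasher.update((items.len() as u64).to_be_bytes());
    for item in items {
        hasher.update(item);
    }
}
```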

maurges commented 1 year ago

Maybe in the meantime we can add a third method, mix_from<Iter>(size: usize, iter: Iter)? I don't know which is better: for it to take exactly size items, or to fail if size doesn't match the actual number of items.
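For illustration, one way the proposed method could look under the "fail on mismatch" semantics (the stand-in builder and the error type are hypothetical):

```rust
/// Stand-in builder, for illustration only.
struct Builder {
    bytes: Vec<u8>,
}

#[derive(Debug)]
struct LengthMismatch;

impl Builder {
    /// Commit to the promised count up front, then fail if the
    /// iterator yields a different number of items.
    fn mix_from<I>(mut self, size: usize, iter: I) -> Result<Self, LengthMismatch>
    where
        I: IntoIterator,
        I::Item: AsRef<[u8]>,
    {
        self.bytes.extend_from_slice(&(size as u64).to_be_bytes());
        let mut seen = 0usize;
        for item in iter {
            self.bytes.extend_from_slice(item.as_ref());
            seen += 1;
        }
        if seen == size {
            Ok(self)
        } else {
            Err(LengthMismatch)
        }
    }
}
```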

maurges commented 1 year ago

@survived does this notification work?

survived commented 1 year ago

I don't like this :( Maybe we should rather compute the hash as H(H(arg_1) || ... || H(arg_n))? In that case, we don't need to hash the message size at all.

survived commented 1 year ago

Or maybe hashing the length after the argument will work after all? I can't think of any way to construct a collision.

survived commented 1 year ago

> does this notification work?

Yep

survived commented 1 year ago

Actually, H(H(arg_1) || ... || H(arg_n)) may be better. It's not intuitive that prepending the argument length protects from collisions (e.g. I had to explain during an audit why it does), while a hash of hashes is straightforward and no one will doubt that it works as intended.

maurges commented 1 year ago

In the case of mix_many, why is the length even necessary? It makes HashCommit::builder().mix(a).mix(b) != HashCommit::builder().mix_many(&[a, b]), which seems odd. We just want to encode the tuple of a and b.
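A toy model of the divergence described here, assuming (per the earlier comments) that mix_many additionally feeds the element count into the digest while chained mix calls do not; the real builder's byte layout may differ:

```rust
use sha2::{Digest, Sha256};

fn main() {
    let (a, b) = (b"first".as_slice(), b"second".as_slice());

    // Model of mix(a).mix(b): just the two byte strings.
    let mut chained = Sha256::new();
    chained.update(a);
    chained.update(b);

    // Model of mix_many(&[a, b]): the element count, then the byte strings.
    let mut many = Sha256::new();
    many.update(2u64.to_be_bytes());
    many.update(a);
    many.update(b);

    // The extra length prefix makes the two digests differ.
    assert_ne!(chained.finalize(), many.finalize());
}
```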

maurges commented 1 year ago

For collision protection I use a simple argument. What you're hashing is essentially a byte representation of your data; call that representation encode. Then for any data1, data2 we have H(encode data1) == H(encode data2) <==> encode data1 == encode data2, disregarding accidental collisions. So we need to construct encode such that encode data1 == encode data2 <==> data1 == data2, which just means that encode is reversible. To prove that encode is reversible, one can build a decode function.

Looking at how HashCommit works, it is reversible back into the bytes mixed, so it's ok. And it would be reversible with or without hashing the length in mix_many.

maurges commented 1 year ago

Oh wait, I've realized a flaw in my reasoning. If you don't know the data length in advance, you do need to hash it in, and it does need to be at the front or in some other well-known position, which the end of the data isn't. So disregard what I said about not needing the length in mix_many.

survived commented 1 year ago

Yes, what you're saying fits perfectly well with the approach of hashing data as H(H(arg_1) || ... || H(arg_n)). However, in the current setup we concatenate the data being hashed, H(arg_1 || ... || arg_n), and concatenation on its own is not reversible. By prepending the data length we make it reversible.
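A concrete instance of that irreversibility: ("ab", "c") and ("a", "bc") concatenate to the same bytes, so plain concatenation hashes them identically, while a per-argument length prefix keeps them apart (illustrative code, not the crate's actual encoding):

```rust
use sha2::{Digest, Sha256};

fn main() {
    // Plain concatenation is not reversible: two different argument
    // lists can produce the same byte stream, hence the same hash.
    let h1 = Sha256::digest([b"ab".as_slice(), b"c".as_slice()].concat());
    let h2 = Sha256::digest([b"a".as_slice(), b"bc".as_slice()].concat());
    assert_eq!(h1, h2);

    // Prefixing each argument with its length removes the ambiguity.
    fn with_len(parts: &[&[u8]]) -> Vec<u8> {
        let mut out = Vec::new();
        for p in parts {
            out.extend_from_slice(&(p.len() as u64).to_be_bytes());
            out.extend_from_slice(p);
        }
        out
    }
    let h3 = Sha256::digest(with_len(&[b"ab".as_slice(), b"c".as_slice()]));
    let h4 = Sha256::digest(with_len(&[b"a".as_slice(), b"bc".as_slice()]));
    assert_ne!(h3, h4);
}
```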

survived commented 1 year ago

I'm quite convinced we should use the hash-of-hashes approach at this point: it's easy to understand and it works with iterators. It's probably a bit less efficient, but that shouldn't be a problem.
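A minimal sketch of the hash-of-hashes idea over an iterator, with illustrative names and digest choice (not the crate's actual code):

```rust
use sha2::{Digest, Sha256};

/// H(H(arg_1) || ... || H(arg_n)): every argument is hashed on its
/// own, so no lengths need to be encoded and the input can be any
/// iterator rather than a slice.
fn hash_many<I>(args: I) -> [u8; 32]
where
    I: IntoIterator,
    I::Item: AsRef<[u8]>,
{
    let mut outer = Sha256::new();
    for arg in args {
        outer.update(Sha256::digest(arg));
    }
    outer.finalize().into()
}

fn main() {
    // Works directly with an iterator of owned byte buffers,
    // as requested in the issue description.
    let parts = vec![b"ab".to_vec(), b"c".to_vec()];
    let d1 = hash_many(parts.iter());

    // And the concatenation ambiguity is gone: ("ab", "c") != ("a", "bc").
    let d2 = hash_many([b"a".as_slice(), b"bc".as_slice()]);
    assert_ne!(d1, d2);
}
```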

survived commented 1 year ago

Fixed in https://github.com/dfns-labs/generic-ec/pull/1/commits/0eaee76851b51464818586abcd241242ac562742

survived commented 1 year ago

(btw, you can leave any suggestions in PR #1)