Closed maurges closed 1 year ago
And it should also accept AsRef<[u8]>
It has to be a slice because we prepend the length of the slice before hashing it (an iterator doesn't have a known length, given that the TrustedLen trait is nightly-only). It could be an iterator if we chose to append the list length right after hashing its contents, but I'm not sure whether that would open it up to collision attacks.
Maybe in the meantime we can add a third method, mix_from<Iter>(size: usize, iter: Iter)? I don't know what's better: for it to take exactly size items, or to fail if size doesn't match the real size.
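For illustration, here is a rough sketch of the "fail on mismatch" variant. Everything in it is an assumption made for the example, not the actual HashCommit API: `Transcript` is a hypothetical stand-in that only records the bytes that would be fed to the hash, and the `SizeMismatch` error type and the u64 big-endian framing are invented for this sketch.

```rust
/// Hypothetical stand-in for the builder: it only records the bytes that
/// would be fed to the hash function, so the framing is easy to inspect.
#[derive(Debug, Default, PartialEq)]
pub struct Transcript {
    bytes: Vec<u8>,
}

/// Hypothetical error for a declared size that does not match the iterator.
#[derive(Debug, PartialEq)]
pub enum SizeMismatch {
    TooFew { declared: usize, actual: usize },
    TooMany { declared: usize },
}

impl Transcript {
    /// Mixes a single byte string, prefixed by its length (u64, big-endian).
    pub fn mix_bytes(mut self, data: &[u8]) -> Self {
        self.bytes.extend_from_slice(&(data.len() as u64).to_be_bytes());
        self.bytes.extend_from_slice(data);
        self
    }

    /// Writes the declared item count first, then at most `size` items.
    /// Fails if the iterator yields fewer or more items than declared.
    pub fn mix_from<I>(mut self, size: usize, iter: I) -> Result<Self, SizeMismatch>
    where
        I: IntoIterator,
        I::Item: AsRef<[u8]>,
    {
        self.bytes.extend_from_slice(&(size as u64).to_be_bytes());
        let mut iter = iter.into_iter();
        let mut actual = 0usize;
        for item in iter.by_ref().take(size) {
            self = self.mix_bytes(item.as_ref());
            actual += 1;
        }
        if actual < size {
            return Err(SizeMismatch::TooFew { declared: size, actual });
        }
        if iter.next().is_some() {
            return Err(SizeMismatch::TooMany { declared: size });
        }
        Ok(self)
    }

    /// Returns the bytes recorded so far.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

The other option from the comment above, silently taking exactly size items, would amount to dropping the `TooMany` check. Failing seems like the safer default, since a wrong size most likely indicates a bug in the caller.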
@survived does this notification work?
I don't like this :( Maybe we should rather compute the hash as H(H(arg1) || ... || H(arg_n))? In this case, we don't need to hash the message size at all.
Or maybe hashing length after the argument will work after all? I can't think of any way to construct a collision
does this notification work?
Yep
Actually, H(H(arg1) || ... || H(arg_n)) may be better. It's not intuitive that prepending the argument length protects from collisions (e.g. I had to explain why it does during an audit), while a hash of hashes is straightforward and no one will doubt that it works as intended.
In the case of mix_many, why is the length even necessary? This makes it so HashCommit::builder().mix(a).mix(b) != HashCommit::builder().mix_many(&[a, b]), which, I don't know. We just want to encode the tuple of a and b.
For collision protection I use a simple proof. What you're hashing is essentially the byte representation of your data. Separate out the concept of representation as encode. Then for data1, data2: H(encode data1) == H(encode data2) <==> encode data1 == encode data2, disregarding accidental collisions. Then we need to construct encode such that encode data1 == encode data2 <==> data1 == data2, which just means that encode is reversible. To prove that encode is reversible, one can build a decode function.
Looking at how HashCommit works, it is reversible back into the bytes mixed, so it's ok. And it would be reversible with or without hashing the length in mix_many.
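The encode/decode argument above can be sketched as follows. This is only an illustration under assumed framing (item count, then each item prefixed by its length, all as u64 big-endian). The names `encode`, `decode` and `read_u64` are hypothetical, and the real HashCommit layout may differ.

```rust
/// Encodes a list of byte strings as: item count, then each item prefixed
/// by its length. All integers are u64 big-endian (an assumed framing).
pub fn encode(items: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(items.len() as u64).to_be_bytes());
    for item in items {
        out.extend_from_slice(&(item.len() as u64).to_be_bytes());
        out.extend_from_slice(item);
    }
    out
}

/// Reads one u64 at `*pos` and advances the cursor, or returns None if
/// fewer than 8 bytes remain.
fn read_u64(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let end = pos.checked_add(8)?;
    let mut chunk = [0u8; 8];
    chunk.copy_from_slice(bytes.get(*pos..end)?);
    *pos = end;
    Some(u64::from_be_bytes(chunk))
}

/// Inverse of `encode`. Returns None on malformed or trailing input.
/// The existence of this function is what shows `encode` is injective.
pub fn decode(bytes: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut pos = 0usize;
    let count = read_u64(bytes, &mut pos)?;
    let mut items = Vec::new();
    for _ in 0..count {
        let len = read_u64(bytes, &mut pos)?;
        // A length larger than the whole input can never be valid.
        if len > bytes.len() as u64 {
            return None;
        }
        let end = pos.checked_add(len as usize)?;
        items.push(bytes.get(pos..end)?.to_vec());
        pos = end;
    }
    if pos == bytes.len() {
        Some(items)
    } else {
        None
    }
}
```

Since `decode(&encode(&x))` gives back `x` for every input, two different lists can never share an encoding, so any collision in H(encode x) would have to be a collision in H itself.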
Oh wait, I realized a flaw in my reasoning. If you don't know the data length in advance, you do need to hash it in, and it does need to be in front, or in some other well-known position, which the back of the data isn't. So disregard what I said about not needing length in mix_many.
Yes, what you're saying fits perfectly well with the approach of hashing data as H(H(arg1) || ... || H(arg_n)). However, in the current setup we concatenate the data being hashed: H(arg_1 || ... || arg_n), and concatenation is not reversible. By prepending the data length we make concatenation reversible.
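A small example of why plain concatenation is ambiguous, and how a length prefix removes the ambiguity. The helper names `concat` and `concat_with_lengths` and the u64 big-endian prefix are assumptions made for this sketch.

```rust
/// Plain concatenation, i.e. the input to H(arg_1 || ... || arg_n).
pub fn concat(args: &[&[u8]]) -> Vec<u8> {
    args.concat()
}

/// Concatenation with each argument prefixed by its length (u64, big-endian).
pub fn concat_with_lengths(args: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for arg in args {
        out.extend_from_slice(&(arg.len() as u64).to_be_bytes());
        out.extend_from_slice(arg);
    }
    out
}
```

With these, `["ab", "c"]` and `["a", "bc"]` both concatenate to `"abc"`, so they would hash to the same value, while their length-prefixed forms differ already in the first prefix (2 versus 1).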
I'm quite convinced we should use the hash of hashes approach at this point: it's easy to understand, and it works with iterators. It's probably a bit less efficient, but that shouldn't be a problem.
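A minimal sketch of the hash of hashes approach over an arbitrary iterator. Note the assumptions: `h` uses the standard library's DefaultHasher purely as a placeholder for H so that the example needs no external crates. It is NOT collision resistant, and a real implementation would use a cryptographic hash. The name `hash_of_hashes` is hypothetical as well.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Placeholder for H. DefaultHasher is NOT a cryptographic hash; it is used
/// here only to keep the sketch free of external dependencies.
fn h(data: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish().to_be_bytes()
}

/// Computes H(H(arg_1) || ... || H(arg_n)) over any iterator of byte strings.
/// Every inner digest has the same fixed width, so the boundaries between
/// arguments are unambiguous and no length has to be known up front.
pub fn hash_of_hashes<I>(args: I) -> [u8; 8]
where
    I: IntoIterator,
    I::Item: AsRef<[u8]>,
{
    let mut digests = Vec::new();
    for arg in args {
        digests.extend_from_slice(&h(arg.as_ref()));
    }
    h(&digests)
}
```

Because the bound is only `I::Item: AsRef<[u8]>`, this accepts borrowed slices as well as owned buffers produced on the fly, for example by a `.map(|x| x.to_bytes())` adapter. A real implementation would likely feed each inner digest into the outer hasher incrementally instead of buffering them.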
(btw you can leave any suggestion in the PR #1)
Same for mix_many_bytes. I want to be able to write
.mix_many_bytes(vec_of_bignum.iter().map(|x| x.to_bytes()))