Open jannotti opened 4 months ago
Hi, thanks for raising the issue, so for mimc there were a lot of internal discussions on our end about whether implementing the hasher interface or have a special algebraic hash interface... Implementing the hasher interface where Write() would take any stream of bytes would be possible, but we would need a way to convert this stream of bytes in chunks < Fr without ambiguity, the only solution for that I think is to interpret the stream of bytes as a big integer, decompose it in basis Fr, and take the digits as blocks.
However this method is extremely inefficient... Since the hash interface is used everywhere in gnark (specially for plonk where 3 different hashes are used with our Commit() method) we chose to let mimc implement the hash interface. We assumed that the blocks are < r, otherwise we throw an error. It seems legit as algebraic hashes are used in a ZK context anyway...
We discarded the solution of reducing the blocks of 32 bytes modulo r as it would allow collisions.
We discarded also solutions of the style "the next block we take is the biggest one not exceeding r" as it would decrease the entropy of the hash function (we could imagine statistical analysis attack where knowing that the inputs are systematically below a certain threshold would be noticeable on the output...).
All in all we went for the simplest solution which is to assume that the entries are already a stream of fr elements, and if not let the user use fr.Hash on them to have fr elements. If on your end you have ideas to make it cleaner let us know, it's a recurrent subject (that we are still discussing) and I agree it is confusing
Thanks for the detailed explanation. I agree that there are no good solutions that would let a mimc hasher's Write()
operate on arbitrary byte strings, so I'd argue that a mimc hasher is not a hash.Hash
. So I would be inclined to say you have something else on your hands -- an ElementHasher
for example. I don't know enough about where they are used to follow this point:
Since the hash interface is used everywhere in gnark (specially for plonk where 3 different hashes are used with our Commit() method) we chose to let mimc implement the hash interface.
It must be the case that users of a MiMC hasher know what they've got. They must be sending bytes that are known to be vectors of encoded field elements, or else they'd be dealing with spurious errors at random times when they send an element that is greater than the modulus. So I would think that those consumers would be happy to have a different interface, like:
type ElementHasher interface {
// Add adds more elements to the running hash. The provided byte slice must...
Add([]byte)
Modulus() // maybe? if you want callers to know
WriteString() // if you really do want to export the interface to encode arbitrary data
// the rest of hash.Hasher, but _not_ io.Writer
Sum([]byte) []byte
Reset()
Size()
BlockSize() // document that this is both the element size and output size
}
I assume any existing callers of MiMC would be just as happy with such an interface, and you'd statically prevent misguided users who think they have a normal hash.Hash
on their hands.
Anyway, at this point I understand your design choices so you can feel free to close this issue if an interface like the above would be more trouble than you think it's worth. Thanks for your time.
Another approach is to have a generic FieldHasher[T ElementType]
interface as tracked in https://github.com/Consensys/gnark-crypto/issues/448, but we haven't figured out a nice way to implement. If we want to have a generic interface, then it would require generics which lead to somewhat noisy code (see for example PLONK verifier which has type parameters all around).
But this is also an interesting idea to have a separate interfaces where hash.Hash
works as expected (on arbitrary bytes) and ElementHasher
expects the bytes to be serialization of field elements.
One more approach would be to have an init-time option which fixes the mode of operation. We have started using option arguments quite a lot on gnark side and they actually work quite well to allow modifying the primitives.
an interesting idea to have a separate interfaces where hash.Hash works as expected
If you go this way, please be sure to break existing consumers of NewMiMC
at compile time. Don't leave it as the function call to get a hash.Hash
, with a newly defined Write()
that works differently from the existing one. Perhaps you could delete it and have NewMiMCByteHasher
which returns hash.Hash
and NewMiMCElementHasher
that returns the new interface.
I just worry that code that uses Write()
in its current form not suddenly get new behaviour (even in a theoretically breaking semver change, the silent change would be dangerous.)
an interesting idea to have a separate interfaces where hash.Hash works as expected
If you go this way, please be sure to break existing consumers of
NewMiMC
at compile time. Don't leave it as the function call to get ahash.Hash
, with a newly definedWrite()
that works differently from the existing one. Perhaps you could delete it and haveNewMiMCByteHasher
which returnshash.Hash
andNewMiMCElementHasher
that returns the new interface.I just worry that code that uses
Write()
in its current form not suddenly get new behaviour (even in a theoretically breaking semver change, the silent change would be dangerous.)
You are absolutely right - it would be better to break the interface to force the downstream users to decide the expected behaviour when we're changing the defaults.
The mimc packages (of at least bn254 and bls12-381) has the following description of
digest.Write()
https://github.com/Consensys/gnark-crypto/blob/master/ecc/bn254/fr/mimc/mimc.go#L97-L105
This
digest
is returned as ahash.Hash
byNewMiMC
. However, thisWrite()
method departs from the the following description in thehash.Hash
interface:It seems very surprising to return a
hash.Hash
that cannot hash arbitrary bytes, neither in size nor in content (the caller needs to know they are really sending encoded elements that never exceed the modulus).The
WriteString()
method is provided in an apparent effort to allow hashing of arbitrary bytes, but that method is unusable, since it is defined ondigest
, which cannot be obtained from external callers.It's unclear how best to fix these issues, as I'm unsure if there are standards to be adhered to. If there are, I would expect those standards to define the MiMC hash of any byte sequence, so they would need to address how those bytes should be turned into
fr.Element
. IfWriteString
is the proper way to do so, thenWrite
should be written to accumulate bytes untilSum()
is called, and then theWriteString()
technique can be used.Description
Expected Behavior
Actual Behavior
Possible Fix
Steps to Reproduce
1. 2. 3. 4.
Context
Your Environment