bug: MiMC Write() violates hash.Hash expectations.

Consensys / gnark-crypto

gnark-crypto provides elliptic curve and pairing-based cryptography on BN, BLS12, BLS24 and BW6 curves. It also provides various algorithms (algebra, crypto) of particular interest to zero knowledge proof systems.

Apache License 2.0

495 stars 160 forks source link

bug: MiMC Write() violates hash.Hash expectations. #504

Open jannotti opened 4 months ago

jannotti commented 4 months ago

The mimc packages (of at least bn254 and bls12-381) has the following description of digest.Write()

https://github.com/Consensys/gnark-crypto/blob/master/ecc/bn254/fr/mimc/mimc.go#L97-L105

This digest is returned as a hash.Hash by NewMiMC. However, this Write() method departs from the the following description in the hash.Hash interface:

    // Write (via the embedded io.Writer interface) adds more data to the running hash.
    // It never returns an error.

It seems very surprising to return a hash.Hash that cannot hash arbitrary bytes, neither in size nor in content (the caller needs to know they are really sending encoded elements that never exceed the modulus).

The WriteString() method is provided in an apparent effort to allow hashing of arbitrary bytes, but that method is unusable, since it is defined on digest, which cannot be obtained from external callers.

It's unclear how best to fix these issues, as I'm unsure if there are standards to be adhered to. If there are, I would expect those standards to define the MiMC hash of any byte sequence, so they would need to address how those bytes should be turned into fr.Element. If WriteString is the proper way to do so, then Write should be written to accumulate bytes until Sum() is called, and then the WriteString() technique can be used.

Description

Expected Behavior

Actual Behavior

Possible Fix

Steps to Reproduce

1. 2. 3. 4.

Context

Your Environment

gnark-crypto version used:
go version (e.g. 1.20.6):
Operating System and version:

ThomasPiellard commented 4 months ago

Hi, thanks for raising the issue, so for mimc there were a lot of internal discussions on our end about whether implementing the hasher interface or have a special algebraic hash interface... Implementing the hasher interface where Write() would take any stream of bytes would be possible, but we would need a way to convert this stream of bytes in chunks < Fr without ambiguity, the only solution for that I think is to interpret the stream of bytes as a big integer, decompose it in basis Fr, and take the digits as blocks.

However this method is extremely inefficient... Since the hash interface is used everywhere in gnark (specially for plonk where 3 different hashes are used with our Commit() method) we chose to let mimc implement the hash interface. We assumed that the blocks are < r, otherwise we throw an error. It seems legit as algebraic hashes are used in a ZK context anyway...

We discarded the solution of reducing the blocks of 32 bytes modulo r as it would allow collisions.

We discarded also solutions of the style "the next block we take is the biggest one not exceeding r" as it would decrease the entropy of the hash function (we could imagine statistical analysis attack where knowing that the inputs are systematically below a certain threshold would be noticeable on the output...).

All in all we went for the simplest solution which is to assume that the entries are already a stream of fr elements, and if not let the user use fr.Hash on them to have fr elements. If on your end you have ideas to make it cleaner let us know, it's a recurrent subject (that we are still discussing) and I agree it is confusing

jannotti commented 4 months ago

Thanks for the detailed explanation. I agree that there are no good solutions that would let a mimc hasher's Write() operate on arbitrary byte strings, so I'd argue that a mimc hasher is not a hash.Hash. So I would be inclined to say you have something else on your hands -- an ElementHasher for example. I don't know enough about where they are used to follow this point:

Since the hash interface is used everywhere in gnark (specially for plonk where 3 different hashes are used with our Commit() method) we chose to let mimc implement the hash interface.

It must be the case that users of a MiMC hasher know what they've got. They must be sending bytes that are known to be vectors of encoded field elements, or else they'd be dealing with spurious errors at random times when they send an element that is greater than the modulus. So I would think that those consumers would be happy to have a different interface, like:

type ElementHasher interface {
   // Add adds more elements to the running hash. The provided byte slice must...
   Add([]byte)
   Modulus() // maybe? if you want callers to know 
   WriteString() // if you really do want to export the interface to encode arbitrary data

   // the rest of hash.Hasher, but _not_ io.Writer
   Sum([]byte) []byte
   Reset()
   Size()
   BlockSize() // document that this is both the element size and output size
}

I assume any existing callers of MiMC would be just as happy with such an interface, and you'd statically prevent misguided users who think they have a normal hash.Hash on their hands.

Anyway, at this point I understand your design choices so you can feel free to close this issue if an interface like the above would be more trouble than you think it's worth. Thanks for your time.

ivokub commented 4 months ago

Another approach is to have a generic FieldHasher[T ElementType] interface as tracked in https://github.com/Consensys/gnark-crypto/issues/448, but we haven't figured out a nice way to implement. If we want to have a generic interface, then it would require generics which lead to somewhat noisy code (see for example PLONK verifier which has type parameters all around).

But this is also an interesting idea to have a separate interfaces where hash.Hash works as expected (on arbitrary bytes) and ElementHasher expects the bytes to be serialization of field elements. One more approach would be to have an init-time option which fixes the mode of operation. We have started using option arguments quite a lot on gnark side and they actually work quite well to allow modifying the primitives.

jannotti commented 4 months ago

an interesting idea to have a separate interfaces where hash.Hash works as expected

If you go this way, please be sure to break existing consumers of NewMiMC at compile time. Don't leave it as the function call to get a hash.Hash, with a newly defined Write() that works differently from the existing one. Perhaps you could delete it and have NewMiMCByteHasher which returns hash.Hash and NewMiMCElementHasher that returns the new interface.

I just worry that code that uses Write() in its current form not suddenly get new behaviour (even in a theoretically breaking semver change, the silent change would be dangerous.)

ivokub commented 4 months ago

an interesting idea to have a separate interfaces where hash.Hash works as expected

If you go this way, please be sure to break existing consumers of NewMiMC at compile time. Don't leave it as the function call to get a hash.Hash, with a newly defined Write() that works differently from the existing one. Perhaps you could delete it and have NewMiMCByteHasher which returns hash.Hash and NewMiMCElementHasher that returns the new interface.

I just worry that code that uses Write() in its current form not suddenly get new behaviour (even in a theoretically breaking semver change, the silent change would be dangerous.)

You are absolutely right - it would be better to break the interface to force the downstream users to decide the expected behaviour when we're changing the defaults.