Consensys / gnark-crypto

gnark-crypto provides elliptic curve and pairing-based cryptography on BN, BLS12, BLS24 and BW6 curves. It also provides various algorithms (algebra, crypto) of particular interest to zero knowledge proof systems.

Field-agnostic Fiat-Shamir Challenge Names #304

Closed · Tabaie closed this issue 1 year ago

Tabaie commented 1 year ago
  1. Currently, in order to align hash inputs correctly, the user of a Fiat-Shamir transcript has to pad the challenge names so that they are as long as a hashing block. It is desirable for that to be done automatically.

  2. It is better to pre-pad rather than post-pad with zeros, to avoid the resulting block encoding a value larger than the field modulus. This will make gnark/std/fiat-shamir.TestFiatShamir pass (it currently fails). See the sketch after this list.
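A minimal sketch of the pre-padding idea, for illustration only (the function name, signature, and error handling are assumptions, not the actual gnark-crypto API):

import "errors"

// leftPadChallengeName pads a challenge name with leading zero bytes so that
// it fills exactly one hash block. Pre-padding (rather than appending zeros)
// keeps the big-endian value of the block no larger than that of the name
// itself, so it stays below the field modulus whenever the name does.
func leftPadChallengeName(name []byte, blockSize int) ([]byte, error) {
    if len(name) > blockSize {
        return nil, errors.New("challenge name longer than one hash block")
    }
    padded := make([]byte, blockSize)
    copy(padded[blockSize-len(name):], name)
    return padded, nil
}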

#304 addresses these issues but introduces another: gnark/std/commitments/fri.TestFriVerification fails because a challenge name is too large. To solve this we must somehow pass a decompose function to the transcript object, such that for arithmetic hashes it operates as introduced in https://github.com/ConsenSys/gnark-crypto/pull/265, and for traditional hashes it simply breaks the input up into blocks. How do we do this cleanly, so that Transcript remains hash-agnostic and NewTranscript doesn't get more complicated? I suggest that instead of using the standard Go Hash interface we use a wrapper that also has a Decompose function, sketched below.
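A rough sketch of what such a wrapper could look like; the interface name and the Decompose signature are hypothetical, just for discussion:

import "hash"

// hashWithDecompose wraps the standard hash.Hash with a decomposition step,
// so the transcript itself can stay hash-agnostic.
type hashWithDecompose interface {
    hash.Hash
    // Decompose splits raw bytes into hash inputs: field-sized limbs for an
    // arithmetic hash (as in #265), plain blocks for a traditional hash.
    Decompose(data []byte) [][]byte
}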

There are other failing gnark tests (with gnark-crypto@develop) that this issue doesn't address:

Tabaie commented 1 year ago

@gbotrel @ThomasPiellard Please lmk if the existing code and the proposal at the end make sense.

gbotrel commented 1 year ago

So, to sum up the state of things after your PR: we would have an arithmetic hash like MiMC that is supposed to take field elements as input, and that would implement the following two interfaces:

type ArithmeticHash interface {
    WriteString(rawBytes []byte) error
}

and, from the Go standard library:

type Hash interface {
    // Write (via the embedded io.Writer interface) adds more data to the running hash.
    // It never returns an error.
    io.Writer
    // ...
}

I think #265 introduced a bad decision: making Write return an error if the []byte input is not mod-reduced.

Since it's cumbersome to add fr.Element to signatures here, what if we only keep Write([]byte) and have a fast path (if the input is mod-reduced, no decomposition is involved)?
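For discussion, a rough sketch of that fast path, assuming the hash buffers fr.Element values internally (the type and field names are made up, and the slow path is a simplified stand-in for the #265 decomposition):

import (
    "math/big"

    "github.com/consensys/gnark-crypto/ecc/bn254/fr" // any curve's fr package
)

// arithHashSketch stands in for an arithmetic hash that buffers field
// elements; only the Write fast/slow path is sketched here.
type arithHashSketch struct {
    buf []fr.Element
}

func (h *arithHashSketch) Write(p []byte) (int, error) {
    // fast path: the input is exactly one mod-reduced field element
    if len(p) == fr.Bytes && new(big.Int).SetBytes(p).Cmp(fr.Modulus()) < 0 {
        var e fr.Element
        e.SetBytes(p)
        h.buf = append(h.buf, e)
        return len(p), nil
    }
    // slow path: split arbitrary bytes into chunks small enough to always
    // fit below the modulus (simplified stand-in for the #265 decomposition)
    for start := 0; start < len(p); start += fr.Bytes - 1 {
        end := start + fr.Bytes - 1
        if end > len(p) {
            end = len(p)
        }
        var e fr.Element
        e.SetBytes(p[start:end])
        h.buf = append(h.buf, e)
    }
    return len(p), nil
}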

The piece we need to document well is: as a user, if I do:

h.Write([]byte{0xAA})
h.Write([]byte{0xBB})
h.Write([]byte{0xCC})

The caveat then, with what I proposed, is that this is interpreted as 3 field elements (assuming q > 0xCC). So if the user wanted to hash a single 0xAABBCC value, it's their responsibility to buffer it on the calling side...
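For example, to get the single-value interpretation the caller would concatenate before writing (illustrative only):

// one buffered write on the calling side: interpreted as the single
// field element 0xAABBCC rather than three separate elements
h.Write([]byte{0xAA, 0xBB, 0xCC})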

gbotrel commented 1 year ago

Another option is, for packages like fiatshamir that currently take a hash.Hash parameter as input (to handle MiMC and SHA-256, for example), to replace it with a gnark.Hash interface, and to write a wrapper for the hash functions we use internally (e.g. a struct containing a sha256 hasher that implements a more descriptive API, one that clearly differentiates writing a field element from appending raw data to a buffer).
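A rough sketch of such a wrapper; the names and method set are hypothetical, just to illustrate the split between the two write operations:

import (
    "crypto/sha256"
    "hash"

    "github.com/consensys/gnark-crypto/ecc/bn254/fr" // any curve's fr package
)

// sha256Wrapper hides the raw hash.Hash behind a more descriptive API that
// separates "write a field element" from "append raw bytes".
type sha256Wrapper struct {
    h hash.Hash
}

func newSha256Wrapper() *sha256Wrapper {
    return &sha256Wrapper{h: sha256.New()}
}

// WriteElement appends the canonical big-endian encoding of one field element.
func (w *sha256Wrapper) WriteElement(e fr.Element) {
    b := e.Bytes()
    w.h.Write(b[:])
}

// WriteBytes appends arbitrary data.
func (w *sha256Wrapper) WriteBytes(data []byte) {
    w.h.Write(data)
}

// Sum returns the digest of everything written so far.
func (w *sha256Wrapper) Sum() []byte {
    return w.h.Sum(nil)
}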

gbotrel commented 1 year ago

cc @Tabaie @ThomasPiellard

Tabaie commented 1 year ago

I still think having two different Write functions, one for field elements and one for general strings, is the way to go. WriteField would attempt to read its input as ONE field element and error if that weren't possible, much like what we have now, except that it would happen at writing time rather than at "summing" time. The ordinary Write function would perform a SHA256-based hash-to-field and thereby also turn its input into a single field element. Sum would just perform MiMC on the field elements obtained from the two write functions (a rough sketch is below). Part of our problem right now, imo, is caused by the fact that Write simply appends its input to a byte buffer, so much information about the input is lost. Not only is this inconvenient, it may even enable an attacker to find collisions, since h.Write("hi ").Write("gnark").Sum() and h.Write("hi g").Write("nark").Sum() yield the same result, I think.
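A rough sketch of the proposed interface (hypothetical names, not a final API):

// fieldHasher sketches the two-write-function idea: both write methods
// contribute exactly one field element each, and Sum runs the arithmetic
// hash (e.g. MiMC) over the collected elements.
type fieldHasher interface {
    // WriteField errors at write time if p does not encode exactly one
    // canonical field element.
    WriteField(p []byte) error
    // Write maps arbitrary bytes to a single field element (e.g. via a
    // SHA256-based hash-to-field) before buffering it.
    Write(p []byte) (n int, err error)
    // Sum hashes the buffered field elements and returns the digest.
    Sum(b []byte) []byte
}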

There are two difficulties with this:

  1. We would have to wrap ordinary hashes so that Write and WriteField both simply alias the underlying Write function. If we want the hash to yield field elements or curve points, that would require some refactoring of hash-to-field and hash-to-curve to allow for a Write/Sum interface.

  2. I think parts of our codebase assume that the hash of multiple objects is the same as the hash of a single string consisting of their concatenation, as mentioned above. It would take some (probably minor) effort to remove that assumption.

Sorry if I'm repeating things @gbotrel or I have already said. This is an attempt to sum (no pun intended) things up.