HarryR / ethsnarks

A toolkit for viable zk-SNARKS on Ethereum, Web, Mobile and Desktop
GNU Lesser General Public License v3.0
240 stars 57 forks source link

Increase number of rounds for MiMC (or LongsightL) #87

Closed HarryR closed 5 years ago

HarryR commented 5 years ago

So, I should've originally listened to Daira, as she's a beacon of wisdom and insight, and further research has shown that MiMC-p/p with 12 rounds is very insecure.

This ticket aims to address the shortcomings of the current implementation and making it 'secure' with some small tweaks.

Define LongsightMP-R/p/e as Miyaguchi–Preneel applied to MiMC-R/p/e. We need R = log7(p) ~= 91 rounds for the non-Feistel variant of MiMC-R/p/7. Raising to the power 7 requires 4 constraints. So, for the BLS12-381 p,

  • LongsightMP-91/p/7 requires 728 constraints.
  • LongsightL-91/p/7 requires 1092 constraints. (For reference, the insecure LongsightF-322/p/3 required 644 constraints.)

Exponent of 5 is a bijection over F_p for the altbn scalar field used by Ethereum, need to verify if exponent of 7 is a bijection also.

With exponent=7, each round requires 4 constraints for exponentiation:

def round_e7(x, p):
    a = (x*x) % p # x^2
    b = (a*a) % p # x^4
    c = (a*b) % p # x^6
    return (c*x) % p# x^7

Where 91 rounds requires 364 constraints when using a round-constants.

R = 91 == ceil(log(p) / log(7))
R = 161 == ceil(log(p) / log(3))

The round constants for the keys can be generated as a hash stream using keccak256.

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

Any personalisation must change the initial seed, to make it derive a unique sequence of round constants. The constants are used to personalise the use of the key.

Note that for the first round the constant is zero, and for the last round the key is appended to the result, this matches the following implementations and is required to ensure the result is bijective:

Pseudocode:

def mimc(k, x, seed, p, e, R=91):
    for c_i in [0] + round_constants(seed, p, R - 2):
        a = (x + k + c_i) % p
        x = (a ** e) % p
   return (x + k) % p

The Miyaguchi–Preneel one-way construct can be implemented as a linear combination between rounds, returning a linear combination with the previous result, requiring no additional constraints.

def mimc_pe7_mp(k, x, seed, p, e, R=91):
    for x_i in x:
        k = (k + x_i + mimc(k, x_i, seed, p, e, R)) % p
        yield k
HarryR commented 5 years ago

I have tried to improve the performance of MiMC-p/p on EVM, and have provided reference implementations in Python and Solidity which I will then verify against the other implementations and the paper.

https://gist.github.com/HarryR/80b5ff2ce13da12edafda6d21c780730

In this implementation the H function is keccak256 and the key is used to derive the round constants sequence rather than using a separate seed specifically for the constants.

I recommend using exponent 7 with 46 rounds, for an acceptable security/performance trade-off.

Gas usage:

exponent rounds gas avg per round poly degree log2(d)
7 100 20580 206 7**99 278
7 91 18735 206 7**90 252
7 46 9510 207 7**45 126
7 4 900 225 7**3 8
7 3 695 232 7**2 5
5 110 21126 192 5**109 253
5 100 19216 193 5**99 230
5 55 10621 193 5**54 125
5 4 880 220 5**3 6
5 3 689 229 5**2 4

Exponent of 7 is slightly more expensive, but requires fewer rounds.

jbaylina commented 5 years ago

For reference:

Here is a 100% EVM assembly version of MiMC Smart contract with exponent 7 that takes less than 100gas/round: https://github.com/iden3/mimc

HarryR commented 5 years ago

@jbaylina suggested that the hash for the base points can be computed separately in order to allow parallel computation, this is relevant to Pedersen Hashes too as we want to re-use as much code as possible for 'constant generation'. He had reservations about using a hash-chain to generate a sequence of constants, in that it wouldn't allow for parallel computation.

In the example code above, round constants are generated using a hash-chain:

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

This creates a sequence where 0=H(seed), 1=H(H(seed)) etc.

@jbaylina was suggesting a scheme (and implemented it, for Pedersen hashes) where the seeds for the Y coordinates are generated by making a string of the following components then hashing them (using blake2):

See: https://github.com/iden3/circomlib/blob/98a33d5700bee1fe026badd2baf67f7a58429c0e/src/pedersenHash.js#L62

    while (p==null) {
        const S = GENPOINT_PREFIX + "_" + padLeftZeros(pointIdx, 32) + "_" + padLeftZeros(tryIdx, 32);
        const h = createBlakeHash("blake256").update(S).digest();
        h[31] = h[31] & 0xBF;  // Set 255th bit to 0 (256th is the signal and 254th is the last possible bit to 1)
        p = babyJub.unpackPoint(h);
        tryIdx++;
    }

In this implementation a new string is created for every try for every sequence and try.

Whereas ethsnarks uses https://github.com/HarryR/ethsnarks/blob/master/ethsnarks/jubjub.py#L180:

# in base point generation:
data = b"%-28s%04X" % (name, i)
return Point.from_hash(data)

def from_hash(cls, entropy):
    entropy = sha256(entropy).digest()
    entropy_as_int = int.from_bytes(entropy, 'big')
    y = FQ(entropy_as_int)
    while True:
        try:
            p = cls.from_y(y)
        except SquareRootError:
            y += 1
            continue
                        # ...
        return p

This creates a hash for each base point sequence, then increments its big-endian interpretation until an X coordinate can be recovered from the Y. I think my implementation of from_hash is more likely to recover an X coordinate with less average tries than if a new hash is generated at each try - but only marginally. The two implementations are similar, apart from the 'increment Y until matching X is found' part - which for Ethsnarks is designed to be Ethereum compatible (even though modulo square root probably isn't affordable on EVM).

The part which is necessary for Ethereum, IMO, is re-using the same components for MiMC round-constant generation and Pedersen hash base point generation. It's very cheap to implement the round_constants mechanism described above, generating a hash chain using keccak256 from an initial seed is faster and cheaper than any alternative.

Regarding parallelism, I think it's fair to create a sequence of hashes which are used as starting points, while this isn't parallel on its own - it is fast to generate the sequence which can then be made parallel for every individual implementation.

IMO the way forward is to use the same round-constant generation mechanism for MiMC and Pedersen hash, for pedersen hash each constant in the sequence is the starting point for recovering the X from the Y, and for MiMC it's used verbatim.

As far as 'nothing up my sleeve' numbers go, for generating the base points the mechanism implemented by ethsnarks (where the Y coordinate is incremented until a resulting point is found) matches the literature and existing implementations, and the hash sequence relies more heavily on the random-oracle-model of non-determinism.

To use the same sequence generator across EVM and Pedersen hash, the constraints are:

For pedersen hash base point generation:

HarryR commented 5 years ago

TODO: