Increase number of rounds for MiMC (or LongsightL)

HarryR commented 5 years ago

So, I should've originally listened to Daira, as she's a beacon of wisdom and insight, and further research has shown that MiMC-p/p with 12 rounds is very insecure.

This ticket aims to address the shortcomings of the current implementation and making it 'secure' with some small tweaks.

Define LongsightMP-R/p/e as Miyaguchi–Preneel applied to MiMC-R/p/e. We need R = log7(p) ~= 91 rounds for the non-Feistel variant of MiMC-R/p/7. Raising to the power 7 requires 4 constraints. So, for the BLS12-381 p,

LongsightMP-91/p/7 requires 728 constraints.

LongsightL-91/p/7 requires 1092 constraints. (For reference, the insecure LongsightF-322/p/3 required 644 constraints.)

Exponent of 5 is a bijection over F_p for the altbn scalar field used by Ethereum, need to verify if exponent of 7 is a bijection also.

With exponent=7, each round requires 4 constraints for exponentiation:

def round_e7(x, p):
    a = (x*x) % p # x^2
    b = (a*a) % p # x^4
    c = (a*b) % p # x^6
    return (c*x) % p# x^7

Where 91 rounds requires 364 constraints when using a round-constants.

R = 91 == ceil(log(p) / log(7))
R = 161 == ceil(log(p) / log(3))

The round constants for the keys can be generated as a hash stream using keccak256.

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

Any personalisation must change the initial seed, to make it derive a unique sequence of round constants. The constants are used to personalise the use of the key.

Note that for the first round the constant is zero, and for the last round the key is appended to the result, this matches the following implementations and is required to ensure the result is bijective:

Pseudocode:

def mimc(k, x, seed, p, e, R=91):
    for c_i in [0] + round_constants(seed, p, R - 2):
        a = (x + k + c_i) % p
        x = (a ** e) % p
   return (x + k) % p

The Miyaguchi–Preneel one-way construct can be implemented as a linear combination between rounds, returning a linear combination with the previous result, requiring no additional constraints.

def mimc_pe7_mp(k, x, seed, p, e, R=91):
    for x_i in x:
        k = (k + x_i + mimc(k, x_i, seed, p, e, R)) % p
        yield k

[ ] Rename to MiMC_MP? when used with Miyaguchi–Preneel compression function
[x] Verify exponent of 7 is a bijection over F_p (for the altbn scalar field)
[ ] Use exponent of 7 for Longsight MP, with 91 or 46 rounds
[ ] Make computation of round constants automatic rather than pre-computed (applies to both C++ and Solidity versions)
[x] Gas benchmark for Solidity implementation.

HarryR commented 5 years ago

I have tried to improve the performance of MiMC-p/p on EVM, and have provided reference implementations in Python and Solidity which I will then verify against the other implementations and the paper.

https://gist.github.com/HarryR/80b5ff2ce13da12edafda6d21c780730

In this implementation the H function is keccak256 and the key is used to derive the round constants sequence rather than using a separate seed specifically for the constants.

I recommend using exponent 7 with 46 rounds, for an acceptable security/performance trade-off.

Gas usage:

exponent	rounds	gas	avg per round	poly degree	log2(d)
7	100	20580	206	7**99	278
7	91	18735	206	7**90	252
7	46	9510	207	7**45	126
7	4	900	225	7**3	8
7	3	695	232	7**2	5
5	110	21126	192	5**109	253
5	100	19216	193	5**99	230
5	55	10621	193	5**54	125
5	4	880	220	5**3	6
5	3	689	229	5**2	4

Exponent of 7 is slightly more expensive, but requires fewer rounds.

jbaylina commented 5 years ago

For reference:

Here is a 100% EVM assembly version of MiMC Smart contract with exponent 7 that takes less than 100gas/round: https://github.com/iden3/mimc

HarryR commented 5 years ago

@jbaylina suggested that the hash for the base points can be computed separately in order to allow parallel computation, this is relevant to Pedersen Hashes too as we want to re-use as much code as possible for 'constant generation'. He had reservations about using a hash-chain to generate a sequence of constants, in that it wouldn't allow for parallel computation.

In the example code above, round constants are generated using a hash-chain:

def round_constants(seed, p, R):
    for _ in range(R):
        seed = H(seed)
        yield seed

This creates a sequence where 0=H(seed), 1=H(H(seed)) etc.

@jbaylina was suggesting a scheme (and implemented it, for Pedersen hashes) where the seeds for the Y coordinates are generated by making a string of the following components then hashing them (using blake2):

Name / Personalisation string
Sequence offset
Try offset (e.g. when Y coordinate doesn't result in a square X)

See: https://github.com/iden3/circomlib/blob/98a33d5700bee1fe026badd2baf67f7a58429c0e/src/pedersenHash.js#L62

    while (p==null) {
        const S = GENPOINT_PREFIX + "_" + padLeftZeros(pointIdx, 32) + "_" + padLeftZeros(tryIdx, 32);
        const h = createBlakeHash("blake256").update(S).digest();
        h[31] = h[31] & 0xBF;  // Set 255th bit to 0 (256th is the signal and 254th is the last possible bit to 1)
        p = babyJub.unpackPoint(h);
        tryIdx++;
    }

In this implementation a new string is created for every try for every sequence and try.

Whereas ethsnarks uses https://github.com/HarryR/ethsnarks/blob/master/ethsnarks/jubjub.py#L180:

# in base point generation:
data = b"%-28s%04X" % (name, i)
return Point.from_hash(data)

def from_hash(cls, entropy):
    entropy = sha256(entropy).digest()
    entropy_as_int = int.from_bytes(entropy, 'big')
    y = FQ(entropy_as_int)
    while True:
        try:
            p = cls.from_y(y)
        except SquareRootError:
            y += 1
            continue
                        # ...
        return p

This creates a hash for each base point sequence, then increments its big-endian interpretation until an X coordinate can be recovered from the Y. I think my implementation of from_hash is more likely to recover an X coordinate with less average tries than if a new hash is generated at each try - but only marginally. The two implementations are similar, apart from the 'increment Y until matching X is found' part - which for Ethsnarks is designed to be Ethereum compatible (even though modulo square root probably isn't affordable on EVM).

The part which is necessary for Ethereum, IMO, is re-using the same components for MiMC round-constant generation and Pedersen hash base point generation. It's very cheap to implement the round_constants mechanism described above, generating a hash chain using keccak256 from an initial seed is faster and cheaper than any alternative.

Regarding parallelism, I think it's fair to create a sequence of hashes which are used as starting points, while this isn't parallel on its own - it is fast to generate the sequence which can then be made parallel for every individual implementation.

IMO the way forward is to use the same round-constant generation mechanism for MiMC and Pedersen hash, for pedersen hash each constant in the sequence is the starting point for recovering the X from the Y, and for MiMC it's used verbatim.

As far as 'nothing up my sleeve' numbers go, for generating the base points the mechanism implemented by ethsnarks (where the Y coordinate is incremented until a resulting point is found) matches the literature and existing implementations, and the hash sequence relies more heavily on the random-oracle-model of non-determinism.

To use the same sequence generator across EVM and Pedersen hash, the constraints are:

initial seed is an arbitrary string
use keccak256 hash to turn the initial seed into the first constant of the sequence
interpret hash outputs as big-endian 256bit integers
use the previous sequence output as the input to generate the next round constant
use modulo p, rather than bit masking or truncation, to move the hash output onto the scalar field

For pedersen hash base point generation:

start with constant generated from sequence as Y
try to recover X coordinate
if X is not square, increment Y
repeat until X is square

HarryR commented 5 years ago

TODO:

[x] Implement round-constant sequence generation for keccak256 in C++
[ ] Use new round-constant sequence for pedersen hash base point generation
[ ] Re-generate test cases using new base points
[x] Use new round-constant sequence for MiMC round constants
[x] Use improved MiMC Solidity implementation
[x] Re-generate test cases using new MiMC round constants
[x] Verify it works etc.

HarryR / ethsnarks

Increase number of rounds for MiMC (or LongsightL) #87