Closed HarryR closed 5 years ago
I have tried to improve the performance of MiMC-p/p on EVM, and have provided reference implementations in Python and Solidity which I will then verify against the other implementations and the paper.
https://gist.github.com/HarryR/80b5ff2ce13da12edafda6d21c780730
In this implementation the H
function is keccak256
and the key is used to derive the round constants sequence rather than using a separate seed specifically for the constants.
I recommend using exponent 7 with 46 rounds, for an acceptable security/performance trade-off.
Gas usage:
exponent | rounds | gas | avg per round | poly degree | log2(d) |
---|---|---|---|---|---|
7 | 100 | 20580 | 206 | 7**99 | 278 |
7 | 91 | 18735 | 206 | 7**90 | 252 |
7 | 46 | 9510 | 207 | 7**45 | 126 |
7 | 4 | 900 | 225 | 7**3 | 8 |
7 | 3 | 695 | 232 | 7**2 | 5 |
5 | 110 | 21126 | 192 | 5**109 | 253 |
5 | 100 | 19216 | 193 | 5**99 | 230 |
5 | 55 | 10621 | 193 | 5**54 | 125 |
5 | 4 | 880 | 220 | 5**3 | 6 |
5 | 3 | 689 | 229 | 5**2 | 4 |
Exponent of 7 is slightly more expensive, but requires fewer rounds.
For reference:
Here is a 100% EVM assembly version of MiMC Smart contract with exponent 7 that takes less than 100gas/round: https://github.com/iden3/mimc
@jbaylina suggested that the hash for the base points can be computed separately in order to allow parallel computation, this is relevant to Pedersen Hashes too as we want to re-use as much code as possible for 'constant generation'. He had reservations about using a hash-chain to generate a sequence of constants, in that it wouldn't allow for parallel computation.
In the example code above, round constants are generated using a hash-chain:
def round_constants(seed, p, R):
for _ in range(R):
seed = H(seed)
yield seed
This creates a sequence where 0=H(seed)
, 1=H(H(seed))
etc.
@jbaylina was suggesting a scheme (and implemented it, for Pedersen hashes) where the seeds for the Y coordinates are generated by making a string of the following components then hashing them (using blake2):
Y
coordinate doesn't result in a square X
) while (p==null) {
const S = GENPOINT_PREFIX + "_" + padLeftZeros(pointIdx, 32) + "_" + padLeftZeros(tryIdx, 32);
const h = createBlakeHash("blake256").update(S).digest();
h[31] = h[31] & 0xBF; // Set 255th bit to 0 (256th is the signal and 254th is the last possible bit to 1)
p = babyJub.unpackPoint(h);
tryIdx++;
}
In this implementation a new string is created for every try for every sequence and try.
Whereas ethsnarks uses https://github.com/HarryR/ethsnarks/blob/master/ethsnarks/jubjub.py#L180:
# in base point generation:
data = b"%-28s%04X" % (name, i)
return Point.from_hash(data)
def from_hash(cls, entropy):
entropy = sha256(entropy).digest()
entropy_as_int = int.from_bytes(entropy, 'big')
y = FQ(entropy_as_int)
while True:
try:
p = cls.from_y(y)
except SquareRootError:
y += 1
continue
# ...
return p
This creates a hash for each base point sequence, then increments its big-endian interpretation until an X coordinate can be recovered from the Y. I think my implementation of from_hash
is more likely to recover an X coordinate with less average tries than if a new hash is generated at each try - but only marginally. The two implementations are similar, apart from the 'increment Y until matching X is found' part - which for Ethsnarks is designed to be Ethereum compatible (even though modulo square root probably isn't affordable on EVM).
The part which is necessary for Ethereum, IMO, is re-using the same components for MiMC round-constant generation and Pedersen hash base point generation. It's very cheap to implement the round_constants
mechanism described above, generating a hash chain using keccak256 from an initial seed is faster and cheaper than any alternative.
Regarding parallelism, I think it's fair to create a sequence of hashes which are used as starting points, while this isn't parallel on its own - it is fast to generate the sequence which can then be made parallel for every individual implementation.
IMO the way forward is to use the same round-constant generation mechanism for MiMC and Pedersen hash, for pedersen hash each constant in the sequence is the starting point for recovering the X from the Y, and for MiMC it's used verbatim.
As far as 'nothing up my sleeve' numbers go, for generating the base points the mechanism implemented by ethsnarks (where the Y coordinate is incremented until a resulting point is found) matches the literature and existing implementations, and the hash sequence relies more heavily on the random-oracle-model of non-determinism.
To use the same sequence generator across EVM and Pedersen hash, the constraints are:
p
, rather than bit masking or truncation, to move the hash output onto the scalar fieldFor pedersen hash base point generation:
TODO:
So, I should've originally listened to Daira, as she's a beacon of wisdom and insight, and further research has shown that MiMC-p/p with 12 rounds is very insecure.
This ticket aims to address the shortcomings of the current implementation and making it 'secure' with some small tweaks.
Exponent of 5 is a bijection over F_p for the altbn scalar field used by Ethereum, need to verify if exponent of 7 is a bijection also.
With
exponent=7
, each round requires 4 constraints for exponentiation:Where 91 rounds requires 364 constraints when using a round-constants.
The round constants for the keys can be generated as a hash stream using keccak256.
Any personalisation must change the initial seed, to make it derive a unique sequence of round constants. The constants are used to personalise the use of the key.
Note that for the first round the constant is zero, and for the last round the key is appended to the result, this matches the following implementations and is required to ensure the result is bijective:
Pseudocode:
The Miyaguchi–Preneel one-way construct can be implemented as a linear combination between rounds, returning a linear combination with the previous result, requiring no additional constraints.