Sponge hash for GF(p) - Githubissues

The goal of this research is to identify a cryptographically secure hash function which can be used to authenticate a Merkle tree path and to construct a Merkle tree.

As per:

We start with the knowledge that, for the curve order for altBN as used by the zkSNARK implementation, an exponent of 5 creates a bijection. Secondly, any bijection over a prime field satisfies the pigeonhole principal of even distribution for any number of inputs. This is our one-way function: ƒ(x) = x^5. We then define a specialisation ƒs(x) = (x+s)^5 where s is a unique and randomly chosen constant where 0 < s < p.

In 2 § 1.1.3 they make most siblings hard by defining a hash function which takes one s per x: h'_{s_1...s_n}(x_1...x_n) = h_s_1(x_1) * ... * h_s_n(x_n) - however in the case where x is the size of a field element it is possible to inject the modular inverse for the previous step to bring the state back to 0, and the next element (in a merkle path proof) to negate the next element etc. which would create an easily malleable construction that could produce an arbitrary output with little to no computation time. For this reason the range of user input must be significantly limited.

I am proposing that the permutation function described above be used in conjunction with the sponge construction³ to create a hashing function which is both collision resistant and has low multiplicative complexity.

The function operates on 4-bit chunks of a 508 bit input. It compresses two field elements into one in 127+1 rounds.

p = py_ecc.bn128.curve_order

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

def ShortsightH_round(b, L, R, C):
    return R + powmod(b + C, 5, p), L

def ShortsightH(x):
    assert len(x) == 508
    L = 0
    R = 0
    for i, b in enumerate(chunk(x, 4)):
        b = int(''.join(map(str, b)), 2)
        C = sha3("ShortsightH/"+str(p)+"/508/4/" + str(i))
        L, R = ShortsightH_round(b, L, R, C)
    C = sha3("ShortsightH/"+str(p)+"/508/4/final")
    L, R = ShortsightH_round(0, L, R, C)
    return L

However, this is less efficient to compute in Solidity/EVM which could make updating the merkle tree more expensive (as bitwise operations are expensive), but given the number of bits at a time is 4 we can do a simple integer division by 16 to do the equivalent of x = x >> 4, then extract the lowest 4 bits using x & 15. The problem comes with the overlap for 2x 256 bit words where the last 2 bits of the first word become the first 2 bits of the second word - but - this can be easily handled with a multiply by 4.

The most relevant analysis of a function similar to this would be the the Cryptanalasys of IOTA's Curl hash function, but in this case the analysis doesn't directly translate. The question becomes one of:

Given z = (x + (y + C)^5) mod p, knowing C and z how do you determine x and y. Where y is a whole field element in size this is a difficult problem, but when the size of y is small, say 4 bits, you can trivially compute 16 different possible (x, y) values that satisfy the equation by doing x = z - (n + C)^5 for n in 0..15. From this we can work backwards to brute-force a set of inputs which result in the same hash in linear time of O(r^16) where r is the number of absorb rounds - which means it's fundamentally broken.

This puts us between a rock and a hard place when using addition to within the round function, if we were to change the round function to the following, then we have a function which according to 2 § 1.1.3 makes most siblings hard:

def ShortsightH_round(b, L, R, C):
    return (R * powmod(b + C, 5, p)) % p, L

From there we now have something which:

Is tied closer to the theoretical proof from 2
Is resistant to malleability due to the limited input size
Increases in complexity at every round

The only problem is the second to last round can be derived from the known constant C as b is 0, as you can compute the modulo inverse of C^5 to find R from the previous round. But then, what stops you from working backwards using a linear brute force approach of L_{i-1} = R_i * modinv((b_i+C_i)^5) to find the previous round?

From this the conclusions I can come to are:

MiMC when used in a Sponge construction on a bit-by-bit basis, is fundamentally flawed.
LongsightF when used with large L and R inputs doesn't have this flaw
ShortsightH, when used with the modification above on 4bit inputs has this flaw

Essentially the problem is if you take a single field element as the output then it's possible to work backwards when using the sponge construction, but when absorbing 4 bits at a time and emitting 4 bits at a time (where each field element is truncated to 4 bits) then you have something more robust.

I'm going to leave LongsightF as-is, as it's an adequate solution for now and serves my purposes.

HarryR / ethsnarks

Sponge hash for GF(p) #32