arkworks-rs / algebra

Libraries for finite field, elliptic curve, and polynomial arithmetic
https://arkworks.rs
Apache License 2.0
610 stars 240 forks source link

Extend GLV to BLS/BN/MNT curves #55

Open jon-chuang opened 4 years ago

jon-chuang commented 4 years ago

GLV patents end 25 sept. 2 weeks and 2 days. We should collate resources/scripts for finding lambda and omega for all the SW curves Zexe uses, so that the current GLV script can produce the precomputes necessary for GLV scalar mul, which is not constant time either, but combined with wNAF produces a speedup of over 2x for BW6 projective. Batch affine alone only adds about 1.3-1.4x on top of this.

GLV precomputation script should also be generic and work for all curves based on features. We can use macros to simplify.

mratsim commented 3 years ago

If this helps, here are my Sage scripts for GLV acceleration on BN254-Snarks and BLS12-381:

I've taken the coefficient from the Guide to pairing based cryptography when possible but I did not get proper results on G2 as noted in the code.

Here is a generic way to regenerate coefficient for any curve, I couldn't get LLL to work in a symbolic manner with ZZ["x"] https://github.com/mratsim/constantine/blob/ac37b55a/sage/lattice_decomposition_finder.sage

# Example of BLS12-381 with the ψ (Psi) - Untwist-Frobenius-Twist endomorphism
x = -(2^63 + 2^62 + 2^60 + 2^57 + 2^48 + 2^16)
p = (x - 1)^2 * (x^4 - x^2 + 1)//3 + x
r = x^4 - x^2 + 1
t = x + 1 # Trace of Frobenius

lambda_psi = t - 1

Lpsi = Matrix([
    [          r,   0, 0, 0],
    [-lambda_psi,   1, 0, 0],
    [-lambda_psi^2, 0, 1, 0],
    [-lambda_psi^3, 0, 0, 1],
])

Lpsi = Lpsi.LLL()
print(Lpsi)

ahat = vector([r, 0, 0, 0]) * Lpsi.inverse()
print('ahat: ' + str(ahat))

v = int(r).bit_length()
v = int(((v + 64 - 1) // 64) * 64)
print([(a << v) // r for a in ahat])

Note that my GLV scalar mul is constant-time and follows

Some later papers (FourQ and Exponentiation in Pairing Group) and the FourQ CFRG draft also mentioned alternative to the GLV-SAC recoding with larger window size but I did not explore that yet.

mratsim commented 3 years ago

And I just added BLS12-377 support so here is my Sage script:

And this is the speedup I get This is tested with a scalar that is the same bitwidth as the curve order (so 253 bits on BLS12-377)

G1

image

G2

image

Perf notes

I need to rework my towering for G2, probably in pure assembly and change the coordinates from projective to jacobian as the compiler introduce a 10% perf tax for no reason in Fp2 (push-poping stack for aliasing?) and the complete projective formulas have a 40% perf tax, and even more with a quadratic non-residue like √-5 for BLS12-377. So you can get significantly improved figures for non constant-time formulas.

jon-chuang commented 3 years ago

@mratsim how do you know which phi corresponds to which lambda a priori?

mratsim commented 3 years ago

I don't :P

I have generalized my generator script by the way, it can now handle BN254-Snarks, BN254-Nogami, BLS12-377, BLS12-381 and BW6-761: https://github.com/mratsim/constantine/blob/244f583/sage/derive_endomorphisms.sage

In particular the logic for finding phi <=> lambda match is just try/except

def check_cubic_root_endo(G1, Fp, r, cofactor, lambdaR, phiP):
  ## Check the Endomorphism for p mod 3 == 1
  ## Endomorphism can be field multiplication by one of the non-trivial cube root of unity 𝜑
  ##   Rationale:
  ##     curve equation is y² = x³ + b, and y² = (x𝜑)³ + b <=> y² = x³ + b (with 𝜑³ == 1) so we are still on the curve
  ##     this means that multiplying by 𝜑 the x-coordinate is equivalent to a scalar multiplication by some λᵩ
  ##     with λᵩ² + λᵩ + 1 ≡ 0 (mod r) and 𝜑² + 𝜑 + 1 ≡ 0 (mod p), see below.
  ##     Hence we have a 2 dimensional decomposition of the scalar multiplication
  ##     i.e. For any [s]P, we can find a corresponding [k1]P + [k2][λᵩ]P with [λᵩ]P being a simple field multiplication by 𝜑
  ##   Finding cube roots:
  ##      x³−1=0 <=> (x−1)(x²+x+1) = 0, if x != 1, x solves (x²+x+1) = 0 <=> x = (-1±√3)/2

  assert phiP^3 == Fp(1)
  assert lambdaR^3 % r == 1

  Prand = G1.random_point()
  P = Prand * cofactor
  assert P != G1([0, 1, 0])

  (Px, Py, Pz) = P

  Qendo = G1([Px*phiP, Py, Pz])
  Qlambda = lambdaR * P

  assert P != Qendo
  assert P != Qlambda

  assert Qendo == Qlambda
  print('Endomorphism OK')

def genCubicRootEndo(curve_name, curve_config):
  p = curve_config[curve_name]['field']['modulus']
  r = curve_config[curve_name]['field']['order']
  b = curve_config[curve_name]['curve']['b']

  print('Constructing G1')
  Fp = GF(p)
  G1 = EllipticCurve(Fp, [0, b])
  print('Computing cofactor')
  cofactor = G1.order() // r

  # slow for large inputs - https://pari.math.u-bordeaux.fr/archives/pari-dev-0412/msg00020.html
  if curve_name != 'BW6_761':
    print('Finding cube roots')
    (phi1, phi2) = (Fp(root) for root in Fp(1).nth_root(3, all=True) if root != 1)
    (lambda1, lambda2) = (GF(r)(root) for root in GF(r)(1).nth_root(3, all=True) if root != 1)
  else:
    print('Skip finding cube roots for BW6_761, too slow, use values from paper https://eprint.iacr.org/2020/351')
    phi1 = Integer('0x531dc16c6ecd27aa846c61024e4cca6c1f31e53bd9603c2d17be416c5e4426ee4a737f73b6f952ab5e57926fa701848e0a235a0a398300c65759fc45183151f2f082d4dcb5e37cb6290012d96f8819c547ba8a4000002f962140000000002a')
    phi2 = Integer('0xcfca638f1500e327035cdf02acb2744d06e68545f7e64c256ab7ae14297a1a823132b971cdefc65870636cb60d217ff87fa59308c07a8fab8579e02ed3cddca5b093ed79b1c57b5fe3f89c11811c1e214983de300000535e7bc00000000060')
    lambda1 = Integer('0x9b3af05dd14f6ec619aaf7d34594aabc5ed1347970dec00452217cc900000008508c00000000001')
    lambda2 = Integer('-0x9b3af05dd14f6ec619aaf7d34594aabc5ed1347970dec00452217cc900000008508c00000000002')

  print('𝜑1 (mod p):  0x' + Integer(phi1).hex())
  print('λᵩ1 (mod r): 0x' + Integer(lambda1).hex())
  print('𝜑2 (mod p):  0x' + Integer(phi2).hex())
  print('λᵩ2 (mod r): 0x' + Integer(lambda2).hex())

  # TODO: is there a better way than spray-and-pray?
  # TODO: Should we maximize or minimize lambda
  #       to maximize/minimize the scalar norm?
  # TODO: Or is there a way to ensure
  #       that the Babai basis is mostly positive?
  if lambda1 < lambda2:
    lambda1, lambda2 = lambda2, lambda1

  try:
    check_cubic_root_endo(G1, Fp, r, cofactor, int(lambda1), phi1)
  except:
    print('Failure with:')
    print('  𝜑 (mod p): 0x' + Integer(phi1).hex())
    print('  λᵩ (mod r): 0x' + Integer(lambda1).hex())
    phi1, phi2 = phi2, phi1
    check_cubic_root_endo(G1, Fp, r, cofactor, int(lambda1), phi1)
  finally:
    print('Success with:')
    print('  𝜑 (mod p):  0x' + Integer(phi1).hex())
    print('  λᵩ (mod r): 0x' + Integer(lambda1).hex())

  print('Deriving Lattice')
  lattice = derive_lattice(r, lambda1, 2)
  pretty_print_lattice(lattice)

  print('Deriving Babai basis')
  babai = derive_babai(r, lattice, 2)
  pretty_print_babai(babai)

  return phi1, lattice, babai
jon-chuang commented 3 years ago

@mratsim Is it normal to get a lattice basis where one of the vectors is 0? The speedup still seems to exist and the evaluations are correct.

mratsim commented 3 years ago

Yes, I re-derive the whole thing with LLL for generality and because I had some trouble with some of the Guide to Pairing-based cryptography lattices but you can have 0 coefficients (not the full vector though):

image

jon-chuang commented 3 years ago

Hmm actually I meant the entire lattice basis vector being zero. Alternatively, being very close to zero.

mratsim commented 3 years ago

That seems strange, what curve did you use?

For example my lattice for BLS377 image

And BW6-761 image

jon-chuang commented 3 years ago

Actually, I discovered that this produces very suboptimal decomposition that does not respect the < sqrt(n) property. So I used a different lattice instead.

mratsim commented 3 years ago

That's strange, for which curve? I had that issue on my previous BLS Lattice, but I checked this one and it seemed good.

That said I didn't implement the technique from secp256k1 to create scalars that reach the max infinity norm and verify the ceil(log2(r)/decomposition_degree) + 1 property https://github.com/bitcoin-core/secp256k1/pull/822#issuecomment-699462104