Open jon-chuang opened 4 years ago
If this helps, here are my Sage scripts for GLV acceleration on BN254-Snarks and BLS12-381:
I've taken the coefficient from the Guide to pairing based cryptography when possible but I did not get proper results on G2 as noted in the code.
Here is a generic way to regenerate coefficient for any curve, I couldn't get LLL to work in a symbolic manner with ZZ["x"] https://github.com/mratsim/constantine/blob/ac37b55a/sage/lattice_decomposition_finder.sage
# Example of BLS12-381 with the ψ (Psi) - Untwist-Frobenius-Twist endomorphism
x = -(2^63 + 2^62 + 2^60 + 2^57 + 2^48 + 2^16)
p = (x - 1)^2 * (x^4 - x^2 + 1)//3 + x
r = x^4 - x^2 + 1
t = x + 1 # Trace of Frobenius
lambda_psi = t - 1
Lpsi = Matrix([
[ r, 0, 0, 0],
[-lambda_psi, 1, 0, 0],
[-lambda_psi^2, 0, 1, 0],
[-lambda_psi^3, 0, 0, 1],
])
Lpsi = Lpsi.LLL()
print(Lpsi)
ahat = vector([r, 0, 0, 0]) * Lpsi.inverse()
print('ahat: ' + str(ahat))
v = int(r).bit_length()
v = int(((v + 64 - 1) // 64) * 64)
print([(a << v) // r for a in ahat])
Note that my GLV scalar mul is constant-time and follows
Some later papers (FourQ and Exponentiation in Pairing Group) and the FourQ CFRG draft also mentioned alternative to the GLV-SAC recoding with larger window size but I did not explore that yet.
And I just added BLS12-377 support so here is my Sage script:
And this is the speedup I get This is tested with a scalar that is the same bitwidth as the curve order (so 253 bits on BLS12-377)
I need to rework my towering for G2, probably in pure assembly and change the coordinates from projective to jacobian as the compiler introduce a 10% perf tax for no reason in Fp2 (push-poping stack for aliasing?) and the complete projective formulas have a 40% perf tax, and even more with a quadratic non-residue like √-5 for BLS12-377. So you can get significantly improved figures for non constant-time formulas.
@mratsim how do you know which phi corresponds to which lambda a priori?
I don't :P
I have generalized my generator script by the way, it can now handle BN254-Snarks, BN254-Nogami, BLS12-377, BLS12-381 and BW6-761: https://github.com/mratsim/constantine/blob/244f583/sage/derive_endomorphisms.sage
In particular the logic for finding phi <=> lambda match is just try/except
def check_cubic_root_endo(G1, Fp, r, cofactor, lambdaR, phiP):
## Check the Endomorphism for p mod 3 == 1
## Endomorphism can be field multiplication by one of the non-trivial cube root of unity 𝜑
## Rationale:
## curve equation is y² = x³ + b, and y² = (x𝜑)³ + b <=> y² = x³ + b (with 𝜑³ == 1) so we are still on the curve
## this means that multiplying by 𝜑 the x-coordinate is equivalent to a scalar multiplication by some λᵩ
## with λᵩ² + λᵩ + 1 ≡ 0 (mod r) and 𝜑² + 𝜑 + 1 ≡ 0 (mod p), see below.
## Hence we have a 2 dimensional decomposition of the scalar multiplication
## i.e. For any [s]P, we can find a corresponding [k1]P + [k2][λᵩ]P with [λᵩ]P being a simple field multiplication by 𝜑
## Finding cube roots:
## x³−1=0 <=> (x−1)(x²+x+1) = 0, if x != 1, x solves (x²+x+1) = 0 <=> x = (-1±√3)/2
assert phiP^3 == Fp(1)
assert lambdaR^3 % r == 1
Prand = G1.random_point()
P = Prand * cofactor
assert P != G1([0, 1, 0])
(Px, Py, Pz) = P
Qendo = G1([Px*phiP, Py, Pz])
Qlambda = lambdaR * P
assert P != Qendo
assert P != Qlambda
assert Qendo == Qlambda
print('Endomorphism OK')
def genCubicRootEndo(curve_name, curve_config):
p = curve_config[curve_name]['field']['modulus']
r = curve_config[curve_name]['field']['order']
b = curve_config[curve_name]['curve']['b']
print('Constructing G1')
Fp = GF(p)
G1 = EllipticCurve(Fp, [0, b])
print('Computing cofactor')
cofactor = G1.order() // r
# slow for large inputs - https://pari.math.u-bordeaux.fr/archives/pari-dev-0412/msg00020.html
if curve_name != 'BW6_761':
print('Finding cube roots')
(phi1, phi2) = (Fp(root) for root in Fp(1).nth_root(3, all=True) if root != 1)
(lambda1, lambda2) = (GF(r)(root) for root in GF(r)(1).nth_root(3, all=True) if root != 1)
else:
print('Skip finding cube roots for BW6_761, too slow, use values from paper https://eprint.iacr.org/2020/351')
phi1 = Integer('0x531dc16c6ecd27aa846c61024e4cca6c1f31e53bd9603c2d17be416c5e4426ee4a737f73b6f952ab5e57926fa701848e0a235a0a398300c65759fc45183151f2f082d4dcb5e37cb6290012d96f8819c547ba8a4000002f962140000000002a')
phi2 = Integer('0xcfca638f1500e327035cdf02acb2744d06e68545f7e64c256ab7ae14297a1a823132b971cdefc65870636cb60d217ff87fa59308c07a8fab8579e02ed3cddca5b093ed79b1c57b5fe3f89c11811c1e214983de300000535e7bc00000000060')
lambda1 = Integer('0x9b3af05dd14f6ec619aaf7d34594aabc5ed1347970dec00452217cc900000008508c00000000001')
lambda2 = Integer('-0x9b3af05dd14f6ec619aaf7d34594aabc5ed1347970dec00452217cc900000008508c00000000002')
print('𝜑1 (mod p): 0x' + Integer(phi1).hex())
print('λᵩ1 (mod r): 0x' + Integer(lambda1).hex())
print('𝜑2 (mod p): 0x' + Integer(phi2).hex())
print('λᵩ2 (mod r): 0x' + Integer(lambda2).hex())
# TODO: is there a better way than spray-and-pray?
# TODO: Should we maximize or minimize lambda
# to maximize/minimize the scalar norm?
# TODO: Or is there a way to ensure
# that the Babai basis is mostly positive?
if lambda1 < lambda2:
lambda1, lambda2 = lambda2, lambda1
try:
check_cubic_root_endo(G1, Fp, r, cofactor, int(lambda1), phi1)
except:
print('Failure with:')
print(' 𝜑 (mod p): 0x' + Integer(phi1).hex())
print(' λᵩ (mod r): 0x' + Integer(lambda1).hex())
phi1, phi2 = phi2, phi1
check_cubic_root_endo(G1, Fp, r, cofactor, int(lambda1), phi1)
finally:
print('Success with:')
print(' 𝜑 (mod p): 0x' + Integer(phi1).hex())
print(' λᵩ (mod r): 0x' + Integer(lambda1).hex())
print('Deriving Lattice')
lattice = derive_lattice(r, lambda1, 2)
pretty_print_lattice(lattice)
print('Deriving Babai basis')
babai = derive_babai(r, lattice, 2)
pretty_print_babai(babai)
return phi1, lattice, babai
@mratsim Is it normal to get a lattice basis where one of the vectors is 0? The speedup still seems to exist and the evaluations are correct.
Yes, I re-derive the whole thing with LLL for generality and because I had some trouble with some of the Guide to Pairing-based cryptography lattices but you can have 0 coefficients (not the full vector though):
Hmm actually I meant the entire lattice basis vector being zero. Alternatively, being very close to zero.
That seems strange, what curve did you use?
For example my lattice for BLS377
And BW6-761
Actually, I discovered that this produces very suboptimal decomposition that does not respect the < sqrt(n) property. So I used a different lattice instead.
That's strange, for which curve? I had that issue on my previous BLS Lattice, but I checked this one and it seemed good.
That said I didn't implement the technique from secp256k1 to create scalars that reach the max infinity norm and verify the ceil(log2(r)/decomposition_degree) + 1 property https://github.com/bitcoin-core/secp256k1/pull/822#issuecomment-699462104
GLV patents end 25 sept. 2 weeks and 2 days. We should collate resources/scripts for finding lambda and omega for all the SW curves Zexe uses, so that the current GLV script can produce the precomputes necessary for GLV scalar mul, which is not constant time either, but combined with wNAF produces a speedup of over 2x for BW6 projective. Batch affine alone only adds about 1.3-1.4x on top of this.
GLV precomputation script should also be generic and work for all curves based on features. We can use macros to simplify.