Closed vbuterin closed 4 years ago
Do you have any metrics/measurements on how this effects performance?
Quick test on my laptop shows it reduces the time of a final exponentiation from 1.10 sec to 0.34 sec.
Changed to your preferred zip format btw; to answer your performance question there, the difference between list and zip is negligible as the bulk of the complexity is in the field operations and not the scaffolding.
Here's what I understand from this pull request so far. I haven't figured out why the linearity works.
We are saving the cost from 6438 multiplications to 2000 multiplications with 1 division now.
In the final_exponentiate, we raise the input FQ12 to a large power of an integer. This requires many multiplications of FQ12.
def final_exponentiate_old(p: FQ12) -> FQ12:
return p ** ((field_modulus ** 12 - 1) // curve_order)
The square-and-multiply strategy is used to raise the FQ12 to a power. Let's say if we want to calculate x^9, we (x^8) x, then we get x^8 by (x^4) (x^4), then get x^4 by (x^2) * (x^2), so on and so forth.
def __pow__(self: T_FQ, other: int) -> T_FQ:
if other == 0:
return type(self)(1)
elif other == 1:
return type(self)(self.n)
elif other % 2 == 0:
return (self * self) ** (other // 2)
else:
return ((self * self) ** int(other // 2)) * self
To get the number of multiplications required for a certain power n
, we can turn n
into a binary. For every digit 0 in the binary, we need one multiplication for the square; and for every digit 1, we need one for the square and one for the multiply.
def multiplications_required(n):
return sum(int(i) + 1 for i in bin(n)[2:])
To do the final exponentiate we need 6438 multiplications.
from py_ecc.fields.field_properties import field_properties
from py_ecc.optimized_bls12_381.optimized_curve import curve_order
field_modulus = field_properties["bls12_381"]["field_modulus"]
exp = ((field_modulus ** 12 - 1) // curve_order)
print(multiplications_required(exp))
# 6438
What exp_by_p brings to the table (sorry :)) is it makes raising to a power of field_modulus
cheap. Naive square-and-multiply needs 611 multiplications but with a precomputed exptable
only 12 multiplications are required now.
print(multiplications_required(field_modulus))
# 611
Note that x^12 - 1
= (x^6 +1)(x^6 - 1)
= (x^2 +1)(x^4 - x^2 + 1)(x^6 - 1)
The variable p2 gets us p**(x^2 +1)
, the p3 gets us p2 **(x^6 - 1)
, and the last expression gets p3**(x^4 - x^2 + 1)
def final_exponentiate(p: FQ12) -> FQ12:
cofactor = (field_modulus ** 4 - field_modulus ** 2 + 1) // curve_order
p2 = exp_by_p(exp_by_p(p)) * p
p3 = exp_by_p(exp_by_p(exp_by_p(exp_by_p(exp_by_p(exp_by_p(p2)))))) / p2
return p3 ** cofactor
We have 8 exp_by_p, so 8 * 12 = 96 multiplications. Raising to the power of the cofactor needs 1901 multiplications, which can be shown by running the script.
cofactor = (field_modulus ** 4 - field_modulus ** 2 + 1) // curve_order
print(multiplications_required(cofactor))
# 1901
We are also having 1 division that's slightly more expensive than 1 multiplication but negligible here. In total, we have almost 2k multiplications.
Here's a script
from py_ecc.fields.field_properties import field_properties
from py_ecc.fields import optimized_bls12_381_FQ12 as FQ12
from py_ecc.optimized_bls12_381.optimized_curve import curve_order
from py_ecc.optimized_bls12_381.optimized_pairing import final_exponentiate
import py_ecc.optimized_bls12_381.optimized_pairing
field_modulus = field_properties["bls12_381"]["field_modulus"]
def tick():
tick.counter += 1
# initialize
tick.counter = 0
class WrapFQ12(FQ12):
def __mul__(self, *args):
tick()
return super().__mul__(*args)
def __rmul__(self, *args):
tick()
return super().__rmul__(*args)
py_ecc.optimized_bls12_381.optimized_pairing.exptable = [
WrapFQ12([0] * i + [1] + [0] * (11 - i)) ** field_modulus for i in range(12)]
# reset
tick.counter = 0
a = WrapFQ12([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
final_exponentiate(a)
print("Optimized ", tick.counter)
# ------------------------
tick.counter = 0
def final_exponentiate_old(p):
return p ** ((field_modulus ** 12 - 1) // curve_order)
final_exponentiate_old(a)
print(tick.counter)
Great work on the explanation! :rofl: I agree with all of it.
Here's what I understand from this pull request so far. I haven't figured out why the linearity works.
Thanks 😂
Here's my understanding of the Freshman's dream
Let's denote a Fq12 element x = (x1, x2, ..., x12), where x1, ..., x12 are Fq we can represent x as a linear combination of the bases
x = x1 * (1, 0, ..., 0) + x2 * (0, 1, 0, ..., 0) + ... + x12 * (0, ...., 0, 1)
or we denote a basis as bi
x = x1 * b1 + x2 * b2 + ... + x12 * b12
Applying Freshman's dream (it's applicable because we have prime order and prime power?)
x^p = (x1 * b1)^p + (x2 * b2)^p + ... + (x12 * b12)^p
= x1^p * b1^p + x2^p * b2^p + ... + x12^p * b12^p
= x1 * b1^p + x2 * b2^p + ... + x12 * b12^p
Where all bi^p are the FQ12 values precomputed in the exptable. We also have xi^p = xi by Fermat's little theorem.
Yep! 100% correct.
Except this:
are the FQ2 values
FQ12 values.
Ah, nice catch! Fixed.
@vbuterin is there anything else you want to do with this PR prior to merge?
Take advantage of the linearity of
x -> x**p
to greatly improve efficiency of final exponentiation.