This PR makes some changes to avoid redundant computations in the multiproof verification.
The high-level idea is similar to https://github.com/crate-crypto/go-ipa/pull/48 (i.e., aggregating polynomials per evaluation point first and then doing the rest of the work).
The problem with `master` is that we’re calculating `num_queries` elements `r^i/(t - z_i)`. Although we’re now doing all those inverses in a single batch, there are a lot of repeated `1/(t - z_i)` values.

As in, if we’re verifying a multiproof for 16k openings, clearly those 16k `1/(t - z_i)` terms involve a lot of repeated domain elements. So the cost here isn’t really the “inversions”, since we’re batching them with the Montgomery trick. Still, the Montgomery trick also involves a linear number of multiplications to “clear out” the inverses we’re looking for, so we’re doing a lot of unnecessary multiplications.
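To make that cost concrete, here’s a minimal, self-contained sketch of the Montgomery batch-inversion trick over a toy prime field with `math/big` (illustrative only; go-ipa’s real field type and API differ). One inversion is shared across the whole batch, but every element, repeated or not, still pays multiplications in the prefix-product and unwind passes:

```go
package main

import (
	"fmt"
	"math/big"
)

// batchInvert computes the modular inverse of every element in xs with a
// single field inversion (the Montgomery trick). It still costs roughly 3n
// multiplications: n-1 to build the running prefix products, then ~2n to
// peel the individual inverses back out. This is why batching repeated
// values of 1/(t - z_i) is wasted work: the inversion is shared, but every
// duplicate still pays its multiplications.
func batchInvert(xs []*big.Int, p *big.Int) []*big.Int {
	n := len(xs)
	// prefix[i] = xs[0] * xs[1] * ... * xs[i] mod p
	prefix := make([]*big.Int, n)
	acc := big.NewInt(1)
	for i, x := range xs {
		acc = new(big.Int).Mod(new(big.Int).Mul(acc, x), p)
		prefix[i] = acc
	}
	// One inversion of the total product.
	inv := new(big.Int).ModInverse(prefix[n-1], p)
	// Walk backwards, "clearing out" one inverse per element.
	out := make([]*big.Int, n)
	for i := n - 1; i > 0; i-- {
		out[i] = new(big.Int).Mod(new(big.Int).Mul(inv, prefix[i-1]), p)
		inv.Mod(inv.Mul(inv, xs[i]), p)
	}
	out[0] = inv
	return out
}

func main() {
	p := big.NewInt(7919) // toy prime; the real code works over the scalar field
	xs := []*big.Int{big.NewInt(3), big.NewInt(5), big.NewInt(3)} // note the repeated 3
	for i, inv := range batchInvert(xs, p) {
		fmt.Printf("1/%v = %v (check: %v)\n", xs[i], inv,
			new(big.Int).Mod(new(big.Int).Mul(xs[i], inv), p))
	}
}
```

So with 16k openings the unwind pass alone is tens of thousands of multiplications, even though there are at most 256 distinct denominators.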
In this PR, we first calculate the polynomials per evaluation point, and then jump into this “inverse” work. Since we aggregated by evaluation point, we won’t be doing unnecessary work in the inverse batching of `1/(t - z_i)`, since there won’t be repeated elements. This means that we switch from `O(num_queries)` multiplications to `O(1)` (since we can have up to 256 inverses to calculate here in the worst case).
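As a sketch of that reordering (hypothetical names and layout, not the PR’s actual code; it reuses the toy `big.Int` field from the previous block): fold each query’s `r^i`-scaled value into a bucket keyed by its evaluation point first, so the batch inversion only ever sees the distinct `(t - z)` values, of which there are at most 256:

```go
// Query is a single opening: an evaluation point z_i in the 256-element
// domain and the claimed evaluation y_i.
type Query struct {
	Z uint8
	Y *big.Int
}

// aggregateByEvalPoint folds r^i * y_i into one bucket per distinct z, so
// the later batch inversion of (t - z) touches at most 256 elements no
// matter how many queries there are.
func aggregateByEvalPoint(queries []Query, powersOfR []*big.Int, t, p *big.Int) (nums, dens []*big.Int) {
	// aggregated[z] = sum over queries i with z_i == z of r^i * y_i
	aggregated := make(map[uint8]*big.Int)
	for i, q := range queries {
		term := new(big.Int).Mod(new(big.Int).Mul(powersOfR[i], q.Y), p)
		if acc, ok := aggregated[q.Z]; ok {
			acc.Mod(acc.Add(acc, term), p)
		} else {
			aggregated[q.Z] = term
		}
	}
	// Only the distinct evaluation points reach the batch inversion.
	for z, num := range aggregated {
		nums = append(nums, num)
		dens = append(dens, new(big.Int).Mod(new(big.Int).Sub(t, big.NewInt(int64(z))), p))
	}
	return nums, dens
}
```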
This aggregation also implies that we switch from `O(num_queries)` operations to `O(1)` when calculating `g_2(t)`. The reason is the same: we have up to 256 polynomials to sum to calculate `g_2(t)` now.
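Putting the two sketches together (still illustrative, and assuming the usual multiproof identity `g_2(t) = Σ_z aggregated(z)/(t - z)`), both the batch inversion and the final sum are now bounded by the domain size rather than by `num_queries`:

```go
// g2 evaluates g_2(t) = sum over distinct z of aggregated(z) / (t - z),
// reusing batchInvert from the first sketch. Both loops here run over at
// most 256 entries, independent of the number of queries.
func g2(nums, dens []*big.Int, p *big.Int) *big.Int {
	invs := batchInvert(dens, p)
	out := big.NewInt(0)
	for i := range nums {
		out.Mod(out.Add(out, new(big.Int).Mul(nums[i], invs[i])), p)
	}
	return out
}
```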
So this means that we traded two `O(num_queries)` costs for two `O(1)` ones.

Now, this PR has a slight downside: it has to do some extra work to calculate `E`. That adds `O(num_queries)` extra work that didn’t exist in `master`.
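For completeness, here’s my reading of where that extra work shows up, as a hedged sketch (it assumes `E` is the usual commitment combination `E = Σ_i (r^i/(t - z_i))·C_i` and that the per-point inverses are available in a hypothetical `invByZ` map; the actual PR may organize this differently):

```go
// eScalars rebuilds the per-query coefficients r^i / (t - z_i) needed for
// the commitment combination E, given the (at most 256) precomputed
// inverses keyed by evaluation point. This is the O(num_queries)
// multiplication pass that didn't exist as a separate step in master.
func eScalars(queries []Query, powersOfR []*big.Int, invByZ map[uint8]*big.Int, p *big.Int) []*big.Int {
	scalars := make([]*big.Int, len(queries))
	for i, q := range queries {
		scalars[i] = new(big.Int).Mod(new(big.Int).Mul(powersOfR[i], invByZ[q.Z]), p)
	}
	return scalars // these then feed the multi-scalar multiplication with the C_i
}
```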
So the bottom line is that we saved two `O(num_queries)` and added one, resulting in a net saving of around one `O(num_queries)` of work (maybe there’s another trick to avoid this extra work?), which explains why this PR makes verification faster.

Hopefully this wasn’t too confusing. I’ve already seen these savings in benchmark numbers, but you’d have to trust me for that :P. So, here I’m giving a more theoretical explanation of where those savings come from.