crate-crypto / go-ipa

A Go implementation of cryptographic primitives for Verkle Trees
https://verkle.dev
Apache License 2.0

multiproof/verifier: faster g_2(x) and smaller inverse batching #49

Closed jsign closed 1 year ago

jsign commented 1 year ago

This PR makes some changes to avoid computations in the multiproof verification.

The high-level idea is similar to https://github.com/crate-crypto/go-ipa/pull/48 (i.e: aggregating polynomials per evaluation point first and then doing the rest of the work).

The problem with master is that we’re calculating num_queries elements r^i/(t - z_i). Although we now do all those inverses in a single batch, there are a lot of repeated 1/(t - z_i) values.

For example, if we’re verifying a multiproof for 16k openings, those 16k 1/(t - z_i) terms clearly involve a lot of repeated domain elements. So the cost here isn’t really the inversions themselves, since we’re batching them with the Montgomery trick. Still, the Montgomery trick also involves a linear number of multiplications to “clear out” the individual inverses, so we’re doing a lot of unnecessary multiplications.

In this PR, we first calculate the polynomials per evaluation point, and only then jump into this “inverse” work. Since we aggregated by evaluation point, we won’t be doing unnecessary work in the inverse batching of 1/(t - z_i), since there won’t be repeated elements. This means that we switch from O(num_queries) multiplications to O(1), since in the worst case there are at most 256 distinct evaluation points (one per domain element) to invert.

This aggregation also implies that we switch from O(num_queries) operations when calculating g_2(t) to O(1). The reason is the same: we now sum at most 256 per-point aggregates to calculate g_2(t).

So this means that we replaced two O(num_queries) costs with two O(1) costs. Now, this PR has a slight downside: it has to do some extra work to calculate E. That adds O(num_queries) extra work that didn’t exist in master.

So the bottom line is that we saved two O(num_queries) terms and added one, for a net saving of roughly one O(num_queries) of work (maybe there’s another trick to avoid this extra work?), which explains why this PR is faster.

Hopefully, this wasn’t too confusing. I’ve already seen these savings in benchmark numbers, but you’d have to trust me for that :P. So, here I’m giving a more theoretical explanation of where those savings come from.