a16z / Spartan2

High-speed zkSNARKs
MIT License
2 stars 3 forks source link

Optimization: compute_eval_table_sparse + combine_evals #1

Open sragss opened 9 months ago

sragss commented 9 months ago

https://github.com/a16z/Spartan2/blob/uniform_r1cs_shape/src/spartan/snark.rs#L238-L285

The compute_eval_table_sparse lambda and the following combination of the evaluations comprise ~12.5% of the end-to-end Spartan2 flow.

I see three primary performance bugs.

sragss commented 9 months ago

Additionally I don't think that we need to write out the zero-ed A_evals, B_evals and C_evals before computing the weighted sum.

Zero allocation appears to cost 30-50% of the total looptime.

sragss commented 9 months ago

Here is the density map of the A, B, C matrices for the jolt_single_step circuit:

A: ==================== 14139658
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 7336615 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3868397 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffff01, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 533572 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000001000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efff0001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffe, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000009
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593ef000001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x0000000000000000000000000000000000000000000000000001000000000000, Count: 133393 sq: 0x0000000000000000000000000000000000000001000000000000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffffff, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000010000000000000000

B: ==================== 8670545
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 7203222 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000010000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffd, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010

C: ==================== 10671440
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 5469113 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3068039 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000400, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000100000
Value: 0x0000000000000000000000000000000000000000000000000000000000008000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000040000000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000004000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000010000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000080, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000004000
Value: 0x0000000000000000000000000000000000000000000000000000000000000040, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000001000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000000800, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000400000
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000020, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000400
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000002000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000004000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000000000010, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000100
Value: 0x0000000000000000000000000000000000000000000000000000000000000200, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000040000
Value: 0x0000000000000000000000000000000000000000000000000000000000000008, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000040
sragss commented 9 months ago

Guessing this would be faster if we could order by pk.S.{A,B,C} by (col, row) rather than (row, col). This would allow splitting the iterator such that each chunk largely access only its own columns. Only minimal contention at the edges.

sragss commented 9 months ago

Srinath does not think the code depends on pk.S.{A,B,C} (row, col) ordering, suggesting we should attempt a change to (col, row) then check if memory contention on parallel compute_eval_table_sparse improves.