Open sragss opened 9 months ago
Additionally I don't think that we need to write out the zero-ed A_evals, B_evals and C_evals before computing the weighted sum.
Zero allocation appears to cost 30-50% of the total looptime.
Here is the density map of the A, B, C matrices for the jolt_single_step circuit:
A: ==================== 14139658
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 7336615 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3868397 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffff01, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 533572 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000001000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efff0001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffe, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000009
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593ef000001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x0000000000000000000000000000000000000000000000000001000000000000, Count: 133393 sq: 0x0000000000000000000000000000000000000001000000000000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffffff, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000010000000000000000
B: ==================== 8670545
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 7203222 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000010000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffd, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
C: ==================== 10671440
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 5469113 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3068039 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000400, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000100000
Value: 0x0000000000000000000000000000000000000000000000000000000000008000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000040000000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000004000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000010000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000080, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000004000
Value: 0x0000000000000000000000000000000000000000000000000000000000000040, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000001000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000000800, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000400000
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000020, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000400
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000002000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000004000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000000000010, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000100
Value: 0x0000000000000000000000000000000000000000000000000000000000000200, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000040000
Value: 0x0000000000000000000000000000000000000000000000000000000000000008, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000040
Guessing this would be faster if we could order by pk.S.{A,B,C}
by (col, row) rather than (row, col). This would allow splitting the iterator such that each chunk largely access only its own columns. Only minimal contention at the edges.
Srinath does not think the code depends on pk.S.{A,B,C}
(row, col)
ordering, suggesting we should attempt a change to (col, row)
then check if memory contention on parallel compute_eval_table_sparse
improves.
https://github.com/a16z/Spartan2/blob/uniform_r1cs_shape/src/spartan/snark.rs#L238-L285
The compute_eval_table_sparse lambda and the following combination of the evaluations comprise ~12.5% of the end-to-end Spartan2 flow.
I see three primary performance bugs.
a + r * b + r * r * c
can be inlined such that the valuesa / b / c
are used directly after computation rather than storing, throwing away, and reloading from RAM.