Closed by jeanmon 1 week ago
The biggest difference is that the other PR parallelizes over rows, whereas we parallelize over columns. We could measure which one performs better, but I expect that for big enough polynomials, columns are better. Each thread gets a column, which is a contiguous array that plays well with caching, and performs a single memset over the whole range. If you do it row-wise, you'll be "jumping around" the contiguous array and also have shorter-lived threads.
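To make the comparison concrete, here is a minimal sketch of the two strategies. It is not the code from either PR: the flat column-major layout, the `std::uint64_t` element type, and the function names `zero_columns_parallel` and `zero_rows_parallel` are all assumptions for illustration, and plain `std::thread` stands in for `parallel_for`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Assumed layout (hypothetical): column c occupies the contiguous range
// [c * num_rows, (c + 1) * num_rows) of `data`.

// Column-wise: each thread zeroes whole columns with one memset per column.
void zero_columns_parallel(std::uint64_t* data, std::size_t num_rows, std::size_t num_cols, std::size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([=] {
            // Thread t handles columns t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t c = t; c < num_cols; c += num_threads) {
                std::memset(data + c * num_rows, 0, num_rows * sizeof(std::uint64_t));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}

// Row-wise: each thread owns a range of rows and must visit that range in
// every column, so it issues many short memsets at scattered offsets.
void zero_rows_parallel(std::uint64_t* data, std::size_t num_rows, std::size_t num_cols, std::size_t num_threads)
{
    assert(num_threads > 0);
    const std::size_t chunk = (num_rows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, num_rows);
        const std::size_t end = std::min(begin + chunk, num_rows);
        workers.emplace_back([=] {
            for (std::size_t c = 0; c < num_cols; ++c) {
                std::memset(data + c * num_rows + begin, 0, (end - begin) * sizeof(std::uint64_t));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Both functions produce the same result. The difference is the access pattern: the column-wise version touches one long contiguous range per column, while the row-wise version splits every column into `num_threads` short pieces. Which one is faster in practice would still need to be benchmarked.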
@fcarreiro I just tested a small change that uses the constructor without zero initialization (for non-derivative entities), and the unit tests are passing. I think we could just do that. I will submit a PR for the ticket.
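For readers unfamiliar with the idea, a constructor that skips zero initialization can be sketched roughly as below. The `Coefficients` class and the `DontZeroMemory` tag here are hypothetical stand-ins written for this example, not the actual polynomial class or its signatures, and `std::uint64_t` replaces the real field element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical tag type: selects the constructor that skips zeroing.
enum class DontZeroMemory { FLAG };

// Hypothetical coefficient buffer, for illustration only.
class Coefficients {
  public:
    // Default path: `new T[n]()` value-initializes, so every entry is 0.
    explicit Coefficients(std::size_t size)
        : size_(size)
        , data_(new std::uint64_t[size]())
    {}

    // Tagged path: `new T[n]` leaves entries indeterminate. The caller must
    // write every entry before reading it.
    Coefficients(std::size_t size, DontZeroMemory)
        : size_(size)
        , data_(new std::uint64_t[size])
    {}

    std::size_t size() const { return size_; }

    std::uint64_t& operator[](std::size_t i)
    {
        assert(i < size_);
        return data_[i];
    }

  private:
    std::size_t size_;
    std::unique_ptr<std::uint64_t[]> data_;
};
```

The tagged constructor is only safe when every entry is overwritten before it is read, which is presumably why the change is limited to entities that get fully populated later.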
As part of this PR https://github.com/AztecProtocol/aztec-packages/pull/10073, Mara introduced some parallelization in initializing memory with zeros.
Calling the parallel version from inside the AVM circuit_builder caused problems, because that part of the code is already parallel (calling parallel_for inside another parallel_for seems to lead to issues).
The goal of this task is to investigate whether we can leverage the parallel polynomial initialization routine in the AVM. We might have to "un-parallelize" some code in circuit_builder to do so.
Alternatively, we might consider not initializing the polynomials at all.
Tagging @fcarreiro @maramihali for awareness.