Closed aszepieniec closed 1 month ago
Consider renaming the `NO_CACHE` environment variable to `TVM_NO_CACHE` to avoid name collisions.
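For illustration, a lookup that prefers the prefixed name while still honoring the legacy one could look like the sketch below (`cache_disabled` and the fallback behavior are hypothetical, not the crate's actual API):

```rust
use std::env;

/// Hypothetical helper: the cache is considered disabled if either the
/// prefixed or the legacy environment variable is present.
fn cache_disabled(prefixed: Option<String>, legacy: Option<String>) -> bool {
    prefixed.or(legacy).is_some()
}

fn main() {
    let disabled = cache_disabled(env::var("TVM_NO_CACHE").ok(), env::var("NO_CACHE").ok());
    println!("cache disabled: {}", disabled);
}
```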
The benchmark parameters are too small to get a clear view of what's going on, and the RAM consumption of the cached version is inconsistent: the $2^{11}$ FRI domain length run uses more RAM than the $2^{13}$ run.
Consider trying with a padded height of $2^{19}$.
It is likely that parameters can be tweaked to improve the speed or memory consumption of low-memory proving. Because there are currently no clear benchmarking targets in place, and because low-memory proving is not of the highest priority for the time being, such optimizations are deferred.
Storing the entire low-degree-extended trace takes up a lot of memory. Just-in-time low-degree-extension is an alternative to this cached approach that computes (and in some cases recomputes) low-degree-extended columns or subtables when they are needed and drops them afterwards.
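A prover choosing between these two workflows might dispatch along the following lines (the names, the RAM estimate, and the exact decision rule are assumptions for illustration, not Triton VM's actual API):

```rust
/// Which low-degree-extension workflow the prover runs.
#[derive(Debug, PartialEq)]
enum LdeStrategy {
    /// Store the full low-degree-extended trace: faster, memory-hungry.
    Cached,
    /// (Re)compute columns on demand and drop them afterwards: memory-lean.
    JustInTime,
}

/// Illustrative dispatch: cache only if the user has not opted out and
/// the machine has enough RAM to hold the extended trace.
fn choose_strategy(available_ram: u64, ram_needed_for_cache: u64, no_cache_set: bool) -> LdeStrategy {
    if !no_cache_set && available_ram >= ram_needed_for_cache {
        LdeStrategy::Cached
    } else {
        LdeStrategy::JustInTime
    }
}

fn main() {
    // plenty of RAM and no override: use the cached workflow
    let strategy = choose_strategy(16 << 30, 8 << 30, false);
    assert_eq!(strategy, LdeStrategy::Cached);
}
```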
If enough RAM is available and the environment variable `NO_CACHE` is not set, the prover will use the faster cached workflow. If not enough RAM is available or the environment variable `NO_CACHE` is set, then the prover will compute the low-degree extension just-in-time and avoid storing excessive data.

Benchmarks
I ran some benchmarks on my machine for the two prover benchmark programs, `prove_halt` and `prove_fib`. There is quite some error on these benchmarks.

`prove_halt`

FRI domain length: $2^{11}$. Around 14% of the time of the JIT variant is spent in step `open_trace_leafs`.

`prove_fib` (input 100)

FRI domain length: $2^{13}$. Around 44% of the time of the JIT variant is spent in step `open_trace_leafs`.

`prove_fib` (input 40_000) (on machine `mjolnir`)

FRI domain length: $2^{22}$. Around 42% of the time of the JIT variant is spent in step `open_trace_leafs`.

Algorithmic Details
In all cases, we assume all column interpolants (randomized, and over the randomized trace domain) are available.
Opening Trace Rows
FRI terminates by supplying indices of rows in the low-degree-extended table that the prover must open. The prover computes these rows by parallelizing over all interpolants and applying multi-point evaluation. Note that we are currently using a multi-point evaluation algorithm that is known not to be optimal.
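As an illustration of this step (not Triton VM's actual code, field types, or evaluation algorithm), a naive per-point evaluation of each column interpolant with Horner's rule could look like this:

```rust
/// Illustrative only: arithmetic done directly in u128 modulo the
/// Goldilocks prime, rather than with a dedicated field type.
const P: u128 = 0xffff_ffff_0000_0001;

/// Evaluate a polynomial, given by ascending coefficients, at x (mod P).
fn horner(coeffs: &[u64], x: u64) -> u64 {
    coeffs
        .iter()
        .rev()
        .fold(0u128, |acc, &c| (acc * x as u128 + c as u128) % P) as u64
}

/// For each queried point, evaluate every column interpolant there,
/// yielding one opened trace row per point.
fn open_rows(interpolants: &[Vec<u64>], points: &[u64]) -> Vec<Vec<u64>> {
    points
        .iter()
        .map(|&x| interpolants.iter().map(|p| horner(p, x)).collect())
        .collect()
}

fn main() {
    // evaluate 1 + 2X + 3X^2 at X = 2: 1 + 4 + 12 = 17
    assert_eq!(horner(&[1, 2, 3], 2), 17);
}
```

In the real prover the outer loop is parallelized over interpolants, and a batched multi-point evaluation algorithm replaces the naive per-point Horner evaluation shown here.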
Committing to the Tables
The prover iterates over all columns in batches of `NUM_THREADS`, low-degree-extends them in parallel, and absorbs the yielded cells into a vector of sponge states.

Computing the Quotient Segment Codewords
The prover evaluates the (singular) quotient on `NUM_SEGMENTS`-many cosets of the trace domain. This step produces a table which then undergoes some transformations, among them one of length `NUM_SEGMENTS` applied to all rows. The result is the evaluations, on one coset of the trace domain, of `NUM_SEGMENTS`-many quotient segment polynomials that match the interleaving split of the high-degree singular quotient polynomial. These codewords are interpolated with INTT and then evaluated with NTT on the FRI domain for Merkle tree commitment.
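The interleaving split can be illustrated on explicit coefficient vectors (a toy sketch; the actual prover derives the segments from coset evaluations via INTT and NTT rather than from coefficients, and the function name is made up):

```rust
/// Toy sketch of the interleaving split: writing a high-degree polynomial
/// q(X) = sum_{i < k} X^i * q_i(X^k), segment polynomial q_i collects
/// every k-th coefficient of q, starting at index i.
fn split_into_segments(coeffs: &[u64], k: usize) -> Vec<Vec<u64>> {
    (0..k)
        .map(|i| coeffs.iter().skip(i).step_by(k).copied().collect())
        .collect()
}

fn main() {
    // q(X) = 1 + 2X + 3X^2 + 4X^3 + 5X^4 + 6X^5, split into k = 2 segments:
    // q_0(Y) = 1 + 3Y + 5Y^2 (even coefficients), q_1(Y) = 2 + 4Y + 6Y^2 (odd).
    let segments = split_into_segments(&[1, 2, 3, 4, 5, 6], 2);
    assert_eq!(segments, vec![vec![1, 3, 5], vec![2, 4, 6]]);
}
```

Each segment polynomial has degree roughly a `NUM_SEGMENTS`-th of the singular quotient's, which is what makes the segments small enough to commit to over the FRI domain.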