TritonVM / triton-vm

Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.
https://triton-vm.org
Apache License 2.0

Just-In-Time Low-Degree-Extension #264

Closed aszepieniec closed 1 month ago

aszepieniec commented 2 months ago

Storing the entire low-degree-extended trace takes up a lot of memory. Just-in-time low-degree-extension is an alternative to this cached approach that computes (and in some cases recomputes) low-degree-extended columns or subtables when they are needed and drops them afterwards.

If enough RAM is available and the environment variable NO_CACHE is not set, the prover will use the faster cached workflow. If not enough RAM is available or the environment variable NO_CACHE is set, then the prover will compute the low-degree extension just-in-time and avoid storing excessive data.
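The decision logic can be sketched as follows. This is an illustrative sketch, not Triton VM's actual API; the function name and RAM parameters are hypothetical.

```rust
use std::env;

/// Decide whether to use the cached (fast, memory-hungry) workflow or the
/// just-in-time workflow. Hypothetical sketch of the rule described above.
fn use_cached_lde(available_ram_bytes: u64, required_ram_bytes: u64) -> bool {
    // Setting the NO_CACHE environment variable forces just-in-time LDE.
    if env::var("NO_CACHE").is_ok() {
        return false;
    }
    // Otherwise, cache only if the low-degree-extended trace fits in RAM.
    available_ram_bytes >= required_ram_bytes
}
```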

Benchmarks

I ran some benchmarks on my machine for the two prover benchmark programs, prove_halt and prove_fib. These measurements carry quite some error.

prove_halt

|      | cache     | jit       | quotient |
|------|-----------|-----------|----------|
| time | 434.76 ms | 513.72 ms | +18.16%  |
| RAM  | 593080 kB | 51780 kB  | -91%     |

FRI domain length: $2^{11}$. Around 14% of the time of the JIT variant is spent in step open_trace_leafs.

prove_fib (input 100)

|      | cache     | jit       | quotient |
|------|-----------|-----------|----------|
| time | 280.28 ms | 520.92 ms | +85.86%  |
| RAM  | 120312 kB | 82224 kB  | -31.66%  |

FRI domain length: $2^{13}$. Around 44% of the time of the JIT variant is spent in step open_trace_leafs.

prove_fib (input 40_000) (on machine mjolnir)

|         | cache       | jit         | quotient |
|---------|-------------|-------------|----------|
| time    | 46.06 s     | 93.63 s     | +103.3%  |
| RAM     | 43062128 kB | 35912204 kB | -19.91%  |

FRI domain length: $2^{22}$. Around 42% of the time of the JIT variant is spent in step open_trace_leafs.

Algorithmic Details

In all cases, we assume all column interpolants (randomized, and over the randomized trace domain) are available.

Opening Trace Rows

FRI terminates by supplying indices of rows in the low-degree-extended table that the prover must open. The prover computes these rows by parallelizing over all interpolants and applying multi-point evaluation. Note that we are currently using a multi-point evaluation algorithm that is known not to be optimal.
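A minimal sketch of this multi-point evaluation, parallelized per column, might look as follows. It uses naive Horner evaluation over Triton VM's 64-bit base field prime; the function names are illustrative, and the real prover uses a faster (though, as noted, still suboptimal) algorithm and its own field type.

```rust
/// Triton VM's base field prime, 2^64 - 2^32 + 1.
const P: u64 = 0xffff_ffff_0000_0001;

/// Evaluate a polynomial (coefficients in ascending order) at x via Horner's
/// rule, reducing modulo P. Intermediate products fit in u128.
fn horner(coeffs: &[u64], x: u64) -> u64 {
    coeffs
        .iter()
        .rev()
        .fold(0u128, |acc, &c| (acc * x as u128 + c as u128) % P as u128) as u64
}

/// Open one trace row per requested point: evaluate every column's
/// interpolant at each point. In the real prover this loop over columns
/// is parallelized.
fn open_rows(columns: &[Vec<u64>], points: &[u64]) -> Vec<Vec<u64>> {
    points
        .iter()
        .map(|&x| columns.iter().map(|col| horner(col, x)).collect())
        .collect()
}
```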

Committing to the Tables

The prover iterates over all columns in batches of NUM_THREADS, low-degree-extends them in parallel, and absorbs the yielded cells into a vector of sponge states.
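The batched absorption can be sketched like this. The "sponge" below is a toy accumulator standing in for Triton VM's actual sponge states, and the low-degree extension step is elided; only the batching-and-absorbing structure is from the description above.

```rust
/// Batch size; illustrative stand-in for the prover's NUM_THREADS.
const NUM_THREADS: usize = 4;

/// Absorb table cells column by column, in batches, into one sponge state
/// per row. Toy sketch: in the real prover, each column in a batch is
/// low-degree-extended in parallel before its cells are absorbed.
fn absorb_columns(columns: &[Vec<u64>], num_rows: usize) -> Vec<u64> {
    let mut sponges = vec![0u64; num_rows]; // one state per FRI-domain row
    for batch in columns.chunks(NUM_THREADS) {
        for col in batch {
            for (sponge, &cell) in sponges.iter_mut().zip(col) {
                // toy absorption: mix the cell into the row's state
                *sponge = sponge.wrapping_mul(31).wrapping_add(cell);
            }
        }
    }
    sponges
}
```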

Computing the Quotient Segment Codewords

The prover evaluates the (singular) quotient on NUM_SEGMENT-many cosets of the trace domain. This step produces a table, which then undergoes a few transformations.

The result is the evaluations, on one coset of the trace domain, of NUM_SEGMENT-many quotient segment polynomials that correspond to the interleaving split of the high-degree singular quotient polynomial. These codewords are interpolated with INTT and then evaluated with NTT on the FRI domain for the Merkle tree commitment.
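The interleaving split itself can be illustrated on coefficient vectors: coefficient $i$ of the quotient goes to segment $i \bmod \mathrm{NUM\_SEGMENT}$, so that $q(x) = \sum_k x^k \cdot q_k(x^{\mathrm{NUM\_SEGMENT}})$. A sketch, with an illustrative segment count:

```rust
/// Number of quotient segments; illustrative value.
const NUM_SEGMENTS: usize = 4;

/// Interleaving split: coefficient i of the input polynomial becomes
/// coefficient i / NUM_SEGMENTS of segment polynomial i % NUM_SEGMENTS,
/// so that q(x) = sum_k x^k * q_k(x^NUM_SEGMENTS).
fn split_into_segments(coeffs: &[u64]) -> Vec<Vec<u64>> {
    let mut segments: Vec<Vec<u64>> = vec![Vec::new(); NUM_SEGMENTS];
    for (i, &c) in coeffs.iter().enumerate() {
        segments[i % NUM_SEGMENTS].push(c);
    }
    segments
}
```

Each segment polynomial has roughly 1/NUM_SEGMENTS the degree of the original, which is what allows committing to them over the smaller domain.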

Sword-Smith commented 2 months ago

Consider renaming the NO_CACHE environment variable to TVM_NO_CACHE to avoid name collisions.

Sword-Smith commented 2 months ago

The benchmark parameters are too small to get a clear view of what's going on, and the RAM consumption is inconsistent for the cached version, as the $2^{11}$ FRI domain length version uses more RAM than the $2^{13}$ version.

Consider trying with a padded height of $2^{19}$.

jan-ferdinand commented 1 month ago

It is likely that parameters can be tweaked to improve the speed or memory consumption of low-memory proving. Because there are currently no clear benchmarking targets in place, and because low-memory proving is not of the highest priority for the time being, such optimizations are deferred.