TritonVM / triton-vm

Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.
https://triton-vm.org
Apache License 2.0

Just-In-Time Low-Degree-Extension #264

Closed aszepieniec closed 1 month ago

aszepieniec commented 2 months ago

Storing the entire low-degree-extended trace takes up a lot of memory. Just-in-time low-degree-extension is an alternative to this cached approach that computes (and in some cases recomputes) low-degree-extended columns or subtables when they are needed and drops them afterwards.

If enough RAM is available and the environment variable NO_CACHE is not set, the prover will use the faster cached workflow. If not enough RAM is available or the environment variable NO_CACHE is set, then the prover will compute the low-degree extension just-in-time and avoid storing excessive data.
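The decision logic can be sketched as follows. This is an illustrative sketch, not Triton VM's actual API; the function name and RAM parameters are hypothetical.

```rust
use std::env;

/// Decide whether to use the cached (fast, memory-hungry) workflow or the
/// just-in-time workflow. Hypothetical sketch of the rule described above.
fn use_cached_lde(available_ram_bytes: u64, required_ram_bytes: u64) -> bool {
    // Setting the NO_CACHE environment variable forces just-in-time LDE.
    if env::var("NO_CACHE").is_ok() {
        return false;
    }
    // Otherwise, cache only if the low-degree-extended trace fits in RAM.
    available_ram_bytes >= required_ram_bytes
}
```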

Benchmarks

I ran some benchmarks on my machine for the two prover benchmark programs, prove_halt and prove_fib. These measurements carry quite some error.

prove_halt

|      | cache     | jit       | quotient |
|------|-----------|-----------|----------|
| time | 434.76 ms | 513.72 ms | +18.16%  |
| RAM  | 593080 kB | 51780 kB  | -91%     |

FRI domain length: $2^{11}$. Around 14% of the time of the JIT variant is spent in step open_trace_leafs.

prove_fib (input 100)

|      | cache     | jit       | quotient |
|------|-----------|-----------|----------|
| time | 280.28 ms | 520.92 ms | +85.86%  |
| RAM  | 120312 kB | 82224 kB  | -31.66%  |

FRI domain length: $2^{13}$. Around 44% of the time of the JIT variant is spent in step open_trace_leafs.

prove_fib (input 40_000) (on machine mjolnir)

|         | cache       | jit         | quotient |
|---------|-------------|-------------|----------|
| time    | 46.06 s     | 93.63 s     | +103.3%  |
| RAM     | 43062128 kB | 35912204 kB | -19.91%  |

FRI domain length: $2^{22}$. Around 42% of the time of the JIT variant is spent in step open_trace_leafs.

Algorithmic Details

In all cases, we assume all column interpolants (randomized, and over the randomized trace domain) are available.

Opening Trace Rows

FRI terminates by supplying indices of rows in the low-degree-extended table that the prover must open. The prover computes these rows by parallelizing over all interpolants and applying multi-point evaluation. Note that we are currently using a multi-point evaluation algorithm that is known not to be optimal.
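A minimal sketch of this multi-point evaluation, parallelized per column, might look as follows. It uses naive Horner evaluation over Triton VM's 64-bit base field prime; the function names are illustrative, and the real prover uses a faster (though, as noted, still suboptimal) algorithm and its own field type.

```rust
/// Triton VM's base field prime, 2^64 - 2^32 + 1.
const P: u64 = 0xffff_ffff_0000_0001;

/// Evaluate a polynomial (coefficients in ascending order) at x via Horner's
/// rule, reducing modulo P. Intermediate products fit in u128.
fn horner(coeffs: &[u64], x: u64) -> u64 {
    coeffs
        .iter()
        .rev()
        .fold(0u128, |acc, &c| (acc * x as u128 + c as u128) % P as u128) as u64
}

/// Open one trace row per requested point: evaluate every column's
/// interpolant at each point. In the real prover this loop over columns
/// is parallelized.
fn open_rows(columns: &[Vec<u64>], points: &[u64]) -> Vec<Vec<u64>> {
    points
        .iter()
        .map(|&x| columns.iter().map(|col| horner(col, x)).collect())
        .collect()
}
```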

Committing to the Tables

The prover iterates over all columns in batches of NUM_THREADS, low-degree-extends them in parallel, and absorbs the yielded cells into a vector of sponge states.
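The batched absorption can be sketched like this. The "sponge" below is a toy accumulator standing in for Triton VM's actual sponge states, and the low-degree extension step is elided; only the batching-and-absorbing structure is from the description above.

```rust
/// Batch size; illustrative stand-in for the prover's NUM_THREADS.
const NUM_THREADS: usize = 4;

/// Absorb table cells column by column, in batches, into one sponge state
/// per row. Toy sketch: in the real prover, each column in a batch is
/// low-degree-extended in parallel before its cells are absorbed.
fn absorb_columns(columns: &[Vec<u64>], num_rows: usize) -> Vec<u64> {
    let mut sponges = vec![0u64; num_rows]; // one state per FRI-domain row
    for batch in columns.chunks(NUM_THREADS) {
        for col in batch {
            for (sponge, &cell) in sponges.iter_mut().zip(col) {
                // toy absorption: mix the cell into the row's state
                *sponge = sponge.wrapping_mul(31).wrapping_add(cell);
            }
        }
    }
    sponges
}
```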

Computing the Quotient Segment Codewords

The prover evaluates the (singular) quotient on NUM_SEGMENT-many cosets of the trace domain. This step produces a table, which then undergoes a few transformations.

The result is the evaluations, on one coset of the trace domain, of NUM_SEGMENT-many quotient segment polynomials that correspond to the interleaving split of the high-degree singular quotient polynomial. These codewords are interpolated with INTT and then evaluated with NTT on the FRI domain for the Merkle tree commitment.
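The interleaving split itself can be illustrated on coefficient vectors: coefficient $i$ of the quotient goes to segment $i \bmod \mathrm{NUM\_SEGMENT}$, so that $q(x) = \sum_k x^k \cdot q_k(x^{\mathrm{NUM\_SEGMENT}})$. A sketch, with an illustrative segment count:

```rust
/// Number of quotient segments; illustrative value.
const NUM_SEGMENTS: usize = 4;

/// Interleaving split: coefficient i of the input polynomial becomes
/// coefficient i / NUM_SEGMENTS of segment polynomial i % NUM_SEGMENTS,
/// so that q(x) = sum_k x^k * q_k(x^NUM_SEGMENTS).
fn split_into_segments(coeffs: &[u64]) -> Vec<Vec<u64>> {
    let mut segments: Vec<Vec<u64>> = vec![Vec::new(); NUM_SEGMENTS];
    for (i, &c) in coeffs.iter().enumerate() {
        segments[i % NUM_SEGMENTS].push(c);
    }
    segments
}
```

Each segment polynomial has roughly 1/NUM_SEGMENTS the degree of the original, which is what allows committing to them over the smaller domain.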

Sword-Smith commented 2 months ago

Consider renaming the NO_CACHE environment variable to TVM_NO_CACHE to avoid name collisions.

Sword-Smith commented 2 months ago

The benchmark parameters are too small to get a clear view of what's going on, and the RAM consumption is inconsistent for the cached version, as the $2^{11}$ FRI domain length version uses more RAM than the $2^{13}$ version.

Consider trying with a padded height of $2^{19}$.

jan-ferdinand commented 1 month ago

It is likely that parameters can be tweaked to improve the speed or memory consumption of low-memory proving. Because there are currently no clear benchmarking targets in place, and because low-memory proving is not of the highest priority for the time being, such optimizations are deferred.