a16z / jolt

The simplest and most extensible zkVM. Fast and fully open source from a16z crypto and friends. ⚡
https://jolt.a16zcrypto.com
MIT License

Parallelize Memory Trace Processing #338

Closed tahsintunan closed 5 months ago

tahsintunan commented 5 months ago

Issue

https://github.com/a16z/jolt/issues/292

Approach

Parallelized the processing of registers and RAM, which improves the performance of the memory_trace_processing segment by ~30% on an M2 Pro. Overall performance is still bottlenecked by the RAM task, which takes longer to complete than the register task.
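A minimal sketch of the idea, not the code in this PR: it splits a trace into register and RAM operations and processes the two halves concurrently with `std::thread::scope`. The `MemOp` type, the `process` stand-in, `process_trace`, and the `REGISTER_COUNT = 32` boundary are all hypothetical names chosen for illustration.

```rust
use std::thread;

/// Hypothetical memory operation; not Jolt's actual trace type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MemOp {
    pub address: u64,
    pub value: u64,
}

/// Assumed boundary: addresses below this are registers, the rest are RAM.
pub const REGISTER_COUNT: u64 = 32;

/// Stand-in for the real per-region work: returns (op count, wrapping sum of values).
fn process(ops: &[MemOp]) -> (usize, u64) {
    let checksum = ops.iter().fold(0u64, |acc, op| acc.wrapping_add(op.value));
    (ops.len(), checksum)
}

/// Splits the trace by address and runs the register and RAM tasks in parallel.
/// Returns (register result, RAM result).
pub fn process_trace(trace: &[MemOp]) -> ((usize, u64), (usize, u64)) {
    let (regs, ram): (Vec<MemOp>, Vec<MemOp>) = trace
        .iter()
        .copied()
        .partition(|op| op.address < REGISTER_COUNT);

    thread::scope(|s| {
        // The register task runs on a spawned thread...
        let reg_handle = s.spawn(|| process(&regs));
        // ...while the (typically longer) RAM task runs on the current thread.
        let ram_result = process(&ram);
        let reg_result = reg_handle.join().expect("register task panicked");
        (reg_result, ram_result)
    })
}
```

With only two tasks, the wall-clock time is bounded below by the slower one, which is why the RAM task remains the bottleneck.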

Ideally, the next step would be to subdivide the RAM task into smaller subtasks by address range and spread the work across more cores. However, testing with fib_e2e and sha3_e2e revealed that RAM accesses are unevenly distributed across the address space. If we split the >=32 address space into N fixed-size blocks, for example, and assign one core per block, then in practice only 2-3 cores would be doing work while the rest sit idle.
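The imbalance can be checked with a simple histogram. The sketch below is illustrative only: `accesses_per_block` and `busy_blocks` are hypothetical helpers, and the address bounds passed in are assumptions, not values measured from fib_e2e or sha3_e2e.

```rust
/// Counts accesses per fixed-size block over the half-open range [ram_start, ram_end).
/// Addresses outside the range (e.g. registers) are ignored.
pub fn accesses_per_block(
    addresses: &[u64],
    ram_start: u64,
    ram_end: u64,
    num_blocks: usize,
) -> Vec<usize> {
    assert!(num_blocks > 0 && ram_end > ram_start);
    let span = ram_end - ram_start;
    // Round up so that num_blocks blocks cover the whole range.
    let block_size = (span + num_blocks as u64 - 1) / num_blocks as u64;

    let mut counts = vec![0usize; num_blocks];
    for &addr in addresses {
        if addr >= ram_start && addr < ram_end {
            counts[((addr - ram_start) / block_size) as usize] += 1;
        }
    }
    counts
}

/// Number of blocks that received at least one access, i.e. how many cores
/// would have any work under a one-core-per-block assignment.
pub fn busy_blocks(counts: &[usize]) -> usize {
    counts.iter().filter(|&&c| c > 0).count()
}
```

If most accesses cluster in a few address ranges, `busy_blocks` stays small no matter how large N is, which suggests that splitting by access count rather than by fixed address range would balance the load better.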