This PR refactors how addresses are internally handled by the Kernel.
Instead of dealing with three distinct words storing `context`, `segment`, and `virtual` respectively, addresses are now represented as a single `U256` word, $virt + 2^{32} seg + 2^{64} ctx$.
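To make the encoding concrete, here is a minimal sketch of packing and unpacking under that formula. The function and constant names are hypothetical, and the bounds checks (`virt` and `seg` each fitting in 32 bits) are an assumption implied by the shifts rather than something taken from the kernel code.

```python
# Illustrative only: names and bounds are assumptions, not the kernel's API.
SEGMENT_SHIFT = 32  # seg is scaled by 2^32
CONTEXT_SHIFT = 64  # ctx is scaled by 2^64


def pack_address(ctx: int, seg: int, virt: int) -> int:
    """Build the single-word address virt + 2^32 * seg + 2^64 * ctx."""
    # Assumed bounds so that the three fields cannot overlap.
    assert 0 <= virt < (1 << SEGMENT_SHIFT)
    assert 0 <= seg < (1 << (CONTEXT_SHIFT - SEGMENT_SHIFT))
    assert ctx >= 0
    return virt + (seg << SEGMENT_SHIFT) + (ctx << CONTEXT_SHIFT)


def unpack_address(addr: int) -> tuple[int, int, int]:
    """Recover (ctx, seg, virt) from a packed address."""
    virt = addr & ((1 << SEGMENT_SHIFT) - 1)
    seg = (addr >> SEGMENT_SHIFT) & ((1 << (CONTEXT_SHIFT - SEGMENT_SHIFT)) - 1)
    ctx = addr >> CONTEXT_SHIFT
    return ctx, seg, virt
```

Packing and then unpacking returns the original triple as long as the assumed bounds hold.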
In addition, some memory offsets, such as `GlobalMetadata`, `ContextMetadata`, and `TxnFields`, whose sizes are known in advance, are already scaled by their respective segments to allow for faster address construction (inserting an arbitrary address on the stack now takes 2 `ADD`s in addition to the existing 3 pushes).
`context` and `segment` values are automatically scaled by their respective shifts to reduce the overhead on the kernel side.
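The benefit of pre-scaling can be sketched as follows. The segment index, field offset, and function names below are made up for illustration; only the shifts come from the formula above. The point is that a constant with its segment already folded in needs one addition instead of two.

```python
# Illustrative only: the segment index and field offset are hypothetical values.
SEGMENT_SHIFT = 32
CONTEXT_SHIFT = 64

HYPOTHETICAL_SEGMENT = 5  # stand-in for a metadata segment index
HYPOTHETICAL_FIELD = 3    # stand-in for a field offset within that segment

# Pre-scaled constant: the segment is folded into the offset ahead of time.
PRESCALED_FIELD = (HYPOTHETICAL_SEGMENT << SEGMENT_SHIFT) + HYPOTHETICAL_FIELD


def address_generic(scaled_ctx: int, scaled_seg: int, virt: int) -> int:
    """Arbitrary address: three values combined with two additions."""
    return scaled_ctx + scaled_seg + virt


def address_prescaled(scaled_ctx: int, prescaled_field: int) -> int:
    """Known offset: the segment is already included, so one addition suffices."""
    return scaled_ctx + prescaled_field


scaled_ctx = 2 << CONTEXT_SHIFT                     # context scaled by its shift
scaled_seg = HYPOTHETICAL_SEGMENT << SEGMENT_SHIFT  # segment scaled by its shift
```

Both paths yield the same word, so the pre-scaled form only changes how many operations are needed to build it.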
As a follow-up, this refactoring makes it possible to drop an entire memory channel almost immediately, effectively removing 13 CPU columns.
For ERC20 against latest main, the trace lengths are:
TraceCheckpoint { arithmetic_len: 18374, byte_packing_len: 26159, cpu_len: 102230, keccak_len: 12768, keccak_sponge_len: 532, logic_len: 3705, memory_len: 463214 }
while on this branch, they are:
TraceCheckpoint { arithmetic_len: 21766, byte_packing_len: 24570, cpu_len: 94271, keccak_len: 13920, keccak_sponge_len: 580, logic_len: 3945, memory_len: 447213 }
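For a quick comparison of the two checkpoints, the per-table differences can be computed with a small script. This is only a convenience sketch over the numbers quoted above; the helper name is hypothetical.

```python
# Trace lengths copied from the two TraceCheckpoint dumps above.
main = {
    "arithmetic_len": 18374, "byte_packing_len": 26159, "cpu_len": 102230,
    "keccak_len": 12768, "keccak_sponge_len": 532, "logic_len": 3705,
    "memory_len": 463214,
}
branch = {
    "arithmetic_len": 21766, "byte_packing_len": 24570, "cpu_len": 94271,
    "keccak_len": 13920, "keccak_sponge_len": 580, "logic_len": 3945,
    "memory_len": 447213,
}


def trace_deltas(before: dict, after: dict) -> dict:
    """Return {table: (absolute change, percent change)} for each trace table."""
    return {
        name: (after[name] - before[name],
               round(100 * (after[name] - before[name]) / before[name], 1))
        for name in before
    }


deltas = trace_deltas(main, branch)
for name, (absolute, percent) in deltas.items():
    print(f"{name}: {absolute:+d} ({percent:+.1f}%)")
```

On these numbers, the CPU trace shrinks by 7959 rows and the memory trace by 16001 rows, while the arithmetic, Keccak, and logic traces grow.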
Note:
There are some inefficient operations in the kernel following the major refactoring around stack operations, etc., which could be dealt with in the future; I'm opening the PR now as it is already quite massive.
I've run 8000 tests in `witness-only` mode with the test-runner, without any failure. I'm now running them in `proving` mode to make sure everything works as expected (~1.5k tests passing so far).
I haven't updated the specs yet, will do after getting some feedback on the approach.
closes #1324