Address bundling

This PR refactors how addresses are handled internally by the Kernel. Instead of dealing with three distinct words storing the context, segment, and virtual address respectively, addresses are now represented using a single U256 word as $virt + 2^{32} \cdot seg + 2^{64} \cdot ctx$.
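The packing formula can be sketched as follows. This is a minimal illustration, not the Kernel's code: the helper names `pack_address` and `unpack_address` are hypothetical, `u128` stands in for the U256 word to keep the example free of dependencies, and the 32-bit widths for `virt` and `seg` are an assumption inferred from the shifts in the formula.

```rust
/// Bit offset of the segment field, taken from the 2^32 factor in the formula.
const SEGMENT_SHIFT: u32 = 32;
/// Bit offset of the context field, taken from the 2^64 factor in the formula.
const CONTEXT_SHIFT: u32 = 64;

/// Packs (ctx, seg, virt) into a single word: virt + 2^32 * seg + 2^64 * ctx.
/// Hypothetical helper; `u128` is used here in place of U256.
fn pack_address(ctx: u64, seg: u32, virt: u32) -> u128 {
    (virt as u128) + ((seg as u128) << SEGMENT_SHIFT) + ((ctx as u128) << CONTEXT_SHIFT)
}

/// Recovers (ctx, seg, virt) from a packed word.
/// Assumes virt and seg each fit in 32 bits, so the fields never overlap.
fn unpack_address(addr: u128) -> (u64, u32, u32) {
    let virt = addr as u32; // keeps the low 32 bits
    let seg = (addr >> SEGMENT_SHIFT) as u32; // keeps bits 32..64
    let ctx = (addr >> CONTEXT_SHIFT) as u64; // keeps bits 64..128
    (ctx, seg, virt)
}
```

Because the three fields occupy disjoint bit ranges under these assumptions, packing followed by unpacking returns the original triple.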

In addition, some memory offsets whose set size is known in advance, such as GlobalMetadata, ContextMetadata, and TxnFields, are already scaled by their respective segments. This allows for faster address construction, as we now need 2 ADDs in addition to the existing 3 pushes when inserting an arbitrary address on the stack.

Context and segment values are automatically scaled by their respective shifts to reduce the overhead on the kernel side.
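The effect of pre-scaling can be sketched as below. Everything named here is hypothetical: the segment index `3`, the field offset `5`, and the helpers `scale_context` and `build_address` are illustrative only, and `u128` again stands in for U256. The point is that once context and segment are stored already shifted, an arbitrary address costs two additions, and a pre-scaled metadata offset needs no segment addition at all.

```rust
const SEGMENT_SHIFT: u32 = 32;
const CONTEXT_SHIFT: u32 = 64;

// Hypothetical segment index, chosen only for illustration.
const SEGMENT_GLOBAL_METADATA_INDEX: u128 = 3;
// The segment constant is stored already shifted into its field.
const SEGMENT_GLOBAL_METADATA: u128 = SEGMENT_GLOBAL_METADATA_INDEX << SEGMENT_SHIFT;
// Hypothetical metadata field at offset 5, with its segment already folded in.
const GLOBAL_METADATA_EXAMPLE_FIELD: u128 = SEGMENT_GLOBAL_METADATA + 5;

/// Shifts a raw context id into its field (done once, when the value is produced).
fn scale_context(ctx: u64) -> u128 {
    (ctx as u128) << CONTEXT_SHIFT
}

/// Builds an arbitrary address from pre-scaled parts: exactly two additions.
fn build_address(scaled_ctx: u128, scaled_seg: u128, virt: u32) -> u128 {
    scaled_ctx + scaled_seg + (virt as u128)
}
```

In context 0, `GLOBAL_METADATA_EXAMPLE_FIELD` is already a complete address, which is why offsets of fixed-size sets benefit from being stored pre-scaled.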

As a follow-up, this refactoring enables an almost immediate drop of an entire memory channel, effectively removing 13 CPU columns.

For ERC20 against latest main, the trace lengths are:

TraceCheckpoint { arithmetic_len: 18374, byte_packing_len: 26159, cpu_len: 102230, keccak_len: 12768, keccak_sponge_len: 532, logic_len: 3705, memory_len: 463214 }

while on this branch, they are:

TraceCheckpoint { arithmetic_len: 21766, byte_packing_len: 24570, cpu_len: 94271, keccak_len: 13920, keccak_sponge_len: 580, logic_len: 3945, memory_len: 447213 }

Note:

There are some inefficient operations in the kernel following the major refactoring around stack operations, etc., which could be dealt with in the future, but I'm opening the PR now as it is already quite massive.
I've run 8000 tests in witness-only mode with the test-runner, without any failure. Now running in proving mode just to make sure everything works as expected (got ~1.5k tests ok so far).
I haven't updated the specs yet; I will do so after getting some feedback on the approach.

closes #1324

0xPolygonZero / plonky2

Address bundling #1426
