Open aszepieniec opened 3 months ago
It turns out that the opstack table height is the bottleneck in the verifier, not the clock cycle count (which is recorded by the processor table). So counting the number of clock cycles that a new instruction would shave off might not be the relevant metric.
I was considering a `recurse` instruction that acts as the current `recurse` if the top two stack elements are different and acts as `return` if the top two stack elements are the same. That would probably shave off a good amount of both opstack table rows and clock cycles, since the start of almost all our loops looks like this:
loop_label:
dup <m>
dup <n>
eq
skiz return
<loop body>
recurse
This new `recurse` instruction would save 4 opstack table rows and 4 clock cycles for each loop iteration.
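The saving can be checked with a small cost model. The Python sketch below is only a back-of-the-envelope calculation: the per-instruction costs in `COST` are assumptions (one clock cycle per executed instruction, one opstack table row per element pushed or popped), not figures taken from the VM. `return` is left out because `skiz` skips it on every iteration except the last.

```python
# Assumed cost model: (clock cycles, op stack delta) per executed instruction.
COST = {
    "dup": (1, +1),
    "eq": (1, -1),
    "skiz": (1, -1),
    "recurse": (1, 0),
    "recurse_or_return": (1, 0),
}


def overhead(instructions):
    """Return (clock cycles, opstack table rows) spent on loop maintenance."""
    cycles = sum(COST[name][0] for name in instructions)
    rows = sum(abs(COST[name][1]) for name in instructions)
    return cycles, rows


# Loop maintenance per non-final iteration, without the loop body.
current = ["dup", "dup", "eq", "skiz", "recurse"]
proposed = ["recurse_or_return"]

saved_cycles = overhead(current)[0] - overhead(proposed)[0]
saved_rows = overhead(current)[1] - overhead(proposed)[1]
print(saved_cycles, saved_rows)
```

Under these assumptions, loop maintenance drops from 5 cycles and 4 rows to 1 cycle and 0 rows per iteration.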
I guess my view has changed a bit.
The `recurse_or_return` that I think would work best is one that returns if `ST[5] == 1` and recurses otherwise. Combined with `merkle_step`, it could reduce the loop body to 2 instructions.
A loop that walks up a Merkle tree would then be:
auth_path_loop:
merkle_step
recurse_or_return
I understand that this creates a problem for Merkle trees of height 0, with only one node. But I think that's not a practical problem.
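To make the control flow concrete, here is a rough Python model of that two-instruction loop. It is a sketch under assumptions: SHA-256 stands in for the VM's hash function, nodes are indexed heap-style with the root at index 1, and `hash_pair`, `merkle_step` and `walk_up` are hypothetical helper names, not the VM's API.

```python
import hashlib


def hash_pair(left: bytes, right: bytes) -> bytes:
    # Stand-in for the VM's hash; SHA-256 is an assumption of this sketch.
    return hashlib.sha256(left + right).digest()


def merkle_step(node_idx: int, digest: bytes, sibling: bytes):
    """Model of merkle_step: hash with the sibling, move to the parent index."""
    if node_idx % 2 == 0:  # even index: current node is a left child
        parent = hash_pair(digest, sibling)
    else:  # odd index: current node is a right child
        parent = hash_pair(sibling, digest)
    return node_idx // 2, parent


def walk_up(leaf_idx: int, leaf: bytes, auth_path: list) -> bytes:
    """Model of the loop `merkle_step; recurse_or_return`."""
    node_idx, digest = leaf_idx, leaf
    siblings = iter(auth_path)
    while True:
        node_idx, digest = merkle_step(node_idx, digest, next(siblings))
        if node_idx == 1:  # recurse_or_return: return once the root is reached
            return digest


# Tree with 4 leaves: root is node 1, leaves are nodes 4..7.
leaves = [bytes([i]) * 32 for i in range(4)]
node_2 = hash_pair(leaves[0], leaves[1])
node_3 = hash_pair(leaves[2], leaves[3])
root = hash_pair(node_2, node_3)

computed = walk_up(6, leaves[2], [leaves[3], node_2])
print(computed == root)
```

The model also shows the height-0 case: starting at `node_idx == 1`, the first `merkle_step` already needs a sibling that does not exist, and the index check comes too late to stop it.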
edit: but let's see where we stand after having added the dot_steps and the merkle_step.
This is a tracking issue. We add imagined instructions that could make recursion (or consensus programs) faster.

`read_mem_forward` ❌
- Stack before: `*ptr`
- Stack after: `(*ptr+n) [element]`
- `HashVarlen` becomes 3.
- `hash_var_len` is only performance critical when the number of to-be-hashed elements is known a priori, making other approaches feasible.
- `dot_step` (see below) eliminates the second most important use for `read_mem_forward`.

`dot_step` ✅
- Stack before: `_ acc2 acc1 acc0 *lhs *rhs`
- Stack after: `_ acc2' acc1' acc0' (*lhs+3) (*rhs+3)`
- `InnerProductOfThreeRowsWithWeights` becomes 1. Stands to reduce ~1M to ~65000 cycles.
- #272

`merkle_step` ✅
- Stack before: `_ merkle_node_idx [Digest; 5]`
- Stack after: `_ (merkle_node_idx // 2) [Digest'; 5]`
- Combines `divine_sibling` and `hash` into one instruction.
- Replaces `divine_sibling`. Instruction `hash` remains available as-is.
- Leaves the stack height unchanged, unlike `divine_sibling` and `hash`, which change the height of the stack by 5 elements each.

`get_colinear_y`
- Stack before: `_ ax [ay] bx [by] [cx]` (possibly different order)
- Stack after: `_ [cy]`
- Reduces `compute_c_values` from 74 instructions to 49. Total cycle count reduction: ~26000.

`recurse_or_return` ✅
- Stack before: `_ a b`
- Stack after: `_ a b`
- Acts like `recurse` if $a \neq b$. Else, acts like `return`.
- Shortens loops of the form `dup <m> dup <n> eq skiz return <loop_body> recurse`, reducing the op stack delta of loop maintenance.
- #288

`absorb_from_mem` ✅
- Stack before: `_ mem_ptr [garbage; 3]`
- Stack after: `_ (mem_ptr - 10) [garbage; 3]`
- Corresponds to `read_mem 5 read_mem 5 sponge_absorb`, albeit not a drop-in replacement due to the `[garbage; 3]`, which is needed for arithmetization reasons.
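
For the `dot_step` entry, the stack effect can be modelled in a few lines of Python. This is a sketch under assumptions that the issue does not spell out: the three accumulator words are taken to be one extension field element over the prime 2^64 - 2^32 + 1 with reduction polynomial x^3 - x + 1, each operand occupies three consecutive RAM words in coefficient order, and the step computes `acc + lhs * rhs`. The helper names are hypothetical.

```python
P = 2**64 - 2**32 + 1  # assumed base field modulus


def xfe_mul(a, b):
    """Multiply two extension field elements modulo x^3 - x + 1 (assumed)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    c0 = a0 * b0
    c1 = a0 * b1 + a1 * b0
    c2 = a0 * b2 + a1 * b1 + a2 * b0
    c3 = a1 * b2 + a2 * b1
    c4 = a2 * b2
    # Reduce using x^3 = x - 1 and x^4 = x^2 - x.
    return ((c0 - c3) % P, (c1 + c3 - c4) % P, (c2 + c4) % P)


def xfe_add(a, b):
    return tuple((x + y) % P for x, y in zip(a, b))


def read_xfe(ram, ptr):
    # Assumed layout: three consecutive words, lowest coefficient first.
    return (ram[ptr], ram[ptr + 1], ram[ptr + 2])


def dot_step(acc, lhs_ptr, rhs_ptr, ram):
    """Model of dot_step: accumulate one product, advance both pointers by 3."""
    product = xfe_mul(read_xfe(ram, lhs_ptr), read_xfe(ram, rhs_ptr))
    return xfe_add(acc, product), lhs_ptr + 3, rhs_ptr + 3


def inner_product(ram, lhs_ptr, rhs_ptr, length):
    acc = (0, 0, 0)
    for _ in range(length):
        acc, lhs_ptr, rhs_ptr = dot_step(acc, lhs_ptr, rhs_ptr, ram)
    return acc


# Two vectors of length 2, using only constant coefficients for readability.
ram = {0: 2, 1: 0, 2: 0, 3: 3, 4: 0, 5: 0,
       100: 5, 101: 0, 102: 0, 103: 7, 104: 0, 105: 0}
print(inner_product(ram, 0, 100, 2))
```

In this model, each element of the inner product costs a single `dot_step`, which matches the entry's note that the loop shrinks to one instruction.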