Open chancehudson opened 3 months ago
Thanks for the suggestion. It's not something I recall talking about but definitely worth considering. And please add these suggestions to the maybe-wishlist so that we don't lose track of it.
Just so I understand correctly -- this instruction would allow you to search-and-replace push n read_mem 1 pop 1
by read_memi n
, and analogously for write_memi
. Is that correct?
Adding extra instructions does add overhead to both proving and verifying and, cumulatively, to proving a verification program. The introduction of addi
gave us 2 new degree-lowering columns, which means 2 more NTTs, 2 more columns to be hashed, etc. Note that addi
shares a lot of logic with existing instructions; for more complex new instructions the cost might be higher than just two extra columns. Unfortunately, one has to implement it in order to find out what the exact cost is.
Since adding instructions is costly, the benefit of adding one needs to outweigh its cost. A more convenient programmer experience is a benefit for sure, but we are more interested in performance metrics: how many cycles to execute a particular task? We can take a 1% drop in prover speed if we get a 2% faster recursive verification. In light of this question, could you estimate how many cycles the proposed instructions would save (relative and absolute)? Would you be using them inside the innermost body of a nesting of loops, or more likely in their setup?
A drawback of this suggestion is that the factor 3-speedup that it yields only works with single-word data types, like bool, u32, and b-field elements. For bigger data types like hash-digests, there are no savings as you can currently do
push {address+4}
read_mem 5
pop 1
and using read_mem i
would yield
read_memi {address+4}
read_memi {address+3}
read_memi {address+2}
read_memi {address+1}
read_memi {address}
Where address
is the location of the 1st word of the digest.
This is not to say that the instruction isn't worth it. Just that it's only beneficial for small data types (and pointers, of course).
I'm marking this as a “good first issue,” which it only really is with a pattern to match on. One such pattern could be the commit introducing instruction addi
that was mentioned in the OP: 3b5bc128c086489301803e0f5b8881d00f681e97
Just so I understand correctly -- this instruction would allow you to search-and-replace
push n read_mem 1 pop 1
byread_memi n
, and analogously forwrite_memi
. Is that correct?
Yes
could you estimate how many cycles the proposed instructions would save
This is tough. I'm designing a scripting language to try and make it easier to write code that executes in the vm. Crucially this code needs to be able to manipulate vectors and matrices, and pass references to such structures. To achieve this i'm passing memory references around. Anytime there's logic manipulating a specific piece of such a memory reference the read_memi
/write_memi
instructions could provide savings.
I haven't looked into optimizing the code very much. There are certainly cases where contiguous segments are accessed where the values can be buffered on the stack, but there are cases where it's not possible as well. I'll continue to experiment and try to get some empirical data.
This is not to say that the instruction isn't worth it. Just that it's only beneficial for small data types.
Yes it's certainly a niche instruction. I think it's beneficial not just for small data types though, but also cases where additional instruction are needed to arrange the results in sequential order for writing. e.g. cases where swap
, dup
, etc logic, (stack management) can be replaced with the immediate write/read.
I saw that the
addi
instruction was added in a recent commit. Is there any interest in addingread_memi
andwrite_memi
instructions? My rationale is, I pretty frequently need to read/write single values to non-sequential, constant, memory indices.read_memi + n
: push value at memory indexn
onto the stackwrite_memi + n
: pop the top value of the stack and store it at memory indexn
n
is the memory index. The value being read/written is on the stack. This is beneficial in cases of static memory layouts.Have you considered such an opcode when writing
tasm-lib
? Would adding such an instruction add significant overhead to the proving complexity?Thanks