TritonVM / triton-vm

Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.
https://triton-vm.org
Apache License 2.0
247 stars 37 forks source link

Immediate memory opcodes #318

Open chancehudson opened 3 months ago

chancehudson commented 3 months ago

I saw that the addi instruction was added in a recent commit. Is there any interest in adding read_memi and write_memi instructions? My rationale is, I pretty frequently need to read/write single values to non-sequential, constant, memory indices.

read_memi + n: push value at memory index n onto the stack write_memi + n: pop the top value of the stack and store it at memory index n

n is the memory index. The value being read/written is on the stack. This is beneficial in cases of static memory layouts.

Have you considered such an opcode when writing tasm-lib? Would adding such an instruction add significant overhead to the proving complexity?

Thanks

aszepieniec commented 3 months ago

Thanks for the suggestion. It's not something I recall talking about but definitely worth considering. And please add these suggestions to the maybe-wishlist so that we don't lose track of it.

Just so I understand correctly -- this instruction would allow you to search-and-replace push n read_mem 1 pop 1 by read_memi n, and analogously for write_memi. Is that correct?

Adding extra instructions does add overhead to both proving and verifying and, cumulatively, to proving a verification program. The introduction of addi gave us 2 new degree-lowering columns, which means 2 more NTTs, 2 more columns to be hashed, etc. Note that addi shares a lot of logic with existing instructions; for more complex new instructions the cost might be higher than just two extra columns. Unfortunately, one has to implement it in order to find out what the exact cost is.

Since adding instructions is costly, the benefit of adding one needs to outweigh its cost. A more convenient programmer experience is a benefit for sure, but we are more interested in performance metrics: how many cycles to execute a particular task? We can take a 1% drop in prover speed if we get a 2% faster recursive verification. In light of this question, could you estimate how many cycles the proposed instructions would save (relative and absolute)? Would you be using them inside the innermost body of a nesting of loops, or more likely in their setup?

Sword-Smith commented 3 months ago

A drawback of this suggestion is that the factor 3-speedup that it yields only works with single-word data types, like bool, u32, and b-field elements. For bigger data types like hash-digests, there are no savings as you can currently do

push {address+4}
read_mem 5
pop 1

and using read_mem i would yield

read_memi {address+4}
read_memi {address+3}
read_memi {address+2}
read_memi {address+1}
read_memi {address}

Where address is the location of the 1st word of the digest.

This is not to say that the instruction isn't worth it. Just that it's only beneficial for small data types (and pointers, of course).

jan-ferdinand commented 3 months ago

I'm marking this as a “good first issue,” which it only really is with a pattern to match on. One such pattern could be the commit introducing instruction addi that was mentioned in the OP: 3b5bc128c086489301803e0f5b8881d00f681e97

chancehudson commented 3 months ago

Just so I understand correctly -- this instruction would allow you to search-and-replace push n read_mem 1 pop 1 by read_memi n, and analogously for write_memi. Is that correct?

Yes

could you estimate how many cycles the proposed instructions would save

This is tough. I'm designing a scripting language to try and make it easier to write code that executes in the vm. Crucially this code needs to be able to manipulate vectors and matrices, and pass references to such structures. To achieve this i'm passing memory references around. Anytime there's logic manipulating a specific piece of such a memory reference the read_memi/write_memi instructions could provide savings.

I haven't looked into optimizing the code very much. There are certainly cases where contiguous segments are accessed where the values can be buffered on the stack, but there are cases where it's not possible as well. I'll continue to experiment and try to get some empirical data.

This is not to say that the instruction isn't worth it. Just that it's only beneficial for small data types.

Yes it's certainly a niche instruction. I think it's beneficial not just for small data types though, but also cases where additional instruction are needed to arrange the results in sequential order for writing. e.g. cases where swap, dup, etc logic, (stack management) can be replaced with the immediate write/read.