0xPolygonMiden / miden-vm

STARK-based virtual machine
MIT License
617 stars 153 forks source link

Consider refactoring `mem_stream` semantics #821

Open frisitano opened 1 year ago

frisitano commented 1 year ago

The mem_stream operation works as follows:

"Loads two words from memory starting at the address a, overwrites the top 8 elements of the stack with them, and applies Rescue Prime Optimized permutation to the top 12 elements of the stack. At the end of the operation the address is incremented by 2."

This is an efficient means of hashing data that is stored in memory. However one of the limitations of mem_stream is that it requires data to be laid out sequentially. This is due to the fact that the memory address is incremented by 2. In some cases it would be preferable if the data is not laid out sequentially. An example of this is created / consumed notes. We want all data associated with a note to be contained in single memory block. The next notes data would start at the end of the current notes block. Lets say that a notes data block is 1024 memory slots. To allow the mem_stream operation to be effective in this situation we would like the mem_stream operation to take an immediate value which indicates the address of data required for the next iteration. E.g. mem_stream.1024 suggests that we should increment it by 1024 instead of the default of 2.

hackaugusto commented 1 year ago

I may have misunderstood the issue. But I think what you want is a procedure that should be built on top of mem_stream that hashes 1024 words. #805 may be of interest.

frisitano commented 1 year ago

I may have misunderstood the issue. But I think what you want is a procedure that should be built on top of mem_stream that hashes 1024 words. #805 may be of interest.

It's related but a slightly different requirement. Say the data addresses I want to hash are laid out such that the addresses of the target memory slots are 0,1,10,11,20,21. In this instance I want mem_stream to hash 0,1 and then increment the address to 10 which is the address of the next target slot, I should be able to indicate this requirement by providing an immediate argument mem_stream.10. As it stands the address is incremented by 2 by default.

hackaugusto commented 1 year ago

Ah, I see, you have something like this:

 0         10        20
[xx--------xx--------xx--------]

where `x` represents useful data, and `-` represents padding

Above x would represent the actual data of something like a note, and you want to skip the padding data, right?

I think that should be a procedure (probably built on top of #805), and I suppose that would build a tree of sorts, instead of doing sequential hashing of everything, no?

frisitano commented 1 year ago

I think that should be a procedure (probably built on top of #805), and I suppose that would build a tree of sorts, instead of doing sequential hashing of everything, no?

Yes your understanding of the requirements is correct. Why do you think it should be a procure instead of changing the the operation to take an immediate argument that reflects the memory jump? If we modify the instruction itself we can do it in a single cycle.

frisitano commented 1 year ago

On second thoughts an immediate value may not be the best way to do it as the value would then need to be pushed onto the stack at the start of every iteration? Instead we could have the increment amount at depth 13 in the stack - bellow the current address.

bobbinth commented 1 year ago

Instead we could have the increment amount at depth 13 in the stack - bellow the current address.

Yes, I think this would be the way to do it. The tradeoff would be that it would slightly complicate usage when we need to increase the address by 2 - but maybe that's not a big deal.

Another potential thing to refactor about mem_stream is that it is currently performing two operations: MSTREAM HPERM (see here). I wonder if it would make sense to split it up so that HPERM would need to be called explicitly. One reason for doing this is that sometimes we load values onto the stack and we may want to use them immediately but calling HPERM absorbs them into the hasher state.

frisitano-polygon commented 1 year ago

Another potential thing to refactor about mem_stream is that it is currently performing two operations: MSTREAM HPERM (see here). I wonder if it would make sense to split it up so that HPERM would need to be called explicitly. One reason for doing this is that sometimes we load values onto the stack and we may want to use them immediately but calling HPERM absorbs them into the hasher state.

Yes I think splitting it up makes sense. It gives the user more flexibility and control. I wonder if we want to do the same for adv_pipe? There have been cases where we want to load data directly from the advice provider onto the stack.

bobbinth commented 1 year ago

Yep, I think if we do refactor, we should refactor both to keep things consistent.