Open bobbinth opened 3 weeks ago
...
This change is beneficial to #1544 since I was thinking of a way to convey the notion that MastForest (code) requires the rodata loaded into the advice provider before it can be executed.
The MASM-facing part (syntax, parsing, etc.) of the implementation would take me quite a lot of time since I'm not familiar with the code, but the VM-facing part I believe I can do in a reasonable amount of time. If @bitwalker is ok with it, I can take a stab at it.
Upon assembly, this data would be added to the MastForest. For this, we'd need to add a single AdviceMap property to the MastForest struct - e.g., something like this:

    pub struct MastForest {
        /// All of the nodes local to the trees comprising the MAST forest.
        nodes: Vec<MastNode>,
        /// Roots of procedures defined within this MAST forest.
        roots: Vec<MastNodeId>,
        /// All the decorators included in the MAST forest.
        decorators: Vec<Decorator>,
        /// Advice map to be loaded into the VM prior to executing procedures from this MAST forest.
        advice_map: AdviceMap,
    }

Then, when the VM starts executing a given MAST forest, it'll copy the contents of the advice map into its advice provider (we can also use a slightly more sophisticated strategy to make sure that the map is copied only once).
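To make the "copied only once" idea concrete, here is a minimal sketch. Everything in it is a simplified stand-in invented for illustration and not the actual VM types or API: `Felt` as a plain `u64`, `Word` as four of them, the `id` field on `MastForest`, and the `load_forest` method.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-ins for the VM's field element and word types.
type Felt = u64;
type Word = [Felt; 4];

#[derive(Default)]
struct AdviceMap(BTreeMap<Word, Vec<Felt>>);

struct MastForest {
    // Hypothetical identifier used only to remember which forests were loaded.
    id: u64,
    advice_map: AdviceMap,
}

#[derive(Default)]
struct AdviceProvider {
    map: BTreeMap<Word, Vec<Felt>>,
    loaded_forests: HashSet<u64>,
}

impl AdviceProvider {
    /// Copies the forest's advice map into the provider the first time the
    /// forest is seen. Returns `false` if the forest was already loaded.
    fn load_forest(&mut self, forest: &MastForest) -> bool {
        // `insert` returns false when the id is already present.
        if !self.loaded_forests.insert(forest.id) {
            return false;
        }
        for (key, values) in &forest.advice_map.0 {
            // Assumption: overwrite on conflict. What should happen here is
            // one of the open questions in this issue.
            self.map.insert(*key, values.clone());
        }
        true
    }
}
```

With this shape, calling `load_forest` repeatedly for the same forest is cheap, since only the first call copies any data.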
I've taken a look and here are my findings on what needs to be done to implement this:
- Move the AdviceMap type from processor to core;
- Handle the AdviceMap when merging MAST forests (join with other AdviceMaps?);
- Serialization/deserialization of the MastForest should handle the AdviceMap as well, but it'll break storing the rodata separately in the Package (roundtrip serialization would not work). We could put rodata in AdviceMap on the compiler side as well and not store it separately in the Package. @bitwalker is it ok?

Open questions
While the above approach should work, there are a few things we need to clarify before implementing it:
How should we handle conflicting keys during assembly and execution?
- If we encounter two entries with the same key but different data during assembly, this should probably be an error.
Yes, I think it should be an error. From the rodata perspective, the digest is a hash of the data itself, so if the data is different, the digest will be different as well. From the MASM perspective, this might mean key/digest re-use, which does not seem like something a user might want, so failing early is a good thing to do.
- But what to do if we start executing a MAST forest which wants to load data into the advice provider but an entry with the same key but different data is already in the advice map? Should we error out? Silently replace the existing data with the new one? Anything else?
If the user code treats the advice provider as some sort of dictionary, that's a valid use case. I'm not sure if it should be an error.
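To compare the runtime options being discussed (error out, silently replace, or something else), here is a small sketch of an insert with a configurable policy. The `OnConflict` enum, `KeyConflict` error, and `insert_with_policy` function are hypothetical names for illustration only, and `Felt`/`Word` are simplified stand-ins; nothing here reflects what the processor does today.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the VM's field element and word types.
type Felt = u64;
type Word = [Felt; 4];
type AdviceMap = BTreeMap<Word, Vec<Felt>>;

/// Possible behaviors when a key is already present with different data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OnConflict {
    Error,
    Replace,
    KeepExisting,
}

#[derive(Debug, PartialEq, Eq)]
struct KeyConflict(Word);

fn insert_with_policy(
    map: &mut AdviceMap,
    key: Word,
    values: Vec<Felt>,
    policy: OnConflict,
) -> Result<(), KeyConflict> {
    // A conflict is "same key, different data"; identical data is not a conflict.
    let conflicting = matches!(map.get(&key), Some(existing) if *existing != values);
    if conflicting {
        match policy {
            OnConflict::Error => Err(KeyConflict(key)),
            OnConflict::Replace => {
                map.insert(key, values);
                Ok(())
            }
            OnConflict::KeepExisting => Ok(()),
        }
    } else {
        map.insert(key, values);
        Ok(())
    }
}
```

The `Replace` variant corresponds to the "dictionary" use case, while `Error` matches the fail-early behavior proposed for assembly.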
Handle the AdviceMap when merging MAST forests (join with other AdviceMaps?);
Yes, I think merging would work fine here. If there is a conflict (two entries with the same key but different data), we'd error out here as well.
Serialization/deserialization of the MastForest should handle the AdviceMap as well, but it'll break storing the rodata separately in the Package (roundtrip serialization would not work). We could put rodata in AdviceMap on the compiler side as well and not store it separately in the Package. @bitwalker is it ok?
Yeah - I think once we have this support for advice map entries in MastForest, there is no need to store rodata separately in the package.
I'll wrap up my current work (Component support in the compiler pipeline) and jump on the VM-facing part.
In the above example FOO refers to a full word. All our constants currently refer to single elements. Ideally, we should be able to tell by looking at the constant name whether it is for a full word or a single element. So, maybe we should come up with some simple scheme here to differentiate them?
The parser already knows how to parse various sizes of constants, including single words, or even arbitrarily large data (the size of the data itself indicates which type it is).
Should the key handle FOO be accessible outside of the module it was defined in? It seems like it would be a good idea, but then we need to be able to apply some kind of visibility modifiers to advent.
These would be effectively globally visible symbols, and while unlikely, you can have conflicting keys, so I think any attempt to make it seem like these can be scoped should be avoided.
How should we handle conflicting keys during assembly and execution?
I'm not sure how we handle this during execution today actually, presumably we just clobber the data if two things are loaded with the same key into the advice map?
During assembly I think it has to be an error. It might be possible to skip the error if the data is the same, but I think it's still an open question whether or not you would want to know about the conflicting key regardless.
I'm questioning a bit whether it makes sense to define this stuff in Miden Assembly;
- [...] KEY is expected to be in the advice map at runtime, and they can refer to it in MASM as you've described (e.g. push.KEY) - the actual data (i.e. the value being referenced) could be supplied any number of ways, and in fact need not even be provided at assembly-time.
- [...] KEY in their handwritten MASM, this gives them that ability without requiring that we figure out how to also encode the value in the syntax, while freeing us up to gather those values any number of ways.

Setting that aside for a moment:
- [...] advent, it took me a minute to understand, even knowing what it was supposed to be in theory, I would go with something along the lines of advice_init or adv_map.init.
- [...] advice_init.<key>=<value> where <key> is a word, and <value> is N hex-encoded bytes in big-endian order which will be interpreted as raw field elements.
- [...] advice.expect.FOO or something to that effect, which would indicate to the assembler that FOO is to refer to an item whose value is to be provided as an input to the assembler, from which a key will be derived (or optionally user-provided), and the key/value pair inserted into the advice map. This would allow the tool to take user input of the form JSON={"foo": 1} and handle not only encoding that into words, but compute the key for it, and then allow the user to write code referencing a value that is then maintained in another authoritative source that is then provided as an additional input to the assembler, possibly via some kind of extensible config provider mechanism.

I've taken a look and here are my findings on what needs to be done to implement this:
- Move the AdviceMap type from processor to core;
- Handle the AdviceMap when merging MAST forests (join with other AdviceMaps?);
We'll need to catch conflicting keys (different values for the same key, but fine if the keys overlap with the same value), but a straight merge of the two maps should be fine otherwise.
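A rough sketch of the merge rule described above: overlapping keys are fine when the values match, and a conflict is reported when they differ. The `AdviceMap` alias, `ConflictingKey` error, and `merge_advice_maps` function are hypothetical simplifications for illustration, not the real types from core or processor.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

// Hypothetical stand-ins for the VM's field element and word types.
type Felt = u64;
type Word = [Felt; 4];
type AdviceMap = BTreeMap<Word, Vec<Felt>>;

#[derive(Debug, PartialEq, Eq)]
struct ConflictingKey(Word);

/// Merges `b` into a copy of `a`. Keys present in both maps are accepted
/// only if they carry identical values.
fn merge_advice_maps(a: &AdviceMap, b: &AdviceMap) -> Result<AdviceMap, ConflictingKey> {
    let mut merged = a.clone();
    for (key, values) in b {
        match merged.entry(*key) {
            Entry::Occupied(existing) => {
                // Same key with different data is a conflict.
                if existing.get() != values {
                    return Err(ConflictingKey(*key));
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(values.clone());
            }
        }
    }
    Ok(merged)
}
```

Returning the offending key in the error makes it possible to report which entry collided when merging forests.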
Serialization/deserialization of the MastForest should handle the AdviceMap as well, but it'll break storing the rodata separately in the Package (roundtrip serialization would not work). We could put rodata in AdviceMap on the compiler side as well and not store it separately in the Package. @bitwalker is it ok?
Once we can write our rodata to the MastForest directly, we won't need to do it in the Package anymore, so that sounds fine to me!
@greenhat For now, I would focus purely on the implementation around the MastForest/processor (what you've suggested AIUI), don't worry about the AST at all. That's all we need for the compiler anyway, while we figure out how to handle the frontend aspect in the meantime.
In some situations it may be desirable to specify some data which a given program assumes to be available in the advice provider. One example of this is read-only data output by the compiler, but there could be many other examples. Currently, such data needs to be loaded separately into the VM, which introduces extra complexity.
One way around this is to allow users to define data which is to be loaded into the advice provider before a given program starts executing. The syntax for this in MASM could look like so:
Here, advent specifies that we want to add an entry to the advice map. The key for the entry would be the word defined by the 0x9dfb1fc9f2d5625a... value. The data of the entry would be the list of field elements defined by the hex encoded string. We also provide a way to specify a label FOO by which the key can be referred to from the code. For example:

Would push the key 0x9dfb1fc9f2d5625a... onto the stack.

Upon assembly, this data would be added to the MastForest. For this, we'd need to add a single AdviceMap property to the MastForest struct - e.g., something like this:

Then, when the VM starts executing a given MAST forest, it'll copy the contents of the advice map into its advice provider (we can also use a slightly more sophisticated strategy to make sure that the map is copied only once).
Open questions
While the above approach should work, there are a few things we need to clarify before implementing it:
- In the above example FOO refers to a full word. All our constants currently refer to single elements. Ideally, we should be able to tell by looking at the constant name whether it is for a full word or a single element. So, maybe we should come up with some simple scheme here to differentiate them?
- Should the key handle FOO be accessible outside of the module it was defined in? It seems like it would be a good idea, but then we need to be able to apply some kind of visibility modifiers to advent.