Closed bobbinth closed 7 months ago
Thanks for the write-up! One question:
to properly process them in the transaction kernel, we need to know some basic info about what kind of a note it is
Can you elaborate on why the transaction kernel needs to know where the note data is being stored? The executing account/user would always need to provide the note data to the transaction kernel, right? Or do you plan to change the transaction kernel in the future?
I love the tags idea, especially to encode the asset into the tag. In the future, I expect users to create their own note scripts. So maybe, we can note down the tag schema in the docs and provide a small script in the client or wallet to create a tag?
Can you elaborate on why the transaction kernel needs to know where the note data is being stored? The executing account/user would always need to provide the note data to the transaction kernel, right? Or do you plan to change the transaction kernel in the future?
There are two primary reasons for this:
if-else
statement in the prologue: if a note is an encrypted note, then encrypt and hash its contents; otherwise, just hash the contents (as it works now).I love the tags idea, especially to encode the asset into the tag. In the future, I expect users to create their own note scripts. So maybe, we can note down the tag schema in the docs and provide a small script in the client or wallet to create a tag?
Yes, this should go into the docs - but we should settle on this first. One other option is to reduce tags to 16 bits rather than 32 bits. But there are pros and cons for each.
I wonder if we need the tag
system. For a public note querying over the public inputs and code hash should be enough. For off-chain notes its hash would serve the function of the tag. For on-chain notes, and the same for encrypted notes.
For a public note querying over the public inputs and code hash should be enough.
This could work, but as the number of different note types grows, so will the number of potential combinations of script hash + public inputs (e.g., filter for notes where script_root = x and (inputs[0] = y or inputs[3] = z)
). This may require us to support a mini-query language to specify relevant conditions - which, in my opinion, is probably too complex. With tags, we tags, we offload this tasks to the users.
For off-chain notes its hash would serve the function of the tag.
This could also work, but would probably end up being just as complex as we'd need to support some kind of "fuzzy matching" on note hashes to provide some degree of privacy.
the same for encrypted notes
I'm actually not sure how this will work for encrypted notes. For these notes, the recipient does not control the note hash and note script/inputs are encrypted - so, I'm not sure what can be a meaningful filter here.
Overall, I think tags are a pretty simple solution which can be applied uniformly to all cases we need to support. Getting rid of them will probably result in more complexity.
@bobbinth which component will determine the note type? For example, the encrypted note, will it be encrypted inside the VM? Or will it be encrypted by the host?
To give some context: I'm asking because I need to know where the type is set, after the transaction is executed and when the executed transaction is being created. Or when parsing the transaction output. One approach will also need changes to the kernel, whereas the other is rust only.
I think inside the VM makes the most sense. The reasons for this are:
To support the above, we'll probably need to modify create_note
kernel procedure. Or maybe create several variants of this procedure - e.g., create_offchain_note
, create_network_note
etc.
What about network transactions? It seems odd that a user can send a tx script to the network which calls create_offchain_note
. That will add some code validation requirements, that maybe we can get away without if the note type is defined outside of the VM.
Do we need to verify the note type inside the VM? Shouldn't this be a detail decoupled from the kernel, and isolate exclusively as a transport related detail?
What about network transactions? It seems odd that a user can send a tx script to the network which calls
create_offchain_note
. That will add some code validation requirements, that maybe we can get away without if the note type is defined outside of the VM.
For network transactions, the user would not send a tx script to the network. Or at least the way I'm thinking of network transactions now is that the node would pick a set of notes to be applied to a given account and create a transaction from them - and this would be done without any input from the user.
I think you are right though that when a network transaction is executed, we need to handle special cases. Specifically, if such a transaction results in an on-chain note, the node needs to know all the note details.
So, maybe for now we should limit the scope of this to only offchain
and local
notes - and tackle network
notes separately.
Do we need to verify the note type inside the VM? Shouldn't this be a detail decoupled from the kernel, and isolate exclusively as a transport related detail?
I don't think we can decouple these. The main reason is that the transaction needs to commit to the note type - otherwise, a malicious node could change the type of the note (e.g., pretend that a note is offchain while it is supposed to be onchain).
I don't think we can decouple these. The main reason is that the transaction needs to commit to the note type - otherwise, a malicious node could change the type of the note (e.g., pretend that a note is offchain while it is supposed to be onchain).
Is it useful to discern the two though? A malicous node can circumvent the check by consuming the note with a local transaction. I think we should rely exclusively on the note's script for security w.r.t. execution.
A malicous node can circumvent the check by consuming the note with a local transaction
hmmm - I'm not sure how this would work.
In general, the issue this is aiming to prevent is during transaction relay between nodes. For example, if node A sends a transaction to node B, node B needs to have a way to determine the type of every of the note produces by the transaction. If node A is free to manipulate the note type, they could say that all notes are off-chain while some of them may be on-chain. And this would lead node B to accept a transaction with malicious data.
The only way I can think of to avoid this is to commit to note type as a part of transaction output - i.e., by putting note type into note metadata (which I think is also the right place for it anyway).
Note: I'm using network notes for the fungible asset distribute and the basic wallets' send_asset
Note: I'm using network notes for the fungible asset distribute and the basic wallets' send_asset
I think these would be local
notes (i.e., public notes, but intended for local consumption). Though, this makes me think if we overcomplicated note types a bit. Maybe we should just have 3 types: off-chain, encrypted, and public, and whether a note is intended for network or local execution would be determined solely by its tag.
Here is some additional context, based on a conversation I had with @bobbinth :
The problem space is big, but it can be split as follows:
First we have to clarify what data. Here we have:
The data can be store:
This is important for:
This is important for every actor in the system. Every actor needs a mechanism to tell if they can consume a note, i.e. they are the target. And when they can consume the note, i.e. should this note be discarded now, or held until a condition is met which allows it to be consumed?
A few more thoughts on this:
To be able to consume a note in a transaction we need to know a few things:
NoteId
+ metadata.
b. For on-chain output notes (whether public or encrypted), this also includes actual note details.Point 2a is probably the most tricky to handle. Basically, if I consume a note in a transaction, and this results in creation of an on-chain note, I need to know all the details of the note to be created and then the question is where do I get this data.
One option to address this is to include all relevant details of the output note into the input note's script. This reduces to: if I have the input note script, I know for sure I can generate the output note with all of its public details.
These details include:
So, maybe create_public_note
procedure should be quite a bit different from what we have in #515. For example, it could be something like:
#! Inputs: [ASSET, tag, input_ptr, input_len, script_ptr, script_len, SERIAL_NUM]
#! Outputs: [ptr, 0, 0, 0, 0, 0, 0, 0, 0]
export.create_public_note
Here, we would assume that the caller has laid out the actual inputs and script in memory. For inputs, computing the hash would be pretty straight-forward (since all it is a sequential hash), but not script, I'm not sure what to do yet as computing MAST root would probably be prohibitively expensive. We could, of course, provide the MAST root as an input as well, but then there would be no way to verify that provided script matches the provided MAST root (at least not inside the VM).
Expanding a bit on the pervious comment:
I think we can assume that an input note (or transaction script) with results in a creation of an on-chain note should be responsible for producing all data related to this note. There are two questions related to this:
For the first question, there could be several approaches:
I think the event-based approach is probably the more flexible one. It could work as follows: when a note is created (via create_public_note
procedure), the VM emits an event about the note being created. By that time, there should be enough data in the VM (whether on the stack, the memory, or the advice provider) for the host to be able to construct the Note
object for the created note. If there is not enough of such data, the VM panics. We'll need to define the exact way in which the host will look for the note data - but I think that shouldn't be too difficult.
For the second question, everything except for the note script should be pretty straight forward. Note ID unambiguously commits to note inputs, note assets, and serial number. However, the commitment to note script is somewhat ambiguous. Meaning, knowing the script root does not tell us exactly how much of the script should be included with the note. So, in addition to the script root, we need to include some other data with the note which would unambiguously define the note script.
The most straight-forward way to do it is to include a sequential hash of the script. So, in effect, every public note would have 2 commitments to the script: script root and script hash. While this works, it is quite inelegant and probably will cause various complications. But so far, I wasn't able to find a better solution.
However, the commitment to note script is somewhat ambiguous. Meaning, knowing the script root does not tell us exactly how much of the script should be included with the note.
Can you elaborate on that?
Let me also rephrase what I understood:
Say we have note script root $r$. We can easily check (outside of the VM) if a given note script $s$ results in the same root. However, $s$ is not unique because we can "hide" portions of a script. So, there could be another note script $t$ which also results in root $r$. Now, both $s$ and $t$ encode exactly the same program - so, that's not an issue from verification standpoint. But how much of the program is available, could be different between $s$ and $t$. This is not an issue for off-chain notes, but for on-chain notes we need to know the exact script the original user wanted to make available.
Is it a solution that in this case we don't allow hiding parts of the script and we always provide the full script? We can always enable this feature (hiding partial note scripts) if required.
Is it a solution that in this case we don't allow hiding parts of the script and we always provide the full script?
This won't quite work because putting the entire script on chain would be prohibitively expensive in many cases. For example, if I make a call to sha256
procedure, ideally, I won't have to put the actual code of this procedure on chain.
Ok, but if we don't force Alice to put the entire script on-chain, how can we ensure that Bob can execute it? At the point of note creation, Alice doesn't know which parts of the note script Bob will execute, right?
Is it a solution that we put all parts of the scripts on-chain that do not have a known root in miden-lib
. That would mean, we only need any custom part of the script. This could be done via a simple lookup table and it can easily be checked by the VM quite efficiently.
Is it a solution that we put all parts of the scripts on-chain that do not have a known root in
miden-lib
. That would mean, we only need any custom part of the script. This could be done via a simple lookup table and it can easily be checked by the VM quite efficiently.
Yes, this could be a solution - though, we'd need to include procedures from both miden-lib
and stdlib
in the set of "well-known" procedures. To make this work, we'd need two things:
One note: we'd need to keep track of procedures defined across all versions of these libraries (being able to mark some procedures as deprecated might be a good thing to have here cc @bitwalker).
The obvious downside of this overall approach as that users would not be able to hide any custom parts of scripts for public notes - but maybe that's something we can tackle later (I think when we'll need to figure this out by the time we get around to encrypted notes anyways).
If we go for this approach, what else is missing to be able to start?
We can easily check (outside of the VM) if a given note script s results in the same root. However, s is not unique because we can "hide" portions of a script. So, there could be another note script t which also results in root r. Now, both s and t encode exactly the same program - so, that's not an issue from verification standpoint.
Is this correct? My understanding is that just because you elide certain subtrees of the MAST, the hash doesn't change, because the proxy node for that part of the tree contains the hash of the callee, and proxy nodes are "transparent", i.e. the hash of the callee is used as the hash of that node, so it's no different than if the full tree was present. Maybe it is only the case that we can rely on this after my assembler refactoring that ensures a program is always compiled so that all callees are by MAST root, but in any case, I'd very much like to understand why hiding a portion of a program would change its hash, since that seems to break some of the nice properties of MAST.
One note: we'd need to keep track of procedures defined across all versions of these libraries (being able to mark some procedures as deprecated might be a good thing to have here
We can easily add this in the assembler (we already have facilities for this in the compiler), the only thing we'd probably need to discuss here is some way of representing structured attributes/metadata in Miden Assembly syntax. I think we probably want to avoid doing something like parsing doc comments for certain keywords or something, and instead have concrete syntax for things like this, e.g. @deprecated
or #[deprecated(since = "..", "note about deprecation")]
.
We can easily check (outside of the VM) if a given note script s results in the same root. However, s is not unique because we can "hide" portions of a script. So, there could be another note script t which also results in root r. Now, both s and t encode exactly the same program - so, that's not an issue from verification standpoint.
Is this correct? My understanding is that just because you elide certain subtrees of the MAST, the hash doesn't change, because the proxy node for that part of the tree contains the hash of the callee, and proxy nodes are "transparent", i.e. the hash of the callee is used as the hash of that node, so it's no different than if the full tree was present. Maybe it is only the case that we can rely on this after my assembler refactoring that ensures a program is always compiled so that all callees are by MAST root, but in any case, I'd very much like to understand why hiding a portion of a program would change its hash, since that seems to break some of the nice properties of MAST.
You are correct, hiding a portion of the program does not change the MAST root. The issue here is different. If we only know MAST root, we don't know how much of a given program is meant to be public. So, if I, as a user, send a public note and include some public program $s$ with it, a malicious operator would be able to transform this program $s$ into program $p$ by replacing some of the subtrees with proxy nodes. This would not change the MAST root, but it would alter the script that I wanted to put on chain in a way that would not be detectable by anyone but me.
We can easily add this in the assembler (we already have facilities for this in the compiler), the only thing we'd probably need to discuss here is some way of representing structured attributes/metadata in Miden Assembly syntax. I think we probably want to avoid doing something like parsing doc comments for certain keywords or something, and instead have concrete syntax for things like this, e.g.
@deprecated
or#[deprecated(since = "..", "note about deprecation")]
.
Agreed! I was also thinking that we should make provisions for this in the package format.
If we go for this approach, what else is missing to be able to start?
There are 3 open (or at least somewhat open) questions here:
Note
objects for public notes (and maybe for some off-chain notes as well)?The second question is really about how do we populate output_note_details in the ProvenTransaction
struct. More specifically, this data should probably be also present in the ExecutedTransaction
struct (maybe as a separate field, or maybe as a part of TransactionOutputs
struct) and the place where it should be extracted from the VM is in the build_executed_transaction() function.
The main question here is how does this data get injected into the VM. As I mentioned in https://github.com/0xPolygonMiden/miden-base/issues/402#issuecomment-1999141087, we probably want do do this via events intercepted by the host. It would also be nice if we could do something generic where the process works both for public and off-chain notes (this would simplify things on the client side). So maybe we should do the following:
Note
object could be constructed.create_public_note
or create_offchain_note
procedures are executed, we'd fire some events which would indicate that notes are being created.Note
object for the created note. For public notes, if the Note
object cannot be built, the execution should fail. For off-chain notes, failure to build a Note
object should not affect execution.
a. In the VM, we could still keep track of only the commitments to things like script_root
, inputs_root
etc. The actual values for these could exits only in the advice provider.ExecutedTransaction
object.The 3rd question is about how to reconcile information carried by note type vs. note tag. I've described one approach in the original post (4 note types and note tag schemas). But it is not clear to me that this is the right way to go about it. In general, we want to use note metadata to answer the following questions:
sender
field.Some more concrete questions:
Maybe, instead of note type, we define two additional properties for a note:
And we encode this info using 2 bits as described in the original post (this way, invalid combinations of these would be impossible).
...if I, as a user, send a public note and include some public program $s$ with it, a malicious operator would be able to transform this program $s$ into program $p$ by replacing some of the subtrees with proxy nodes. This would not change the MAST root, but it would alter the script that I wanted to put on chain in a way that would not be detectable by anyone but me.
I think this is the part that is tripping me up. My understanding was that changing the tree in any way should change the MAST root. The only way to replace a subtree with a proxy node without changing the MAST root would be if the proxy refers to the exact hash of the subtree it replaced, which doesn't actually change anything. The caveat to that I suppose is if one is finding hash collisions, but for any given program that's probably impractical while also achieving the malicious goal, e.g. stealing funds.
I think maybe the critical detail here I'm missing is in the private/public part, but I would've thought that either way the MAST roots ensure the tree is tamper proof.
Agreed! I was also thinking that we should make provisions for this in the package format.
Good point! I'll try to address that in my draft proposal when I get a chance here
The only way to replace a subtree with a proxy node without changing the MAST root would be if the proxy refers to the exact hash of the subtree it replaced, which doesn't actually change anything.
This does not change the program - but it does change how much of the program is "available" (in this scenario, I send a note to the operator and rely on the operator to provide this note in its entirety to whoever wants to consume the note). For example, if I want all of the note script to be publicly available, specifying just MAST root is not sufficient because the operator could replace some subtree with an equivalent proxy node and effectively hide this subtree from everyone else. The issue here is not so much that the operator can do this (they can do this in any setting), but that without some extra info nobody would know that the script I published was supposed to have everything public.
IMO we should forbid hiding parts of the MAST for on-chain executable notes.
If we don't have all the code available, we will need to keep the notes in the pool, in case the revealed bit of code happens to make it executable in the future. Meaning the pool will have to keep notes for undetermined time for this feature to work properly. I guess in the future we can relax this constraint, such that bits of the MAST can be hidden behind a proxy, if it is not necessary for the note consumption, but I think we should wait for a use case before going over the effort.
Documenting a proposal I shared with bobbin: I think we should use the note's code to determine when it can be consumed, instead of adding metadata fields. The rationale is that metadata can express all conditions for the note's execution.
For example, something like a simple order book, where a note should be executed only if the user's order criteria are matched, we can't possibly define a fixed set of values that determine when a note can be consumed, that is to say, the set of conditions is open and we can't predict what it will look like. Example 1: a new asset is added to the rollup and the note is against it, here the set of possible assets is open. Example 2: A token sale is being done, the token has some internal price curve, and the note wants to execute if a given criteria matched, here the set of tests is open.
The motivation example above is a order note. Well, if the conditions are just price based, it could well be the case the note will never be executable. So the question becomes how long should the note be kept around.
The next issue, is how to determine if a note can be consumed. Would we need to have a pool of notes, and on every new block re-execute every note to check if they became consumable?
I think we can't fix this issue, the system is so open that a note code could be buggy, it will always hit an assert and the note will never be consumable. So the solution here is just to allow the issue to happen and document it, notes would be added to the pool, and kept for an undetermined operator defined time prior to being dropped if they are not consumable.
The alternative is to restrict what can be done with notes.
I think the simplest solution here is just to have a priority queue of notes that failed a consume test, and the node would after a new block pop the highest priority to recheck, if the check fails the note is put back into the queue with a lower priority until its pool lifetime expires.
We could be a bit more clever, and track which trees the note is reading during its execution, and only retry the checks if given roots change. But this seems very, very, brittle, and I think it should be delayed until it really is needed
IMO we should forbid hiding parts of the MAST for on-chain executable notes.
This is the plan for now.
If we don't have all the code available, we will need to keep the notes in the pool, in case the revealed bit of code happens to make it executable in the future. Meaning the pool will have to keep notes for undetermined time for this feature to work properly. I guess in the future we can relax this constraint, such that bits of the MAST can be hidden behind a proxy, if it is not necessary for the note consumption, but I think we should wait for a use case before going over the effort.
I think it is important to distinguish here between:
In the first case, the network doesn't care about note scripts as it will not need to execute them (but the scripts still need to be on chain). But in both cases, allowing users to hide a part of a note script would be quite valuable (both for privacy and performance reasons).
Let's take P2IDR note as an example. A user may wish to create a public note with this script, but may want to hide the "recall condition". That is, in the "happy path" all data required to execute the note is on-chain. But after certain block height, the user could execute a hidden program to recall the note (the optimization benefit here is the recall part of the program does not need to be on-chain, thus saving some space).
@hackaugusto can you note down our common understanding in here?
on-chain
but the operator did not provide the data. Open questions:
a. How is the data published?
b. What are the user actions?Notes
from the VM, the procedure is described above.For example, if I want all of the note script to be publicly available, specifying just MAST root is not sufficient because the operator could replace some subtree with an equivalent proxy node and effectively hide this subtree from everyone else. The issue here is not so much that the operator can do this (they can do this in any setting), but that without some extra info nobody would know that the script I published was supposed to have everything public.
I see, I had the problem inverted - I was assuming the malicious operator was changing the actual semantics of the program, but what they are actually changing is the visibility of the program, in effect making it unexecutable by anyone else. Seems a bit limited in terms of what you could accomplish, but that's probably just due to my limited imagination 😅.
One thing I wanted to note I guess, since it caught my eye - one of the changes in my assembler PR that touches the processor, is that proxy nodes only fail to execute if the referenced node is not in the procedure cache, when it is available in the cache it is no different than if the actual subtree was present. This is a subtle change from before, which is that if the processor encountered a proxy node, it always treated that node as "unexecutable". Not particularly relevant to this discussion, but I just wanted to point out that proxy nodes no longer equate to "missing code".
For execution mode, one option to consider is just making this a part of the tag. For example, setting the first bit of the tag to 0
for local execution and 1
for network execution. The second bit can be used to indicate whether the note is intended for a specific account or for some use case (e.g., swap). To summarize, we'd have the following tag formats:
00
+ recipient_prefix
: note is intended for local execution by a specific account. Recipient prefix here could be based on an account ID, but it could also be anything else that may be used to identify the recipient of the note (e.g., prefix of a public key, or even some random value). One suggested way to set the prefix could be to use first 16 bits of account ID followed by 14 zeros.01
+ use_case_id
+ use_case_metadata
: note is intended for local execution for a specific use case (e.g., swap). Everything beyond 01
should probably be discussed separately, but I'm thinking use_case_id
could be 14 bits and use_case_metadata
could be 16 bits.10
+ account_id_prefix
: note is intended for network execution against a specific account. The account is defined by the account_id_prefix
(could be first 30 bits of account ID). The account must be a public account. That is, the 3rd bit of account ID should be 0
(it may make sense to re-define layout of account ID to move this bit up so that this tag prefix is always 100
). Also, note storage mode must be "public".11
+ use_case_id
+ use_case_metadata
: similar to the 01
case but now the note is intended to be processed by the network, and so note storage mode must also be "public". The details of how this would work we can leave for the future - i.e., notes with tag prefix 11
would not be allowed for now).Some benefits of this scheme:
00
+ 16_bit_account_prefix
+ 14 zeros
for the tag and this filters out all irrelevant notes (e.g., notes intended for networks execution).01
case, the inclusion of additional metadata into the tag could help with filtering. For example, for the swap note, the 16 bits of metadata would be used as: 8 bits to identify requested asset and 8 bits to identify the offered asset (in both cases these could be 8-bit prefixes of corresponding faucet IDs). This should really help narrow down a set of relevant notes.10
case, it will be pretty easy for the operator to identify relevant notes and accounts against which these notes are to be executed.
a. The operator will still need to do additional work to figure out if the note is actually executable against the specified account (and when it can be executed) - but that's a separate problem. For now, we can be very strict and say that if the note cannot be executed immediately, it is deemed unexecutable.Some downsides:
01
case. Specifically: 010
Could imply the that note must be public and 011
could imply that the note must not be public (and we can enforce these checks in the kernel).To summarize, the new metadata struct could look as follow:
pub struct NoteMetadata {
sender: AccountId,
mode: StorageMode,
tag: u32,
aux: Felt, // this is the additional arbitrary data
}
Actually, there is no reason why we can't call storage mode NoteType
(and keep what we already have in #515). So, then metadata would look like so:
pub struct NoteMetadata {
sender: AccountId,
note_type: NoteType,
tag: u32,
aux: Felt,
}
pub enum NoteType {
OffChain,
Encrypted,
Public,
}
And to summarize tag validation rules:
tag
must start with either 00
or 011
.tag
must start with 00
, 010
, or 10
10
, the 5th bit of the tag (the 3rd bit of account ID) must be 0
. If we were to modify account ID encoding scheme by moving the storage mode bit upfront, this rule would go away and the only rule we'd have is that the set of allowed tags prefixes for public notes would be 00
, 010
, and 100
.tag
must start with 00
.If we do modify the account ID schema so that the first bit of the ID is 0
for public accounts and 1
for off-chain accounts, we could simplify things a bit. Specifically, we could give the following meaning to tag prefixes:
00
: local execution for specific recipient.01
: local execution for a use case, off-chain note.101
: local execution for a use case, public note.100
: network execution for specific account.11
: network execution for a use case.Tag rules then become:
0
.0
.00
.But maybe I'm overthinking things :)
Ok, that sounds great.
The first question that comes to my mind is, is there a reason why we need to encode everything in the tag using bits? We could also have more fields in the Note metadata.
pub struct NoteMetadata {
sender: AccountId,
note_type: NoteType,
execution: ExecutionType,
aux: Word,
use_case: UseCaseType,
tag: u32,
}
pub enum NoteType {
OffChain,
Encrypted,
Public,
}
pub enum ExecutionType {
Network,
Local,
}
pub enum UseCaseType {
Swap,
P2ID,
LimitOrder,
}
The notes might become bigger, but as far as I understood, we don't plan to store them on Ethereum. That way, the schema might be way more usable and easier to understand. In the aux
field, ppl can store 4 Felts for their respective use case, which might be the intended recipient, the assets to swap, or whatever.
I like your schema's aux
field. We need to allow for future use cases, and people can develop their own schema and encoding in this field.
But I don't see a particular downside to your approach either. It might only overcomplicate things.
And one additional point @bobbinth @hackaugusto can we put the idea of encrypted notes aside for now? No Pioneer is asking for it so far and we can save time if we don't implement it now.
@bobbinth do you want these account id changes implemented? That is the only thing pending AFAIU (the storage bits are already implemented by https://github.com/0xPolygonMiden/miden-base/pull/515)
And one additional point @bobbinth @hackaugusto can we put the idea of encrypted notes aside for now? No Pioneer is asking for it so far and we can save time if we don't implement it now.
Yes, we are not implementing the encrypted notes for the time being.
The first question that comes to my mind is, is there a reason why we need to encode everything in the tag using bits?
The main reason is that the tag
is a queryable field - i.e., this is the field the client can filter on during state sync request. So, by putting these bits into the tag we effectively allow the users to say something like "i'd like to get all the public notes intended for a specific use case", or "I'd like to get all the notes intended for local execution by a specific account".
But we can still get something similar to what you described like so:
pub struct NoteMetadata {
sender: AccountId,
note_type: NoteType,
tag: u32,
aux: Felt,
}
pub enum NoteType {
OffChain,
Encrypted,
Public,
}
pub enum ExecutionMode {
Network,
Local,
}
impl NoteMetadata {
pub fn execution_mode(&self) -> ExecutionMode {
let first_bit = self.tag >> 31;
let first_3_bits = self.tag >> 29;
if first_bit == 0 || first_3_bits == 101 {
ExecutionMode::Local
} else {
ExecutionMode::Network
}
}
}
@bobbinth do you want these account id changes implemented? That is the only thing pending AFAIU (the storage bits are already implemented by #515)
I guess it depends on the amount of work it would take. If this is just a few hours - I'd go for it, but if it is a much bigger refactoring, maybe it makes sense to do it later. I believe it should be the former - but I didn't look into it closely enough to know for sure.
I guess it depends on the amount of work it would take. If this is just a few hours - I'd go for it, but if it is a much bigger refactoring, maybe it makes sense to do it later. I believe it should be the former - but I didn't look into it closely enough to know for sure.
Sounds good, will give it a try
@hackaugusto, @Dominik1999 - I've created a few more specific issues (i.e., #538, #539, and #540) based on the above discussions. So, this issue can be closed as soon as https://github.com/0xPolygonMiden/miden-base/pull/515 is merged.
I think once all these 4 issues are closed, we should also be able to close #403 and #496.
Closed by #515. The rest of this issue will be addressed in separate issues as mentioned in the comment above.
Currently, note metadata consists of 3 elements:
sender
,tag
, andnum_assets
.num_assets
will be removed as a part of #281, but we'll need to add note type to metadata and also come up with a methodology for specifying tags. Possible approaches for both of these are described below.Note type
To properly propagate notes through the network and also to properly process them in the transaction kernel, we need to know some basic info about what kind of a note it is. Specifically, we can categorize notes as follows:
We need just 2 bits to represent these 4 possible types and these could be encoded into a single field element within the VM. For now, these won't affect much except that we'd only allow off-chain notes (i.e., the transaction kernel will raise an error if we are trying to create either a public or an encrypted note). We'll lift these limitations in the future.
Note tags
We use note tags primarily for routing/discovery purposes. That is, the user can query the node for a set of tags, and the node will return all notes which match the specified tags (the way the
state_sync
endpoint is currently implemented in the node, we actually don't query for the entire tag, but only for a 16-bit prefix of a tag - but let's put this aside for now).The size of a tag is one field element (i.e., 64 bits).
For the 3 standardized scripts we currently have, we set the tags as follows:
P2ID
andP2IDR
- the tag is set to the target account's ID.SWAP
- the tag is set to $0$.There are a few things we can improve about the above approach:
First, we can reduce the tag size from 64 bits to 32 bits. This reduces size of the note commitment without really impacting anything (i.e., we probably don't really need 64 bits for any type of tag).
Second, while we can't (and shouldn't) enforce rules on how users define tags, we can recommend some basic schemas. For example, we could categorize notes as follows:
Then, we can say that, by convention, tags that start with
0
are intended for notes sent to specific accounts, and tags which start with1
are indented for notes which are sent for many (or any) accounts. Or more concretely, tag formats could be:0
+ 15-bit account prefix (this can be increased in the future to increase the level of privacy).1
+ 15-bit use case ID + 16-bit tag payload. The 16-bit payload could be defined differently for a specific use case.Applying these formats to our standardized scripts could look as follows:
P2ID
andP2IDR
we'd use the first format - so, the tag would be:0
+ 15 most significant bits of target account ID + 16 zeros.SWAP
we'd define use case ID as $0$ - so, the tag would be:1
+ 31 zeros.We could also make the
SWAP
tag a bit more sophisticated. For example, to let people filter notes more easily, we could define the tag payload as: 8 bits of the faucet ID of the requested asset + 8 bits of the faucet ID of the offered asset. This way, if we provided the capability to query notes by the full tag, the users will really be able to narrow down the set of returned notes. But also, maybe we are running a bit too far ahead with this.