0xPolygonMiden / miden-client

Client library that facilitates interaction with the Miden rollup
MIT License
25 stars 22 forks source link

Implement `NoteFile` importing/exporting #371

Closed igamigo closed 1 day ago

igamigo commented 1 month ago

What should be done?

Following https://github.com/0xPolygonMiden/miden-base/pull/721, NoteFile will be the type used for importing/exporting notes from applications like the faucet, so we need to implement importing and exporting features from the client.

How should it be done?

When is this task done?

When importing and exporting to and from files happens with NoteFile instead of InputNoteRecord

Additional context

No response

bobbinth commented 4 weeks ago

I would probably rename Client::import_input_note() into just Client::import_note(). Also, a few more thoughts (feedback is very much appreciated):

First, we now have an option to import a note in 3 different states: note ID, note details, and full note with inclusion path. Here is how I think these cases can be handled (some of these things are already done):

As a result of the above, the input_notes table may contain notes in the following states:

So, at the end of the state sync request (once we synced to the chain tip) we would need to deal with some of these notes separately (i.e., handle the "leftover notes" as introduced in #361). The cases we care about are as follows:

Two important things to note here:

Overall, the Client::import_note() method could look as follows:

pub async fn import_note(
    &mut self,
    note: NoteFile,
) -> Result<InputNoteRecord, ClientError>

Separately, I would consider renaming InputNoteRecord into just InputNote and refactoring to use NoteDetails (and maybe be enum-based).

tomyrd commented 4 weeks ago

(1) the note is tracked via a tag and (2) the note is not tracked via a tag.

What's an example of the second case? Is it a note which we don't know its tag? Or a note tag that we want to ignore?

As a result of the above, the input_notes table may contain notes in the following states:

In terms of expected or commited imported notes. Is the idea to assume its NoteState based on the NoteFile variant (NoteFile::NoteWithProof being commited and the others expected) and treat them differently? What if the NoteFile::NoteDetails is already commited in chain when we imported? Do we store it as Expected and then update it with a sync?

Right now we always try to get the inclusion path of an imported note (with a combination of calls to GetNotesById and GetBlockHeaderByNumber). If we were to mantain this, even if the file is NoteFile::NoteDetails we would check it's existance in chain and update them accordingly before storing.

tomyrd commented 4 weeks ago

I already started working on this ( #375 ). Right now I just added the basics so that the client works with the new faucet. I can add more functionality to it once we decided how to treat each case (especially the NoteId which is not supported right now).

bobbinth commented 4 weeks ago

(1) the note is tracked via a tag and (2) the note is not tracked via a tag.

What's an example of the second case? Is it a note which we don't know its tag? Or a note tag that we want to ignore?

It can actually be both (i.e., we either don't know the tag, or we know the tag but don't want to add it to the list of tags which are tracked regularly).

I don't have a great example for this, but I can imagine someone sending me a P2ID note with a random tag (maybe because they don't want to put even the first 16 bits of my account ID into the metadata). Since I have the note, I should be able to consume it - but at the same time, I don't want to add a random tag to the list of tags I query regularly.

In terms of expected or committed imported notes. Is the idea to assume its NoteState based on the NoteFile variant (NoteFile::NoteWithProof being committed and the others expected) and treat them differently?

Yes - but it also depends on how much data we already have in the client for a given note.

What if the NoteFile::NoteDetails is already committed in chain when we imported? Do we store it as Expected and then update it with a sync?

Yes, except for the NoteFile::NoteId` case, we don't make requests to the node on import and try to "hydrate" the note on sync. If the note is a "tracked" note, this could happen naturally during the sync process. But there can also be cases when the user may need to do it manually (i.e., the imported tracked note is "in the past" with respect to the current sync height).

Right now we always try to get the inclusion path of an imported note (with a combination of calls to GetNotesById and GetBlockHeaderByNumber). If we were to maintain this, even if the file is NoteFile::NoteDetails we would check it's existence in chain and update them accordingly before storing.

I think right now there is a verify parameter which controls this. Maybe it makes sense to keep it (I'll need to think more about it).

tomyrd commented 4 weeks ago

I think right now there is a verify parameter which controls this. Maybe it makes sense to keep it (I'll need to think more about it).

You are right, there is a --no-verify flag that skips this step. Plus, now that I looked at bit further, we don't always try to get the inlusion path (my bad). We only try to get it if the note is in the past, so if the client is behind the inclusion proof for the imported note will be added naturally with the sync.

tomyrd commented 3 weeks ago

I'll start working on adding this changes to #375.

I also created a PR in miden-base to add the tag to NoteFile::NoteDetails.

I have a question about the tag in NoteFile. That tag may differ from the real tag of the note in the node. What would happen in that case? If the idea is to blindly add the received tag to the client's list, could someone with malicious intent add a tag that ends up retrieving a different note than the one in the original NoteFile? Would this be a problem?

igamigo commented 3 weeks ago

I have a question about the tag in NoteFile. That tag may differ from the real tag of the note in the node. What would happen in that case? If the idea is to blindly add the received tag to the client's list, could someone with malicious intent add a tag that ends up retrieving a different note than the one in the original NoteFile? Would this be a problem?

I think the user could eventually verify the validity by retrieving the real metadata from the node. However, this might defeat the purpose of sending the tag anyway (hiding the interest on a specific NoteId).

I'm not totally convinced adding the NoteTag to the variant is a very useful feature but I might have not thought through all the consequences. Just to clarify, we wouldn't add the tag to the persistent list of tags, right? We would just add the note's tag to the sync request until the note we are tracking is actually retrieved.

tomyrd commented 3 weeks ago

If we don't add the NoteTag to the NoteDetails variant, we should at least add a boolean that represents whether the note is meant to be tracked (updated by the tag) or untracked (updated by GetNotesById).

About this last case (untracked notes) should the GetNotesById be called on each sync until it's updated or only try once when importing? If it's the latter, what should we do if it fails?

bobbinth commented 3 weeks ago

If we don't add the NoteTag to the NoteDetails variant, we should at least add a boolean that represents whether the note is meant to be tracked (updated by the tag) or untracked (updated by GetNotesById).

That's how I was thinking about it: we wouldn't add the tag to the list of tags but would rather use it to figure out if a given note is tracked. We could still store the imported tag with the note and the overwrite it (together with the rest of metadata) once the actual note gets retrieved.

About this last case (untracked notes) should the GetNotesById be called on each sync until it's updated or only try once when importing? If it's the latter, what should we do if it fails?

I was thinking we'd do it on every sync for now. In the future, we could have a more sophisticated strategy where we implement some kind of backoff strategy.

Another option is to leave management of the leftover notes to the user (i.e., CLI in our case). For example, the client could have a method for trying to retrieve leftover notes, but this method would be invoked by CLI rather than by the client automatically.

tomyrd commented 3 weeks ago

I think I may have gotten a little bit lost with this one:

That's how I was thinking about it: we wouldn't add the tag to the list of tags but would rather use it to figure out if a given note is tracked. We could still store the imported tag with the note and the overwrite it (together with the rest of metadata) once the actual note gets retrieved.

I think I'm confusing the tag that comes with the NoteFile with the tag would be retreived from the node with the metadata. I'll call them file tag and metadata tag respectively. What I understood is:

We store the file tags separate from the persistent list of tags (the ones that are stored+tracked in the client). When doing a sync we add all the file tags to the note_tags sent to the node. When the metadata for imported+tracked notes we discard the file tag in favor of the metadata tag. For the notes that didn't come with a note tag, we request all of the additional information via GetNotesById on each sync.

bobbinth commented 3 weeks ago

Sorry, I should have described it more clearly. Here is how I'm thinking about it:

When we import NoteFile, use use the file tag (if present) to figure out if the note is tracked or not. If it is tracked, we don't do anything extra. If it is not tracked, we need to mark it as an untracked note somehow. This could be a boolean flag in the input_notes table, but could be something else. We probably also want to store the file tag with the note (in the input_notes table) so that if we start tracking this tag (e.g., by adding the tag to the list of tags we track manually), we can update the boolean flag.

When doing the sync, we don't do anything extra with the file tags - but we do make a separate call to get notes by ID for the untracked notes. Any note which we get data for from the node, we update in the input_notes table - at this point, we'd simply overwrite file tag with metadata tag (in most cases, they probably would be the same, but that's not guaranteed).

So, basically, there is not separate place in which we'd store file tags - they are stored together with the imported notes.

igamigo commented 3 weeks ago

Some questions/comments @bobbinth:

bobbinth commented 3 weeks ago

For notes that come with a tag, you mention we don't do anything extra. This means the user is in charge of optionally adding the tag to the persistent tag list, correct? Is this the only scenario where the exported file tag would come into play?

We need the file tag to determine if we already track the note or not. Or said another way, without the file tag, we wouldn't be able to tell if the note is already covered by existing flags.

We also discussed adding the tag to the sync request tag list ephemerally (that is, added to the sync request until the note we are tracking is retrieved); this can be done optionally through a flag. This would have the advantage that the user does not need to remove the tag by hand later.

We could do this, but it adds complexity. For example, if the user adds a "permanent" tag which would be the same as "ephemeral" tag, we need to make sure to keep it after the note with the "ephemeral" tag is retrieved.

In general, I'm trying to keep things relatively simple in the client to make it less "opinionated" (and move some of the "orchestration" logic into the CLI). But maybe the complexity here is not as big as I'm imagining.

I don't think I understood this correctly: If we start tracking the tag manually, that means the note file came with a tag, which means the boolean flag was set to 'tracked'. So why would we want to update it or what would we want to update it to?

Let's say the client is tracking tags A and B. The user then imports a note with tag C. This note gets marked as "untracked" because C is not in the list of tracked tags. Later on, the user adds tag C manually to the list of tracked tags. This means the note should now be marked as "tracked".

I think with these changes it would still make sense to keep the verify option (either at the CLI level or at the client level as well). It could be used to assert the tag was correctly set (or at least, warn the user about it?). If we decide to remove it, I wonder if we should surface a client method for users to update NoteMetadata for stored notes, the idea being that they might want to do a GetNoteDetailsById manually and check (and update) the metadata themselves. Right now import_input_note() (which we should rename to import_note()) would error if the NoteId exists on the store already which is why we discussed changing it as part of 361 (see also these comments)

I think we should always overwrite note metadata with the data we get from the node. That is, until we get the note from the node, we can't be sure if the metadata we got from NoteFile is actually correct. We have 3 ways to import a note file:

Regarding verify option - I agree that it makes sense to have something like it, but I wonder if this is more of a CLI job. Basically, if we have verify in the client on note import, we also need to run state sync - otherwise, we can't really verify that the note is in the chain (i.e., the note could be "in the future" in relation to the client's current sync point). And I'm not sure running state sync on import implicitly is a good idea.

If the user is in charge of adding the tag to the list of tags for the sync request, then it's possible this does not add much to the client sync design nor the UX other than having a convenient way of passing NoteTag objects around (which might be enough), right? Something very similar to feat: Get data for leftover notes #361 would have to be implemented anyway.

There are two ways of adding tags to this list:

  1. Any time the user creates an account, a tag corresponding to this account ID should be added to the list of tags.
  2. A user can add (and remove) tags manually - but not the ones corresponding to account IDs.

In vast majority of the cases, the user would get notes matching these tags. Most of the discussion above is about handling edge cases when the file tag does not match any of the tracked tags (and this should be a very rare exception).

igamigo commented 3 weeks ago

Ah, OK, I understand better now. Thanks for explaining! A couple more comments:

Regarding verify option - I agree that it makes sense to have something like it, but I wonder if this is more of a CLI job. Basically, if we have verify in the client on note import, we also need to run state sync - otherwise, we can't really verify that the note is in the chain (i.e., the note could be "in the future" in relation to the client's current sync point). And I'm not sure running state sync on import implicitly is a good idea.

Is this true? GetNotesById would return valid data regardless of client's sync height, at least enough so that the client can verify that: a) the note exists and b) NoteTag is as expected. If the note is in the future the sync will eventually happen (at user's will) and the note's data will be downloaded. If the note is in the past, then it's a matter of maybe downloading block/MMR data.

We could do this, but it adds complexity. For example, if the user adds a "permanent" tag which would be the same as "ephemeral" tag, we need to make sure to keep it after the note with the "ephemeral" tag is retrieved. In general, I'm trying to keep things relatively simple in the client to make it less "opinionated" (and move some of the "orchestration" logic into the CLI). But maybe the complexity here is not as big as I'm imagining.

I agree with making it less opinionated and having the CLI represent only one perspective of how to handle things. To clarify though, I think this could be solved in a simpler way: On sync requests, we can just group all Pending notes (optionally we can just group Pending notes that have been imported) and add those tags to the sync request's list of tags. Once their respective data is downloaded and the notes are marked as committed and the client syncs again, the tags for these notes will not be included in the next sync (unless they are in the user's list of permanent tags, which is left untouched). This solution might still be too much complexity, but mentioning it just in case.

For reference, this is how account tags are currently handled - they are not included in the user's list of tags, we force-add them to the sync's list of tags on each request, regardless of the user's settings.

tomyrd commented 3 weeks ago

I tried to summerize all of the changes, tell me if I missunderstood something:

Also, just to be clear, ephemeral tags are tags that are not in the tag table and are not inferred from the tracked accounts but are still added separately to the status request to the node. Currently these would be the tags of uncommitted notes

I'll edit this message with the corrections as they come so we have a clear and centralized list.

bobbinth commented 3 weeks ago

Is this true? GetNotesById would return valid data regardless of client's sync height, at least enough so that the client can verify that: a) the note exists and b) NoteTag is as expected.

Ah yes - that's correct.

NoteId: We complete the note with the information on chain via GetNodesById (only if it returns a public note, if private we throw an error). The completed note is then inserted in the store with its metadata. The status depends on the inclusion proof of the response, if it's included we add it to the note and insert it as Committed else we insert it as an Expected note. If the note was already in the store we update it's metadata/incusionproof/status as necessary.

One correction here: we would always insert it as a Committed note. This is because if we get a response from the node, we would get the full note (together with inclusion proof).

For Committed notes, if they don't have a block header, we will request the node for them. When we do this we should also check if the metadata+inclusion proofs are valid.

This is a good point. We may end up with some Committed notes for which the data is invalid (i.e., the inclusion proof does not verify against the block header). We'll need to have a strategy for dealing with them.

Notes marked as untracked are not added as ephemeral tags, we will request the node manually for each of them on each sync. If we get a response with a note (regardless if it's public or private) we will update it's metadata, inclusion proof and status.

I'm thinking maybe we should move the "automation" part here to the CLI - i.e., the client would not try to retrieve the data for untracked notes as a part of it its sync request, but maybe the CLI would instruct the client (potentially via a separate method) to retrieve the untracked notes.

The verify option is still being discussed.

I think the issue with verify for Client::import_note() is that it is only useful in some cases. Specifically:

Assuming we don't have verify parameter, we can emulate its behavior - though in a bit of a hacky way: we import the note as NoteId, and if that comes back as error, we import it as NoteDetails.

tomyrd commented 3 weeks ago

Another question, what would be the benefits of not requesting the block header in the import for notes that we know are commited (NoteWithProof or NoteId that were already committed). The notes would be stored as Committed but marked as not having a block header so they wouldn't be consumable. This may be weird for the user in terms of UX because they would maybe try to consume them (as they are Committed notes) and fail. They would need to sync before consuming, so they would behave similarly to Expected notes.

igamigo commented 3 weeks ago

For notes that have inclusion proof, they could be either in a future block or a past block compared to the client (or I guess at the same block height, but this is not an interesting case as it's solved by default). In any case they would be marked as Committed. I think as long as we:

we should be fine. Otherwise, for example, importing would possibly imply a sync (in the case of the note being in a future block) which is undesirable as Bobbin mentioned.

tomyrd commented 3 weeks ago

Got it, we want to create a better separation between the import and sync.

I was thinking about the implementation of the tracked/untracked note mark in the store. I find that name a little confusing as untracked notes are technically tracked by the client as they are in the database, maybe calling them acknowledged/ignored notes is better? The ignored notes are not updated in the sync_state method.

I'm thinking maybe we should move the "automation" part here to the CLI - i.e., the client would not try to retrieve the data for untracked notes as a part of it its sync request, but maybe the CLI would instruct the client (potentially via a separate method) to retrieve the untracked notes.

For this we could add a update_ignored_notes method in the client that would internally call GetNotesById and update them. The sync command in the cli could have a --update-ignored flag (or do we want them to be updated by default?).

tomyrd commented 2 weeks ago

There's one thing I realized after working on the integration tests fixes for this new features. Most of the NoteFile::NoteDetails imported will be in the form of ignored/untracked notes, won't they? Most of the time the imported tag will not match with the stored tags so the note will end up being marked as ignored.

If this ends up being the case, most of the imported notes will be retrieved "manually" with GetNotesById, would this be a problem?

bobbinth commented 2 weeks ago

There's one thing I realized after working on the integration tests fixes for this new features. Most of the NoteFile::NoteDetails imported will be in the form of ignored/untracked notes, won't they? Most of the time the imported tag will not match with the stored tags so the note will end up being marked as ignored.

That shouldn't be the case as most of the time the imported note's tags should match a tag derived from one of the tracked accounts. The flow that I think would be the most common:

  1. I want to send a note to you.
  2. I get a tag from you (interactively or by scanning a QR code). Presumably, this would be a tracked tag or a tag derived from one of your account IDs.
  3. I create a note using this tag and send the note file to you.
  4. You import the note - the tag should match one of your tags and so the note will be tracked.

Separately, reading through the discussion above, I'm still not super happy with where we are landing - it feels like there is too much complexity and different combinations of various states/conditions. This will get even more complex once we need to support https://github.com/0xPolygonMiden/miden-base/issues/353 on the client (i.e., we'll be able to create transactions consuming notes which have not been recorded on chain yet).

I'll try to think through this more to see if there is a clean way to organize all of this.

tomyrd commented 1 week ago

I think I found an edge case that can't be resolved from what I've gathered in the summary. In this part we specify what to do if the imported NoteFile::NoteDetails is in the past:

If one of the tracked notes was imported after it was recorded on chain, it will not be updated (since it's not in the response), it's up to the user to update it manually by importing it as a NoteFile::NoteId.

The problem with this is that the NoteFile::NoteId only works for public notes, so if the imported note is in the past we don't have a way to ever update its metadata.

bobbinth commented 1 week ago

The problem with this is that the NoteFile::NoteId only works for public notes, so if the imported note is in the past we don't have a way to ever update its metadata.

I probably could have described it more clearly in https://github.com/0xPolygonMiden/miden-client/issues/371#issuecomment-2147197187, but there are actually two paths here for importing via NoteFile::NoteId.

  1. if the note with this ID is not already in the input_notes table, we make a request to the GetNotesById RPC endpoint. If the response contains a public note, we should add it to the input_notes table (as a committed note). Otherwise, we should return an error.
  2. If the note is in the inputs_note table and it is an expected note - we also make a request to the GetNotesById endpoint to try to convert it to a committed note (the response of GetNotesById should contain note metadata even for private notes).

So, the edge case should be covered by the second path - unless I'm missing something.

mFragaBA commented 2 days ago

should we close this since #375 got merged?

igamigo commented 1 day ago

Closed as done by #375!