0xPolygonMiden / miden-node

Reference implementation of the node for the Polygon Miden rollup

Endpoint to perform client state sync #43

Closed: hackaugusto closed this issue 1 year ago

hackaugusto commented 1 year ago

ref: https://github.com/0xPolygonMiden/miden-node/discussions/38

### Tasks
- [ ] Database: add table for notes
- [x] Database: add table for accounts
- [x] Database: add table for nullifiers
- [x] protobuf: add request / response messages
- [ ] rpc/store: add endpoints for the above messages
bobbinth commented 1 year ago

I think as an initial implementation (to simplify things) we can do something like this:

Request {
    block_ref: <block number of the last sync>,
    note_hashes: [a list of note hashes, maybe limited to 1024 or something similar],
}

Response {
    chain_tip: <number of the latest block in the chain>,
    block_header: <block header of the block with the first note matching the specified criteria>,
    mmr_delta: <data needed to update the partial MMR from `block_ref` to `block_header.block_num`>,
    notes: [a list of all notes together with the Merkle paths from `block_header.note_root`],
}

This will work for first-testnet purposes, and we can expand the request options later on to include tags, senders, etc.

To state one assumption explicitly: once the client applies mmr_delta to their partial MMR, they should end up with a Merkle path from the block header of the block_ref block to one of the new MMR peaks.
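A minimal sketch of that check, using hypothetical names (partial_mmr, apply_delta, open) rather than the actual miden-crypto API:

// Sketch only; `partial_mmr`, `apply_delta`, and `open` are hypothetical names.
// Apply the delta received in the response to the locally tracked partial MMR.
partial_mmr.apply_delta(mmr_delta)?;

// The header at `block_ref` (which the client already trusts) must now
// authenticate against one of the updated peaks.
let proof = partial_mmr.open(block_ref)?;
assert!(proof.verify(block_ref_header_hash, partial_mmr.peaks()));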

bobbinth commented 1 year ago

By the way, for the notes table, I was thinking of the following structure:

In the above, a combination of block_num and note_index would be the primary key.

And for the accounts table, I was thinking something like this:

In the above, id would be the primary key.

Dominik1999 commented 1 year ago

Should that issue https://github.com/0xPolygonMiden/miden-node/issues/22 be closed then? It looks like account states are also being synced over this endpoint.

bobbinth commented 1 year ago

As mentioned in https://github.com/0xPolygonMiden/crypto/pull/195#discussion_r1365203827, we may want to include one more field in the response. The new response schema would look like so:

Response {
    chain_tip: <number of the latest block in the chain>,
    block_header: <block header of the block with the first note matching the specified criteria>,
    mmr_delta: <data needed to update the partial MMR from `block_ref` to `block_header.block_num`>,
    block_path: <Merkle path in the updated chain MMR to the block at `block_header.block_num`>,
    notes: [a list of all notes together with the Merkle paths from `block_header.note_root`],
}

When interpreting this response, the client would do something like this to update its local chain MMR:

let num_notes = /* based on the number of notes returned in the response */;

chain_mmr.update(mmr_delta).unwrap();
if num_notes > 0 {
    chain_mmr.add(block_num, block_header_hash, path_to_last_header).unwrap();
}

// TODO: check that new chain_mmr is consistent with `block_header.chain_root`

Thus, by repeatedly calling this endpoint until chain_tip == block_header.block_num, the client will achieve the following:
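For illustration, a rough sketch of that repeated loop (hypothetical client-side names such as rpc and store, not an actual node API):

// Sketch only; `rpc`, `store`, and the field names are hypothetical.
let mut block_ref = store.latest_synced_block_num();
loop {
    let resp = rpc.sync_state(Request {
        block_ref,
        note_hashes: store.tracked_note_hashes(),
    })?;

    // Update the local chain MMR and record any notes returned for this range.
    store.apply_mmr_delta(resp.mmr_delta, &resp.block_header, &resp.block_path)?;
    store.insert_notes(&resp.notes)?;

    block_ref = resp.block_header.block_num;
    if block_ref == resp.chain_tip {
        break; // fully synced up to the chain tip
    }
}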

bobbinth commented 1 year ago

Expanding on the ideas from https://github.com/0xPolygonMiden/miden-node/discussions/38#discussioncomment-7333876 and simplifying some things, I think we can combine everything into a single endpoint which we can call sync_state. The request/response pair for this endpoint could look something like this:

Request {
    block_ref: <block number of the last sync>,
    account_ids: [a list of account IDs],
    note_tags: [a list of 16-bit prefixes of note hashes],
    nullifiers: [a list of 16-bit prefixes of nullifiers],
}

Response {
    chain_tip: <number of the latest block in the chain>,
    block_header: <block header of the block with the first note matching the specified criteria>,
    mmr_delta: <data needed to update the partial MMR from `block_ref` to `block_header.block_num`>,
    block_path: <Merkle path in the updated chain MMR to the block at `block_header.block_num`>,
    accounts: [a list of account hashes updated after `block_ref` but not after `block_header.block_num`],
    notes: [a list of all notes together with the Merkle paths from `block_header.note_root`],
    nullifiers: [a list of nullifiers created between `block_ref` and `block_header.block_num`],
}

In the above:

Pros

Cons

Open questions

bobbinth commented 1 year ago

Should that issue #22 be closed then? It looks like account states are also being synced over this endpoint.

I think we should still keep it, but in my mind, it is no longer on the critical path.

hackaugusto commented 1 year ago

Some thoughts about the nullifiers request:

Maybe the user behavior could also affect the effectiveness of the approach above. I think the best way to address this is to actually anonymize the requests for nullifiers, or to request from multiple nodes instead.

bobbinth commented 1 year ago

Great points! My current thinking is that we can leave the exact strategies for how to rotate nullifier prefixes, how many decoys to use etc. to the future. And maybe these could be determined individually by the clients based on their specific needs. As you mentioned, these may need to rely on some additional data (current/historical TPS), but I'm thinking that this can be obtained from external sources or maybe even from the nodes themselves via a different endpoint.

A couple of specific comments.

The client should not stop requesting a key once it sees the target nullifier. The reason is that it is very unlikely there was another nullifier in the same response; if the client stops requesting a prefix after seeing the one it is interested in, then it is very clear that this was the nullifier of interest.

Yes, this is how I imagined it working - i.e., the client keeps using the same prefix until it receives at least several (or maybe several dozen) nullifiers for this prefix. At 20 TPS, this would mean that nullifier rotation could happen on a daily basis. But again, the exact strategies and parameters require much deeper analysis and probably can be left to the future.
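For rough intuition (assuming roughly one nullifier per transaction and 16-bit prefixes uniformly distributed over 2^16 = 65,536 values): 20 TPS is about 1,728,000 nullifiers per day, or roughly 26 nullifiers per prefix per day - i.e., "several dozen" accumulate within a day or two.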

Another alternative is that a client should include decoy filters, and maybe rotate them once in a while. However, this would also need some thought, because rotating filters would need to have a time component, and if a filter is added at a non-regular interval, that itself is a signal that it is the category of interest.

Good idea, and I think it can be used together with (rather than instead of) the above.

1. What about the clients that are offline for a while? 2. How to determine the number of decoys to download?

My initial thinking was that in such situations the client would not rotate the set of nullifiers during the entire sync. But there could be other strategies too.

hackaugusto commented 1 year ago

My current thinking is that we can leave the exact strategies for how to rotate nullifier prefixes, how many decoys to use etc. to the future. And maybe these could be determined individually by the clients based on their specific needs.

I think I didn't do a good job at presenting my point. I'm trying to say it isn't a good strategy to rotate decoys. Here is another take:

  1. Privacy is a concern with this endpoint
  2. To have privacy, we shouldn't leak data about the user's intent
  3. Nullifiers are produced when a note is consumed
  4. There are some use cases which put a bound on the time at which a note is consumed, but that is not a given. IOW, in the general case there is no upper bound on the time it takes for a nullifier to be produced
  5. Because the API is time based, and the user needs to learn about the nullifier they care about, they cannot rotate that specific prefix before they see the target data
  6. To not leak data, every filter needs to live for a similar amount of time as the other prefixes, including the decoys. (An alternative is to have decoys that live for a random amount of time, which would include decoys that live less or longer than the longest real filter; my impression is that on average it should be the same though)
  7. But that time is unbounded because of 4, meaning filters will tend to live for a long time
  8. The last point increases overhead in the client, because it downloads more data than it needs, and in the server, because it has to provide said data, which in turn increases load on the network

With that said, the proposal is to go with 100 TPS as an estimate, so it seems the decoys won't be necessary.

hackaugusto commented 1 year ago

Recently we discussed whether this endpoint could be sufficient for the state sync needs; I see a few issues:

Issue 1: This assumes clients have a reference block number to start from, which is not true for new clients.

Issue 2: The above can be read to imply it is always possible to provide such a start block, which is not true. Example: a client wants to consume a DEX note produced prior to that reference point (it would never see the note's inclusion proof).

Issue 3: This assumes blocks which have been synced over are no longer useful, which is not true. Example: a client has synced from genesis to the tip of the chain, and wants to consume a note that has already been produced. (The DEX example applies here too: the note's inclusion proof would be missing, not because the block wasn't seen, but because the request that saw the block didn't include the note's tag.)

Issue 4: Issues 2/3 also happen for the nullifiers.

Because of the above, I think this endpoint is useful for a client to fetch state from the rollup when its filters have not changed. But we still need endpoints to look into the past.


Here are some possible solutions:

For issue 1: I guess we can use the fetch-block-header endpoint to get the latest block header, and use that as the reference point. Note: this doesn't fix the other issues.

For issue 2, w/ side communication channel: The user would learn about the note's block height by the same means they learn about the note, or even better, the user would receive the authentication path for the note, in which case this endpoint is not necessary. [^1]

For issue 2, wo/ side communication channel: If there is no out-of-band communication channel, then the user does not know the block height, and has to start at genesis. This doesn't have great performance and needs some thought w.r.t. pruning.

For issue 2, w/ pay-to-address: For a subset of notes, which require the user's account ID as a target, the issue goes away with some careful setup (see the sketch after this list):

  1. The user gets the latest block header before creating their account and learns its height. There can't be a P2ID note for the account prior to this block, since the ID doesn't exist yet.
  2. The user provides the generated account ID to be used in P2ID notes.
  3. The user starts syncing from the block learned at step 1.
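A sketch of that bootstrap flow, using hypothetical client-side names (rpc, client, share_with_senders are not an actual API):

// Sketch only; `rpc`, `client`, and `share_with_senders` are hypothetical names.
// 1. Learn the current chain height before the account exists.
let start = rpc.get_latest_block_header()?.block_num;

// 2. Create the account and hand out its ID for P2ID notes.
let account_id = client.create_account()?;
share_with_senders(account_id);

// 3. Sync from `start`; no P2ID note targeting `account_id` can exist before it.
let mut block_ref = start;
// ... run the regular state sync loop from here ...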

For issue 3: The solutions for issue 2 apply. On top of that, the client would need to perform multiple concurrent requests, and hopefully the request for newly added items would move fast enough that the user could merge the syncs into a single call.

For issue 4, w/ side communication channel: This could be a griefing attack vector, so the user shouldn't trust anything except the state of the head. [^1] [^2]

For issue 4, wo/ side communication channel: I think this would need an endpoint to fetch nullifiers based on epochs, similar to this.

[^1]: These would need a way of querying the authentication path to the MMR. For the case of the note and the non-empty nullifier, that is because the MMR would be changing. For the case of the empty nullifier, the user needs to request the latest block.

[^2]: An attack could happen because there is nothing in the nullifier hash to guarantee the nullifier is unique. The nullifier is defined as hash(serial_num, script_hash, input_hash, vault_hash); for some notes, like a DEX note, it is possible to assign the same values to all the inputs. An attacker that is willing to burn tokens could create such notes.

bobbinth commented 1 year ago

For issue 1: I guess we can use the fetch-block-header endpoint to get the latest block header, and use that as the reference point. Note: this doesn't fix the other issues.

If the client wants to sync from genesis, then they can probably use 0 as block_ref for the first request.

If the client doesn't care about syncing from genesis (e.g., this is a new client and they know for sure there are no nullifiers or notes addressed to them), then we'll need to provide another endpoint. Something that gives the client:

bobbinth commented 1 year ago

For issues 2, 3, 4, I think a solution would be to introduce the concept of epochs (similar to what you mention in your post). Let's say we define an epoch as two weeks; then the endpoint would work as follows:

  1. We still send the same request with block_ref and other parameters.
  2. We still get the same response back, but the node also paginates on epochs. That is, the returned block_header is for the block which either contains the notes we are interested in or is the last block in a given epoch.

With the above, the client will have reference points on epoch boundaries. So, if they would like to get some additional data from a past epoch, they'll be able to request it and merge it into their local database.
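A sketch of how the node might pick the returned block under this scheme (hypothetical names; the epoch length in blocks is just an illustrative constant):

// Sketch only; `Db`, its methods, and EPOCH_LEN are hypothetical.
const EPOCH_LEN: u32 = 100_000; // e.g., roughly two weeks' worth of blocks

fn response_block_num(db: &Db, block_ref: u32, note_tags: &[u16]) -> u32 {
    let epoch_end = ((block_ref / EPOCH_LEN) + 1) * EPOCH_LEN;
    match db.first_block_with_matching_notes(block_ref + 1, note_tags) {
        // A matching block inside the current epoch wins...
        Some(block_num) if block_num <= epoch_end => block_num,
        // ...otherwise paginate on the epoch boundary (or the chain tip, if earlier).
        _ => epoch_end.min(db.chain_tip()),
    }
}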

I can implement this enhancement at a later date though.

hackaugusto commented 1 year ago

accounts: [a list of account hashes updated after block_ref but not after block_header.block_num],

I think this should be a list of accounts which have their last change in the range (block_ref, block_num]. Otherwise:

  1. The node has to keep the history of changes for the account hashes, instead of only the latest.
  2. This will increase the amount of data sent to the client; if an account changes twice, I believe we don't need the intermediate state
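A sketch of what that query could look like, using rusqlite and assuming a hypothetical accounts table that stores one row per account with its latest account_hash and the block_num of its last change (not the actual schema):

// Sketch only; the `accounts(account_id, account_hash, block_num)` schema is assumed.
use rusqlite::{params, Connection, Result};

fn account_updates(
    conn: &Connection,
    block_ref: u32,
    block_num: u32,
) -> Result<Vec<(i64, Vec<u8>)>> {
    let mut stmt = conn.prepare(
        // Accounts whose latest change falls in the range (block_ref, block_num].
        "SELECT account_id, account_hash FROM accounts WHERE block_num > ?1 AND block_num <= ?2",
    )?;
    let rows = stmt.query_map(params![block_ref, block_num], |row| {
        Ok((row.get::<_, i64>(0)?, row.get::<_, Vec<u8>>(1)?))
    })?;
    rows.collect()
}
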
bobbinth commented 1 year ago

I think this should be a list of accounts which have their last change in the range (block_ref, block_num].

Agreed!

hackaugusto commented 1 year ago

By the way, for the notes table, I was thinking of the following structure:

Why would the note table be normalized, whereas all the other tables are not?

If we apply the same design as the other tables, it should look like:

        CREATE TABLE
            notes
        (
            pk INTEGER NOT NULL,
            tag INTEGER NOT NULL,
            block_num INTEGER NOT NULL,
            note BLOB NOT NULL,

            PRIMARY KEY (pk),
            CONSTRAINT notes_tag_is_felt CHECK (tag >= 0 AND tag < 18446744069414584321),
            CONSTRAINT notes_block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296),
            CONSTRAINT notes_note_isnt_empty CHECK (length(note) > 0),
            FOREIGN KEY (block_num) REFERENCES block_header (block_num)
        ) STRICT, WITHOUT ROWID;

Instead of:

        CREATE TABLE
            notes
        (
            block_num INTEGER NOT NULL,
            note_index INTEGER NOT NULL,
            note_hash BLOB NOT NULL,
            sender INTEGER NOT NULL,
            tag INTEGER NOT NULL,
            num_assets INTEGER NOT NULL,
            merkle_path BLOB NOT NULL,

            PRIMARY KEY (block_num, note_index),
            CONSTRAINT notes_block_number_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296),
            CONSTRAINT notes_note_index_is_u32 CHECK (note_index >= 0 AND note_index < 4294967296),
            CONSTRAINT notes_note_hash_is_digest CHECK (length(note_hash) = 32),
            CONSTRAINT notes_sender_is_felt CHECK (sender >= 0 AND sender < 18446744069414584321),
            CONSTRAINT notes_tag_is_felt CHECK (tag >= 0 AND tag < 18446744069414584321),
            CONSTRAINT notes_num_assets_is_u8 CHECK (num_assets >= 0 AND num_assets < 256),
            -- 32 is the size of a digest
            -- 20 is the value of NOTE_TREE_DEPTH
            CONSTRAINT notes_merkle_path_is_simple_tree CHECK (length(merkle_path) = 32 * 20),
            FOREIGN KEY (block_num) REFERENCES block_header (block_num)
        ) STRICT, WITHOUT ROWID;
hackaugusto commented 1 year ago

num_assets: u8 - we need this for now, but after https://github.com/0xPolygonMiden/miden-base/issues/281, we'll need to remove it.

I'm adding these optimizations. But I would like to reemphasize that for the protobuf and sqlite layers these changes don't make sense.

For protobuf, all integer types are encoded using the same varint technique:

[screenshot from the protobuf encoding documentation]

And the encoded size in bytes is variable, depending on the value of the integer. In other words, these changes only add unnecessary type casts when encoding the protobuf messages.
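To make the varint behavior concrete, a self-contained sketch (not protobuf library code): the number of encoded bytes depends on the value, not on the declared width of the field.

// Encode an unsigned integer as a protobuf-style (LEB128) varint:
// 7 bits of payload per byte, high bit set on all bytes except the last.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

fn main() {
    // Small values take 1 byte, large ones up to 10, regardless of the declared type.
    assert_eq!(encode_varint(1).len(), 1);
    assert_eq!(encode_varint(300).len(), 2); // encodes as [0xAC, 0x02]
    assert_eq!(encode_varint(u64::MAX).len(), 10);
}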

SQLite is similar, and all integer types are encoded using the same storage class:

[screenshot from the SQLite datatype documentation]

That is also variable size:

INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

bobbinth commented 1 year ago

Why would the note table be normalized, whereas all the other tables are not?

There are several reasons:

Also, I probably wouldn't impose foreign key constraints. I remember reading that they are real performance killers in SQLite.

bobbinth commented 1 year ago

I'm adding these optimizations. But I would like to reemphasize that for the protobuf and sqlite layers these changes don't make sense.

I think it is good to specify more restrictive types - primarily for code clarity reasons. Also, in the future we might move to a different underlying database, and it would be easier to carry over if the types are specified more precisely.