0xPolygonMiden / miden-node

Reference implementation of the node for the Polygon Miden rollup
MIT License

Enable getting a proof for non-current account states #527

Open igamigo opened 1 month ago

igamigo commented 1 month ago

Currently, the GetAccountProofs RPC method provides a way to retrieve public account data, alongside a Merkle proof of the account being included in a block. Specifically, you can get the account's header representing the account's state, and an account storage header, which contains all top-level elements (meaning either values or roots of maps).

Currently, this proof is generated exclusively for the chain tip. Realistically, however, a user will not always have the latest block when executing a transaction that uses FPIs. To address this, the current endpoint could be updated to return proofs for an arbitrary block (close to the chain tip), making transaction execution easier to set up on the user side.

bobbinth commented 1 month ago

I think the same strategy should be applied to the GetAccountDetailsRequest request. Though, maybe this should be a separate issue.

polydez commented 2 weeks ago

@igamigo I have a concern and need additional clarification regarding this issue. If the account was changed after the requested block, is the obsolete account state still useful for the transaction? That is, should we rather return an error (something like "account was updated") in this case? In my view, the result will be useful for transaction execution only when the latest account update happened before the requested block. If so, this would significantly simplify the overall solution, since we would not need to track account changes, only proofs.

igamigo commented 2 weeks ago

In the context of transaction execution, there are a couple of things to keep in mind:

Additionally, clients might also have extra validations to avoid retrieving stale data, etc.

But in general, I think there can be cases where the user retrieves a state that is not the latest but could still be considered valid (though there should probably be a hard limit after which it no longer makes sense to retrieve account proofs). I am also not sure whether there will be any other use cases for this endpoint (or GetAccountDetailsRequest).

polydez commented 2 weeks ago

For implementing this we actually have to solve two problems: getting the account state for a past block, and getting the corresponding inclusion proof.

I've tried different ideas and approaches, and I think I've ended up with a solution that is the simplest to implement now, yet still rather effective.

Getting Account State

Currently we store the latest account state and all account state deltas from the very beginning of the account's lifetime. Given the initial account state, we can transition it by sequentially applying deltas to its intermediate states.

There are two feasible solutions (without a complete account storage redesign) for getting an account's state prior to the latest update:

  1. Store some intermediate account states (not older than 256 blocks or so). This is difficult because we can't predict future updates, and it might also require plenty of additional space and sophisticated algorithms for keeping only the needed states and pruning excessive updates to save space. But having a state from before the requested block number gives us the ability to sequentially apply deltas to transition the account state to the requested one.
  2. Use the information we already have (latest state and deltas), which gives us the possibility to reconstruct the account state at any point in time by applying deltas in reverse order. The only difficulty I can see here is that we need to redesign the storage delta to make it reversible, and we probably need additional checks on the nonce in the delta to ensure it directly follows the previous one. And implement reverse application, of course. But this seems easier and more effective in most cases than saving previous account states.
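Option 2's reversible delta could look roughly like this. This is a minimal sketch with simplified, hypothetical types (`AccountState`, `ReversibleDelta` are illustrative; the real deltas also cover vault assets), meant only to show how storing the old value next to the new one makes a delta applicable in both directions:

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified account state: a nonce and storage slots only.
#[derive(Clone, Debug, PartialEq)]
struct AccountState {
    nonce: u64,
    storage: BTreeMap<u32, u64>,
}

// A reversible delta stores the old value next to the new one,
// so the same record can be applied in either direction.
struct ReversibleDelta {
    old_nonce: u64,
    new_nonce: u64,
    // slot -> (old_value, new_value)
    storage: BTreeMap<u32, (u64, u64)>,
}

impl ReversibleDelta {
    // Forward application: the nonce check ensures the delta directly
    // follows the current state, as suggested above.
    fn apply(&self, state: &mut AccountState) {
        assert_eq!(state.nonce, self.old_nonce, "delta must follow current state");
        state.nonce = self.new_nonce;
        for (&slot, &(_old, new)) in &self.storage {
            state.storage.insert(slot, new);
        }
    }

    // Reverse application: restore the old values.
    fn revert(&self, state: &mut AccountState) {
        assert_eq!(state.nonce, self.new_nonce, "delta must match current state");
        state.nonce = self.old_nonce;
        for (&slot, &(old, _new)) in &self.storage {
            state.storage.insert(slot, old);
        }
    }
}

fn main() {
    let mut state = AccountState { nonce: 1, storage: BTreeMap::from([(0, 100)]) };
    let delta = ReversibleDelta {
        old_nonce: 1,
        new_nonce: 2,
        storage: BTreeMap::from([(0, (100, 250))]),
    };
    let original = state.clone();
    delta.apply(&mut state);
    assert_eq!(state.storage[&0], 250);
    delta.revert(&mut state);
    assert_eq!(state, original); // round trip restores the old state
    println!("ok");
}
```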

Getting Inclusion Proof

Currently we have an in-memory sparse Merkle tree which stores the latest account hash for each account in the blockchain. On each new block it is updated by first computing the update and then applying it to the current tree.

If we try to avoid ending up with a completely different storage structure for account hashes, there are a few solutions I can see:

  1. Clone the whole tree every block and keep all copies for the required amount of time ($N$ last blocks). This will require additional RAM, but might be acceptable at least for current loads.
  2. Keep one previous state ($N$ blocks back) and all computed updates for this range. We can transition the $tip - N - 1$ state to the $tip - N$ state by applying the corresponding update (another solution would be to keep only the state for each $N$th block and purge states older than $N × 2$, but I think applying an update is not too expensive).
  3. Keep only updates for the last $N$ blocks and apply them in reverse order on each request. To implement this, we will need to support updating the SMT in reverse order and probably update the change sets themselves if they are not ready for such an operation (but it seems that they will work).

I think the third solution might be a good compromise for us now.

Mirko-von-Leipzig commented 2 weeks ago

Getting Inclusion Proof ...

There is a bit of pain associated with (3). More specifically, the current forward delta is not enough - you need to know the reverse delta, i.e. what value each node started with. It is possible, though I don't think it's super trivial. It's also fairly computationally expensive since you are recomputing hashes (?) - unless the deltas are just hash updates.

There is a fourth option iiuc: cache the complete account proofs for all recently updated accounts. The benefit of this is that it doesn't involve the original tree at all. The downside is that it's a bit wasteful in terms of memory usage, at least if implemented naively. e.g.:

If account x is updated at block y, then we need to store x's account proofs from y..=N (N being the current chain tip). Once N - y > limit we drop the proofs associated with y. The account can of course get more updates y' etc., but those would have their own N - y' > limit checks.

Memory usage can be improved by creating a proper tree, with Arc<Node> etc. The nice thing about that would be that pruning is automatic.

Effectively we would have the latest tree, and a separate sparse caching tree structure that only has the updated account proofs.
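The `Arc<Node>` idea above can be illustrated with a toy persistent tree. This is a sketch under stated assumptions (no hashing, fixed-depth paths, illustrative types); the point is only that unchanged subtrees are shared between versions, so dropping an old root automatically prunes nodes no newer version references:

```rust
use std::collections::VecDeque;
use std::sync::Arc;

// Toy persistent binary tree: each block keeps its own root, and
// unchanged subtrees are shared between versions via Arc.
#[derive(Debug)]
enum Node {
    Leaf(u64), // stand-in for an account hash
    Inner(Arc<Node>, Arc<Node>),
}

// Update along a fixed-depth path, rebuilding only the nodes on the path
// to the changed leaf; all other subtrees are shared with the old version.
fn update(node: &Arc<Node>, path: &[bool], value: u64) -> Arc<Node> {
    match (&**node, path) {
        (_, []) => Arc::new(Node::Leaf(value)),
        (Node::Inner(l, r), [go_right, rest @ ..]) => {
            if *go_right {
                Arc::new(Node::Inner(l.clone(), update(r, rest, value)))
            } else {
                Arc::new(Node::Inner(update(l, rest, value), r.clone()))
            }
        }
        _ => unreachable!("path longer than tree depth"),
    }
}

fn main() {
    // depth-1 tree with two leaves
    let v0: Arc<Node> = Arc::new(Node::Inner(
        Arc::new(Node::Leaf(1)),
        Arc::new(Node::Leaf(2)),
    ));
    // the next block updates only the right leaf; the left leaf is shared
    let v1 = update(&v0, &[true], 20);

    // keep the last N roots; dropping an old root prunes exactly the
    // nodes that are no longer referenced by any newer version
    let mut recent: VecDeque<Arc<Node>> = VecDeque::from([v0, v1]);
    recent.pop_front(); // prune the oldest version

    if let Node::Inner(l, _) = &*recent[0] {
        // the shared left leaf survives because v1 still references it
        assert!(matches!(&**l, Node::Leaf(1)));
    }
    println!("ok");
}
```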

polydez commented 2 weeks ago

There is a bit of pain associated with (3). Or more specifically, the current forward delta is not enough - you need to know the reverse delta i.e. what value did each node start with.

Yes, but since we need this only for reverse-order application, we can use the same delta format but compute the deltas differently (i.e. in addition to the current compute_mutations we can implement something like compute_reverse_mutations, which would produce a mutation that, when applied, reverts the SMT to its previous state).

There is a fourth option iiuc. Cache the complete account proofs for all recently updated accounts.

I hadn't thought of it that way, thank you! I will think about it.

Memory usage can be improved by creating a proper tree, with Arc<Node> etc. The nice thing about that would be that pruning is automatic.

Yes, that's the direction I was thinking in when I tried to come up with a "perfect" solution for this task. My idea was to build multiple SMTs which reuse the same subtrees when they have equal hashes. Each root would be the accounts root for a particular block number. Pruning old roots would cause unused nodes to be pruned as well. But the main difficulty there was how to access account hashes by account id, and I haven't come up with an efficient solution for that yet.

polydez commented 2 weeks ago

Cache the complete account proofs for all recently updated accounts.

I was thinking about it, and there is an issue with other accounts. When one account is changed, its subtree has to be recalculated up to the root, and this affects the proofs of all other accounts. For example, if account 1 was updated in block #10 and account 2 was updated in block #11, and the proof of account 1 is requested for block #12, this proof will be different from the one for block #10.

Mirko-von-Leipzig commented 2 weeks ago

Cache the complete account proofs for all recently updated accounts.

I was thinking about it, and there is an issue with other accounts. When one account is changed, its subtree has to be recalculated up to the root, and this affects the proofs of all other accounts. For example, if account 1 was updated in block #10 and account 2 was updated in block #11, and the proof of account 1 is requested for block #12, this proof will be different from the one for block #10.

Indeed you'll have to store the account proof per block for an account until it exceeds the limit.

polydez commented 2 weeks ago

Indeed you'll have to store the account proof per block for an account until it exceeds the limit.

My point was that it's not enough to store proofs only for updated accounts, because proofs of unchanged accounts are also updated on any blockchain state change. In other words, an update of one account affects the proofs of all accounts.

Mirko-von-Leipzig commented 2 weeks ago

Indeed you'll have to store the account proof per block for an account until it exceeds the limit.

My point was that it's not enough to store proofs only for updated accounts, because proofs of unchanged accounts are also updated on any blockchain state change. In other words, an update of one account affects the proofs of all accounts.

It does, but you only need to cache the recently updated accounts. For every account x updated in block y, you'll have to store the proof for x from block y..tip until tip - y > limit.

But in general all solutions are just different ways of expressing the same subgraph. My suggestion is just a de-normalized variation where you flatten the graph, i.e. least efficient in terms of memory usage but also low effort in terms of implementation.
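The flattened-cache variant could be sketched like this. All names here (`ProofCache`, `limit`, the flat map layout) are illustrative stand-ins, not the node's actual structures; the sketch only shows the per-block storage and the `tip - y > limit` pruning rule:

```rust
use std::collections::HashMap;

type AccountId = u64;
type BlockNum = u32;
type Proof = Vec<u64>; // stand-in for a Merkle path

// Hypothetical cache: keep a proof for every (recently updated account,
// block) pair, and drop entries once the tip moves past `limit` blocks.
struct ProofCache {
    limit: u32,
    proofs: HashMap<(AccountId, BlockNum), Proof>,
}

impl ProofCache {
    fn insert(&mut self, account: AccountId, block: BlockNum, proof: Proof) {
        self.proofs.insert((account, block), proof);
    }

    fn get(&self, account: AccountId, block: BlockNum) -> Option<&Proof> {
        self.proofs.get(&(account, block))
    }

    // Called once per new block: drop proofs older than `tip - limit`.
    fn prune(&mut self, tip: BlockNum) {
        self.proofs
            .retain(|&(_, block), _| tip.saturating_sub(block) <= self.limit);
    }
}

fn main() {
    let mut cache = ProofCache { limit: 2, proofs: HashMap::new() };
    cache.insert(1, 10, vec![0xaa]);
    cache.insert(1, 11, vec![0xbb]); // re-stored at block 11: other accounts'
                                     // updates changed this account's path
    cache.prune(13); // 13 - 10 > 2, so the block-10 proof is dropped
    assert!(cache.get(1, 10).is_none());
    assert!(cache.get(1, 11).is_some());
    println!("ok");
}
```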

polydez commented 1 week ago

Taking into account ideas from our call, we ended up with the following solutions for the corresponding problems:

Getting Account State

Currently we store the latest account's state and all account state deltas from the very beginning of account's lifetime. We store deltas in a single account_deltas table in binary-encoded form.

Let's refactor deltas to make them reversible (i.e. store old values along with new ones). We should also store deltas in a set of separate tables, so that it becomes possible to compute a single delta from block A to block B using plain SQL queries:

CREATE TABLE
    account_deltas
(
    account_id  INTEGER NOT NULL,
    block_num   INTEGER NOT NULL,
    old_nonce   INTEGER NOT NULL,
    new_nonce   INTEGER NOT NULL,

    PRIMARY KEY (account_id, block_num),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_deltas_nonce_increased CHECK (old_nonce < new_nonce)
) STRICT;

CREATE TABLE
    account_storage_delta_values
(
    account_id  INTEGER NOT NULL,
    block_num   INTEGER NOT NULL,
    slot        INTEGER NOT NULL,
    old_value   BLOB    NOT NULL,
    new_value   BLOB    NOT NULL,

    PRIMARY KEY (account_id, block_num, slot),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_storage_delta_value_updated CHECK (old_value != new_value)
) STRICT, WITHOUT ROWID;

CREATE TABLE
    account_storage_delta_map_values
(
    account_id  INTEGER NOT NULL,
    block_num   INTEGER NOT NULL,
    slot        INTEGER NOT NULL,
    key         BLOB    NOT NULL,
    old_value   BLOB    NOT NULL,
    new_value   BLOB    NOT NULL,

    PRIMARY KEY (account_id, block_num, slot, key),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_storage_delta_map_value_updated CHECK (old_value != new_value)
) STRICT;

CREATE TABLE
    account_fungible_asset_deltas
(
    account_id  INTEGER NOT NULL,
    block_num   INTEGER NOT NULL,
    faucet_id   INTEGER NOT NULL,
    delta       INTEGER NOT NULL,

    PRIMARY KEY (account_id, block_num, faucet_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num)
) STRICT, WITHOUT ROWID;

CREATE TABLE
    account_non_fungible_asset_delta_actions
(
    account_id  INTEGER NOT NULL,
    block_num   INTEGER NOT NULL,
    asset       BLOB NOT NULL,
    is_remove   INTEGER NOT NULL, -- 0 - add, 1 - remove

    PRIMARY KEY (account_id, block_num, asset),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num)
) STRICT, WITHOUT ROWID;

So, the procedure for getting the account state at a requested block number will look like:

  1. Get the latest account state.
  2. Compute the reverse delta from the latest account state to the requested one by a series of SQL queries (hopefully, about 5 queries).
  3. Apply the reverse delta to the latest account state.

Getting Inclusion Proof

Currently we have an in-memory sparse Merkle tree which stores the latest account hash for each account in the blockchain. On each new block it is updated by first computing the update and then applying it to the current tree. When we need an account inclusion proof, we traverse the tree from the account's leaf up to the root, collecting the hashes along the path into a list.

We can keep the computed updates for the last $N$ blocks in order to be able to get an inclusion proof for any account at any of the last $N$ blocks. This will require significantly less space than keeping full copies of the account SMT for the last $N$ blocks. But we will also need to keep the state at block $tip - N$ and update it with every new computed update (another approach would be to keep an SMT for every $N$th block and purge data from $tip - 2N$ to $tip - N$, but this would require about 2 times more memory).

So, the algorithm for getting an account inclusion proof for a requested block number will be:

  1. Get the update for the requested block.
  2. Starting from the root of the update, descend towards the searched account ID: if the account ID is in the left subtree, go left; if in the right subtree, go right.
  3. If there is no corresponding node in the current update, fall back to the previous update. Repeat, if the node is not in any update, all the way down to the full SMT.
  4. When the account's leaf is found, put it at the end of the list and output the list in reverse order.
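The layered fallback in steps 2-3 could be sketched like this, using flat maps of node indices instead of a real SMT (all types and names are illustrative): a node is resolved from the most recent update at or before the requested block, falling through older updates and finally to the base tree:

```rust
use std::collections::HashMap;

type NodeIndex = (u8, u64); // (depth, index within depth), illustrative
type Hash = u64;            // stand-in for a digest

// Resolve a node's hash as of a given block: scan from that block's
// update back towards older updates, then fall back to the base tree.
fn node_at(
    updates: &[HashMap<NodeIndex, Hash>], // updates[i] is for block base+1+i
    base_tree: &HashMap<NodeIndex, Hash>,
    block_offset: usize, // requested block, as an index into `updates`
    index: NodeIndex,
) -> Option<Hash> {
    for update in updates[..=block_offset].iter().rev() {
        if let Some(&hash) = update.get(&index) {
            return Some(hash);
        }
    }
    base_tree.get(&index).copied()
}

fn main() {
    let base = HashMap::from([((1, 0), 0xa0), ((1, 1), 0xb0)]);
    let updates = vec![
        HashMap::from([((1, 0), 0xa1)]), // block base+1 changed node (1, 0)
        HashMap::from([((1, 1), 0xb2)]), // block base+2 changed node (1, 1)
    ];

    // at block base+1, node (1, 1) is absent from that update -> base tree
    assert_eq!(node_at(&updates, &base, 0, (1, 1)), Some(0xb0));
    // at block base+2, node (1, 0) resolves from the block base+1 update
    assert_eq!(node_at(&updates, &base, 1, (1, 0)), Some(0xa1));
    println!("ok");
}
```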

UPD: I think that if this solution performs well, we won't even need to keep the latest-state SMT; it will be enough to have the "initial" ($tip - N$) state and all updates after it.

IMO, this is the most memory- and computation-efficient solution so far.

bobbinth commented 1 week ago
  • Get the latest account state.
  • Compute the reverse delta from the latest account state to the requested one by a series of SQL queries (hopefully, about 5 queries).
  • Apply the reverse delta to the latest account state.

I think this works. A couple of comments:

  • We may be able to batch all 5 queries into a single database request - though, not sure how complicated it is with SQLite. Also, in the future, we could probably move these tables into a separate DB altogether to reduce the load on the main DB.
  • I noticed that in some tables we keep both old value and new value. What's the purpose of the old value (since it should be usually present in the previous record for the same element)?

So, the algorithm for getting an account inclusion proof for a requested block number will be:

  1. Get the update for the requested block.
  2. Starting from the root of the update, descend towards the searched account ID: if the account ID is in the left subtree, go left; if in the right subtree, go right.
  3. If there is no corresponding node in the current update, fall back to the previous update. Repeat, if the node is not in any update, all the way down to the full SMT.
  4. When the account's leaf is found, put it at the end of the list and output the list in reverse order.

I think this works. There could also be other ways to keep track of this data. For example, we could merge all changesets into a single data structure - something like:

BTreeMap<NodeIndex, Vec<(u32, Digest)>>

Where the internal tuples are (block_num, node). This may be more efficient from the memory access standpoint, but also probably more complicated to remove stale data.
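A lookup against such a merged structure could work as sketched below (types are stand-ins; the per-node binary search uses `partition_point` to find the latest entry at or before the requested block):

```rust
use std::collections::BTreeMap;

type NodeIndex = u64; // illustrative flat node index
type Digest = u64;    // stand-in for a hash

// Per node, a list of (block_num, digest) pairs sorted by block_num.
// The value at block `b` is the entry with the largest block_num <= b.
fn node_at(
    history: &BTreeMap<NodeIndex, Vec<(u32, Digest)>>,
    index: NodeIndex,
    block: u32,
) -> Option<Digest> {
    let versions = history.get(&index)?;
    // binary search for the first entry with block_num > block
    let pos = versions.partition_point(|&(b, _)| b <= block);
    if pos == 0 { None } else { Some(versions[pos - 1].1) }
}

fn main() {
    let mut history: BTreeMap<NodeIndex, Vec<(u32, Digest)>> = BTreeMap::new();
    history.insert(7, vec![(10, 0xaa), (12, 0xbb)]);

    assert_eq!(node_at(&history, 7, 11), Some(0xaa)); // last change: block 10
    assert_eq!(node_at(&history, 7, 12), Some(0xbb));
    assert_eq!(node_at(&history, 7, 9), None); // not changed in the window
    println!("ok");
}
```

Removing stale data here means popping leading entries from each `Vec` and dropping emptied keys, which is why pruning is more awkward than in the per-block changeset layout.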

Overall, I'd probably abstract this away behind a new struct - maybe something like this:

pub struct AccountTree {
    tree: Smt,
    updates: SmtUpdates,
}

In terms of PRs, I think we actually have 2 PRs here:

  1. Update the account delta structure.
  2. Update the account tree.

I'd probably start with the delta structure as we need it for other purposes too.

polydez commented 1 week ago
  • We may be able to batch all 5 queries into a single database request - though, not sure how complicated it is with SQLite. Also, in the future, we could probably move these tables into a separate DB altogether to reduce the load on the main DB.

We can join different tables in a single request, or maybe UNION them, but I'm not sure it will improve performance. I will think more about it during implementation of delta merging.

  • I noticed that in some tables we keep both old value and new value. What's the purpose of the old value (since it should be usually present in the previous record for the same element)?

This is a good idea, and in our current solution it should work, but we should first decide whether we keep all deltas from the very beginning, or prune deltas older than some number of blocks.

I think this works. There could also be other ways to keep track of this data. For example, we could merge all changesets into a single data structure - something like:

BTreeMap<NodeIndex, Vec<(u32, Digest)>>

Where the internal tuples are (block_num, node). This may be more efficient from the memory access standpoint, but also probably more complicated to remove stale data.

Thank you! I totally agree; this will require more operations for pruning old data, especially moving data in vectors. I will think about some optimizations here.