Closed: bh2smith closed this issue 4 years ago.
A proposal for storage of account data (assuming 2^24 accounts and 2^8 tokens) would be in a mongo collection with either of the two following formats:
accounts = {
state_index: {
(account_id, token_id): balance
}
}
So, for example, using integers to represent the stateHash/index:
{
0: {}, # Says that all balances are currently 0
1: {
(0, 0): 1, # Account 0 deposited 1 of token 0
(1, 1): 10, # Account 1 deposited 10 of token 1
(1, 2): 5, # Account 1 deposited 5 of token 2
},
2: {
(0, 0): 0,
(0, 1): 5, # This suggests account 0 traded 1 of token 0 for 5 of token 1
(1, 0): 1,
(1, 1): 5,
# Can infer that no change was made to entry (1, 2) which is still 5.
}
}
Lookup would be O(NumStateTransitions) if stateIndex were used instead of stateHash.
accounts = {
(account_id, token_id): {
state_index: balance,
current: balance OR state_index_last_changed
}
}
For example, representing the same information as:
{
(0, 0): {1: 1, 2: 0, "curr": 0},
(0, 1): {2: 5, "curr": 5},
(1, 0): {2: 1, "curr": 1},
(1, 1): {1: 10, 2: 5, "curr": 5},
(1, 2): {1: 5, "curr": 5}
}
Lookup here is O(1).
Imagine now that state transition from 2 -> 3 is the result of applying a batch of withdrawals where Account 1 withdraws all of their tokens.
{
(0, 0): {1: 1, 2: 0, "curr": 0},
(0, 1): {2: 5, "curr": 5},
(1, 0): {2: 1, 3: 0, "curr": 0},
(1, 1): {1: 10, 2: 5, 3: 0, "curr": 0},
(1, 2): {1: 5, 3: 0, "curr": 0}
}
Note that neither of these proposals require continual copying of values for each transition.
Note also that truncation/pruning of Option 2 is slightly more complicated than Option 1.
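To make Option 2's lookups concrete, here is a minimal Python sketch over plain dicts (the tuple keys and "curr" field mirror the example above; the helper functions themselves are illustrative, not part of the proposal):

```python
# Option 2 layout: history of balances per (account_id, token_id),
# plus a cached "curr" entry for the current value.
accounts = {
    (0, 0): {1: 1, 2: 0, "curr": 0},
    (0, 1): {2: 5, "curr": 5},
    (1, 0): {2: 1, "curr": 1},
    (1, 1): {1: 10, 2: 5, "curr": 5},
    (1, 2): {1: 5, "curr": 5},
}

def current_balance(account_id, token_id):
    """O(1): read the cached current value, defaulting to 0."""
    return accounts.get((account_id, token_id), {}).get("curr", 0)

def balance_at(account_id, token_id, state_index):
    """Balance as of state_index: the latest recorded change at or before it."""
    history = accounts.get((account_id, token_id), {})
    changes = [s for s in history if s != "curr" and s <= state_index]
    return history[max(changes)] if changes else 0

print(current_balance(1, 1))  # 5
print(balance_at(1, 1, 1))    # 10
```

Note that balance_at also illustrates why pruning is trickier here: historical entries are interleaved with the "curr" cache per key, rather than grouped by state index as in Option 1.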
Subscribing @fleupold and @josojo!
Unfortunately, after some thought, we won't be able to store the information as proposed above (since mongodb isn't a key-value store). The best we can do will look something like this:
balance_archive = {
"account_id": int,
"token_id": int,
"state_index": int,
"value": int
}
So, for example, the second scenario from option 2 above would be described as:
balance_archive = [
{
"account_id": 0,
"token_id": 0,
"state_index": 1,
"value": 1
},
{
"account_id": 0,
"token_id": 0,
"state_index": 2,
"value": 0
},
{
"account_id": 0,
"token_id": 1,
"state_index": 2,
"value": 5
},
{
"account_id": 1,
"token_id": 0,
"state_index": 2,
"value": 1
},
{
"account_id": 1,
"token_id": 0,
"state_index": 3,
"value": 0
},
{
"account_id": 1,
"token_id": 1,
"state_index": 1,
"value": 10
},
{
"account_id": 1,
"token_id": 1,
"state_index": 2,
"value": 5
},
{
"account_id": 1,
"token_id": 1,
"state_index": 3,
"value": 0
},
{
"account_id": 1,
"token_id": 2,
"state_index": 1,
"value": 5
},
{
"account_id": 1,
"token_id": 2,
"state_index": 3,
"value": 0
},
]
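With this flat record layout, the current balance of a pair is the record with the highest state_index. A pure-Python sketch of that "latest state_index wins" lookup (in MongoDB this would instead be a find with a sort on state_index and a limit of 1; the linear scan here is only for illustration):

```python
# Flat archive records, as in the example above (subset for brevity).
balance_archive = [
    {"account_id": 1, "token_id": 1, "state_index": 1, "value": 10},
    {"account_id": 1, "token_id": 1, "state_index": 2, "value": 5},
    {"account_id": 1, "token_id": 1, "state_index": 3, "value": 0},
]

def current_balance(records, account_id, token_id):
    """Latest recorded value for the pair, defaulting to 0 if never recorded."""
    matches = [r for r in records
               if r["account_id"] == account_id and r["token_id"] == token_id]
    return max(matches, key=lambda r: r["state_index"])["value"] if matches else 0

print(current_balance(balance_archive, 1, 1))  # 0
```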
In the following discussion we will investigate the pros and cons of the following three variations of Merkle-type data structures:

Naive: A complete binary tree with account states as leaf data.
- Storage is unnecessarily redundant.

Clipped: A pruned version of the tree whose empty initialization costs as much as hashing and remembering each of the empty naive root hashes for trees of height 0 <= i <= N (= 24). Let us denote the empty tree of height i by E_i. At the time of writing, this seems to be the most appropriate structure to use.
~ Somewhat improved storage (removing redundancy from naive)
+ Account registration wouldn't require insertion!
+ Contract initialization requires deterministic, reasonable computation.

Sparse: Achieved by shifting the non-trivial leaves up towards the root, provided they do not interfere with the occupied namespace. We will see that such a data structure has issues with rehashing on insertion (which affects the way account registration is currently implemented).
+ Significantly improved storage
- Account registration requires insertion (i.e. must be processed as a state transition)
+ Contract initialization requires no tree construction.

A very concise and good description of each of these is provided in the Libra white paper here. The following image (also taken from Libra) gives a good pictorial description of each type.
For the purpose of this discussion our analysis is primarily focused on the account state transition resulting from auction settlement, since the operations required to update the account state for deposits and withdrawals are analogous to (and less complex than) those of auction settlement. See the brief paragraph describing their essential equivalence below in Processing Deposits & Withdrawals.
The initial account state root, based on the parameters (such as numTokens, numAccounts) emitted by the contract, can be set immediately as the root of E_N. On the database side (at least for the naive and clipped tree structures), contract initialization would amount to computing and storing the following empty tree data:
empty_trees = {i: E_i for i in range(0, N + 1)}
where E_0 is the empty tokenTree (a Merkle tree of height 8 storing balances for each token in the leaves).
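A sketch of that precomputation, using a placeholder hash function and a placeholder root for E_0 (the real system would use its own, likely snark-friendly, hash and the actual empty tokenTree root):

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    # Placeholder pairing hash; a stand-in for whatever the system actually uses.
    return hashlib.sha256(left + right).digest()

N = 24
E0 = bytes(32)  # assumed placeholder for the empty tokenTree root
empty_trees = {0: E0}
for i in range(1, N + 1):
    # The empty tree of height i is two empty trees of height i - 1.
    empty_trees[i] = h(empty_trees[i - 1], empty_trees[i - 1])
```

This is a one-off computation of N + 1 = 25 hashes, which is what makes clipped-tree initialization "deterministic, reasonable computation".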
In the Sparse Merkle Tree implementation, we could set the initial state to E_0
or whatever. In fact, it seems like we might want to consider the sparse tree as being dynamically allocated storage.
Account registration is equivalent to "insertion" of a new (empty) tokenTree
into the leaf of our account state tree structure.
In the naive Merkle tree, all the empty leaves would have already been initialized (i.e. already there) and this insertion need not be performed. Similarly, in the clipped structure, these stubs would already represent the hash of the empty accounts that were already there, and no insertion would need to be performed.
In a truly sparse Merkle tree, account registration would imply some newly shared branch in the "trie" path and, more often than not, require rearrangement of other accounts lower into the tree's depth. That is to say, account registration would necessarily change the root hash of the current state. Since the smart contract is incapable (due to lack of data) of updating its own state root, this type of tree-like structure would not be possible unless registration requests were collected in a batch and processed via state transition.
There are three types of account transition: deposits, withdrawals and auction settlements, which essentially amount to increment (+), decrement (-) and mixed (+/-) updates respectively. Thus, working through the flow of database interactions and computations involved in auction settlement should cover most complexities involved. Generically speaking, any state transition would consist of the following steps:
Contract emits transition request event.
Query database for request information (query by batchId).
Query database for (relevant) account information (query by AccountId; at most batchSize accounts are affected).
Update account balances (the work involved is governed by batchSize and treeHeight).
Note that the leaves of the account state tree are tokenTrees, so hashing involves:
hashing a tree of height 8 for at most batchSize token trees,
hashing a tree of height 24 for at most batchSize accounts.
It is also important to note that batchSize (~ 2^10) is significantly less than 2^24, but significantly larger than numTokens (~ 2^5). Thus, it will be important to analyze the max number of hashes necessary across the entire spectrum.
TODO - analyze max num hashes. A rough estimate (without optimization) is 24 * 550 (~ 13K) which is the tree height times the maximum number of distinct accounts to be updated in a single auction.
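For reference, the arithmetic behind that rough estimate:

```python
# Rough, unoptimised hash-count estimate reproduced from the text above:
# tree height times the max number of distinct accounts updated per auction.
tree_height = 24
max_accounts = 550

print(tree_height * max_accounts)  # 13200, i.e. ~13K hashes
```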
In terms of physical storage and data structures which can be appropriately queried, consider the following proposal:
Current account state must be accessible for any state transition to take place. This would be the Merkle tree of choice (hopefully the clipped one), which is continually updated on transition (not duplicated). In the (unlikely) event of a dispute and rollback, this current state could be unwound by applying the inverse of a string of archived differences (i.e. deltas). These deltas would be small enough that they could be stored as a list (since the number of relevant accounts is not more than 1000).
deltas = {
state_ids: {
account_ids: {
token_ids: int
}
}
}
For example (using the examples as in the previous discussion) we can demonstrate the deltas of each transition type.
deltas = {
state_1: { # Deposit event (since all balances were incremented)
account_0: {
token_0: 1
},
account_1: {
token_1: 10,
token_2: 5
}
},
state_2: { # Auction Settlement (since all balances were both + and -)
account_0: {
token_0: -1,
token_1: 6
},
account_1: {
token_0: 1,
token_1: -6,
} # Notice the sum of values over each token must be 0
},
state_3: { # Withdraw event (since all balances were decremented)
account_1: {
token_0: -1,
token_1: -4,
token_2: -5
}
},
}
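A sketch of how such deltas could be played forward and, for rollback, inverted (the shapes follow the deltas example above; the helper itself is hypothetical):

```python
def apply_delta(balances, delta, sign=1):
    """Apply (sign=+1) or revert (sign=-1) one transition's delta in place."""
    for account_id, tokens in delta.items():
        for token_id, change in tokens.items():
            key = (account_id, token_id)
            balances[key] = balances.get(key, 0) + sign * change

balances = {}
deposit = {0: {0: 1}, 1: {1: 10, 2: 5}}  # the state_1 deposit from above
apply_delta(balances, deposit)           # play the transition forward
apply_delta(balances, deposit, sign=-1)  # roll it back
print(all(v == 0 for v in balances.values()))  # True
```

Unwinding from state k to state j then amounts to applying the deltas of states k, k-1, ..., j+1 with sign=-1 in reverse order.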
Note that the number of balance updates necessary in a given auction is bounded above by 2 * numOrders (= 2000). That is, for each fulfilled order there is, at most, one subtracted and one increased balance in each tokenTree (i.e. the tree representing all token balances for an individual account).
With an account space of size 2^24 (16M) it is clear that a naive Merkle tree (i.e. a complete binary tree with account data as leaf nodes) would be much too large and redundant for our storage needs. An alternative data structure, commonly referred to as a Sparse Merkle Tree, compresses the storage by first "clipping" out any irrelevant/empty leaves and then shifting the non-empty leaves up to a height of "non-interference".
The most relevant state transition on this exchange is that of Auction Settlement. In essence, this process is executed as follows:
Auction information, such as execution amounts, is stripped from the AuctionSettlement event emitted by the smart contract.
Orders are fetched from the DB (query by auctionId).
Relevant accounts are fetched from the DB for current balances (i.e. those pertaining to the orders).
Note that order information is required not only for the aggregation of relevant accountIds, but also so that execution amounts (in the encoded solution data) can be linked to their rightful account owner and token pair (via slot_index).
Balances (of relevant accounts) are then updated according to the execution amounts contained in the auction results.
In pseudo-code that is:
orders = db.orders.find({
"auctionId": AuctionSettlement.auction_id
}) # (len < 1000)
relevant_accounts = db.accounts.find({
"accountId": orders.mapped(x => x.account_id)
}) # (len < 550)
execution_info = zip(
AuctionSettlement.prices_and_volumes.mapped(buy_amount, sell_amounts),
orders.mapped(buy_token, sell_token)
)
This implies account information (i.e. the tokenTree) should be easily accessible via query by ID [easily achieved with any form of Merkle tree].
First note that processing of deposits/withdrawals amounts to a series of account increments/decrements (+/-) respectively. With B representing the batch size, as in the analysis of auction settlement, the number of updated accounts in a single transition is bounded above by B. This implies that the only technical difference between the account updates for deposits/withdrawals and auction settlement is that auction settlements always simultaneously have increments and decrements in balances.
Really nice write-up. I am also in favour of the clipped implementation. The storage improvements of the sparse one should not be worth the hassle.
Hey @josojo, I am glad to see your reply. I have been thinking a bit more about the clipped vs. sparse and came up with the following argument which is somewhat more in favour of the full compression.
It is clear that the fully compressed "sparse" implementation would force us to process account registrations in batches (as state transitions). At the moment it seems like this would be additional work to refactor the contract code for these purposes, but in the long term such a refactoring might be more beneficial. I am not yet sure exactly how to evaluate the long-term cost-benefit of such a refactoring, but it might be worth considering whether it is worth it.
I implemented a POC of an approach that is somewhat mixing the Libra tree representation with the approach of TurboGeth. Let me introduce the two related work items first and then describe how my implementation is different.
Libra is representing its state in a binary Sparse Merkle Trees as described by @bh2smith above. Note that, e.g. Ethereum does not use a binary tree (each node has up to 16 children => radix 16).
For simplicity, let's look at the representation of a Merkle tree that is completely filled (since sparseness is just an optimisation that I'll discuss below). On disk, data is stored as key-value pairs in RocksDB. The key is the hash of the node that is currently stored. The value is an array which represents the 4-level subtree (4 address bits) of the current node. Each of the 30 entries (16 + 8 + 4 + 2) in the value array represents a node (hash) in that 4-level subtree. Only the widest level in each record (e.g. the first 16 entries) contains values that are also keys to other db records. Despite storing many values in one record, the hash function is always only applied to two items (binary tree).
Represented visually:
Or as database records:
{
root_hash: [level_4_0_hash, level_4_1_hash, ... level_4_15_hash, h(level_4_0_hash, level_4_1_hash), ..., [up until level 1]]
level_4_0_hash: [level_8_0_hash, level_8_1_hash, ... level_8_15_hash, h(level_8_0_hash, level_8_1_hash), ..., [up until level 5]]
level_4_1_hash: [level_8_16_hash, level_8_17_hash, ... level_8_31_hash, h(level_8_16_hash, level_8_17_hash), ..., [up until level 5]]
...
level_4_15_hash: ...
level_8_0_hash: ...
...
level_8_255_hash: ...
...
}
One of the main drawbacks of both Ethereum's Patricia Merkle Tree representation and Libra's representation is that we can only access values by hash. This means if we want to look up a value by address, we actually need to traverse the entire tree, leading to log(n) queries (one for each level). Moreover, these queries are hard to batch into one db roundtrip, since the key of the next level depends on the lookup of the current one.
TurboGeth tries to overcome this drawback by indexing records by address rather than hash.
This allows efficient lookup of the entire Merkle path by querying for all address prefixes in a single query. Lookup by hash is no longer possible efficiently, but at this point I can't foresee a query that would need this. A lookup of a single value by address (without Merkle authentication proof) can be achieved with a single lookup.
Taking the database representation above and adjusting the notion of keys to use addresses instead of hash values gives:
{
"": [level_4_0_hash, level_4_1_hash, ... level_4_15_hash, h(level_4_0_hash, level_4_1_hash), ..., [up until level 1]]
"0": [level_8_0_hash, level_8_1_hash, ... level_8_15_hash, h(level_8_0_hash, level_8_1_hash), ..., [up until level 5]]
"1": [level_8_16_hash, level_8_17_hash, ... level_8_31_hash, h(level_8_16_hash, level_8_17_hash), ..., [up until level 5]]
...
"F": ...
"00": ...
...
"FF": ...
}
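With this keying, the record keys on a leaf's Merkle path are simply the address prefixes, so they can all be requested in one batched query. A small sketch of deriving those keys (4-bit subtrees and hex-nibble keys, as in the layout above; the helper is illustrative):

```python
def path_record_keys(address_hex: str) -> list:
    """All record keys needed to fetch the Merkle path of the leaf at address_hex,
    assuming records are keyed by the parent's address prefix (one nibble per record level)."""
    return [address_hex[:i] for i in range(len(address_hex))]

print(path_record_keys("AF96"))  # ['', 'A', 'AF', 'AF9']
```

All four records can then be fetched in a single db roundtrip, since none of the keys depends on the contents of another record.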
The implementation proposed in https://github.com/gnosis/dex-services/tree/tree_benchmark implements a Libra-like layout for configurable tree height.
The data layout combines subtrees of configurable length (bits_per_record) into a single record.
Records are keyed by their parent's address prefix (e.g. assuming 4-bit subtrees, node 0xAF96 can be found in the record with key 0xAF9).
I think it's crucial to have a radix of 2 (binary tree vs. multi-child tree), since the Merkle path data required to reconstruct the Merkle root, e.g. in an on-chain challenge, grows linearly with the radix. A radix of 2 will thus allow relatively concise on-chain verification of root transitions.
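A back-of-the-envelope sketch of that argument, counting the sibling hashes a Merkle proof needs over a 24-bit address space (the helper is illustrative):

```python
def proof_hashes(address_bits: int, radix_bits: int) -> int:
    """Sibling hashes in a Merkle proof: (radix - 1) siblings per level,
    with ceil(address_bits / radix_bits) levels."""
    levels = -(-address_bits // radix_bits)  # ceiling division
    return levels * (2 ** radix_bits - 1)

print(proof_hashes(24, 1))  # 24 sibling hashes for a binary tree (radix 2)
print(proof_hashes(24, 4))  # 90 sibling hashes for radix 16
```

So a radix-16 tree nearly quadruples the proof data for the same address space, which is why radix 2 keeps on-chain verification of root transitions comparatively cheap.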
My implementation doesn't use sparse Merkle trees, since this would require special treatment when trying to verify a Merkle proof (clipped nodes skip a bunch of levels). Since verification will be done in the smart contract, it should have rather simple logic. For 24-bit trees the amount of "wasted" storage is not as severe as for trees with 256 bits.
In order to not store any useless records (e.g. in an empty tree), I assume that the absence of a record means it is filled with 0 hashes at the specified level.
To achieve this we only need to precompute height levels of zero hashes beforehand and replace missing db records with a "fresh" record containing the zero hashes of the levels that the record is supposed to contain.
As soon as one node inside the record is not a zero-hash it will be written to storage.
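A Python sketch of that default-record behaviour, with a placeholder zero leaf and hash function (the 16 + 8 + 4 + 2 slot layout follows the record format described above; the level-indexing convention here is an assumption):

```python
import hashlib

def zero_hashes(height: int) -> list:
    """Zero hash for each subtree height, from the (assumed) zero leaf upward."""
    zh = [bytes(32)]  # placeholder zero leaf; the production hash may differ
    for _ in range(height):
        zh.append(hashlib.sha256(zh[-1] + zh[-1]).digest())
    return zh

ZERO = zero_hashes(27)  # enough levels to cover any 4-level record in a height-24 tree

def fresh_record(level: int) -> list:
    """A 30-slot record (16 + 8 + 4 + 2) filled with the zero hash of each slot's level."""
    return ([ZERO[level]] * 16 + [ZERO[level + 1]] * 8
            + [ZERO[level + 2]] * 4 + [ZERO[level + 3]] * 2)

def load_record(db: dict, key: str, level: int) -> list:
    # Absence of a record means it is entirely zero hashes at the given level.
    return db.get(key, fresh_record(level))
```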
I ran Criterion benchmarks on 2a83bc3 with trees of height 24 (only accounts) & 32 (accounts + tokens in same tree) and different subtree sizes (1, 2, 4, 8). These are the results visualized:
As we can see we should be able to process 1000 updates within 150ms on average (5050 iterations).
Note, that the performance can likely be significantly improved if we introduce an in-memory cache to lookup pages that are frequently queried within the uptime of the service.
What I like about Felix's implementation:
To keep it as simple as possible, I would probably keep the subtree size at 1. But I also feel that larger subtrees add only a little complexity, hence I am fine with any subtree size. This is also something that can be tuned later.
The write-ups of you guys are really excellent. Great! Easy to understand (although I needed several read-throughs due to new vocabulary).
I am not sure I understand to what extent
"it keeps the snarks as simple as before"
I wasn't aware that there was ever a change to the snarks. I guess the only thing that may have changed them would be the fully compressed sparse Merkle tree, which would have had more problems than just different snarks.
Account space will grow from 1000 (i.e. uint16) to about 16 million (uint24). This will require alternate storage means for the account balance array in the database (since storing an array of size maxTokens * maxAccounts ~ 30 * 16M = 480M is no longer feasible).