gnosis / dex-services

Off-chain services for the Gnosis Protocol v1.

Increase Account Capacity #137

Closed bh2smith closed 4 years ago

bh2smith commented 5 years ago

Account space will grow from 1000 (i.e. uint16) to about 16 million (uint24), as seen in

This will require an alternative storage scheme for the account balance array in the database (since storing an array of size maxTokens * maxAccounts ~ 30 * 16M = 480M is no longer feasible).

bh2smith commented 5 years ago

A proposal for storing account data (assuming 2^24 accounts and 2^8 tokens) would be a Mongo collection in either of the two following formats:

Option 1

accounts = {
    state_index: {
        (account_id, token_id): balance
    }
}

So, for example, using an integer to represent the stateHash/index:

{
    0: {},          # Says that all balances are currently 0
    1: { 
        (0, 0): 1,  # Account 0 deposited 1 of token 0
        (1, 1): 10, # Account 1 deposited 10 of token 1
        (1, 2): 5,  # Account 1 deposited 5 of token 2
    },
    2: {
        (0, 0): 0,
        (0, 1): 5,  # This suggests account 0 traded 1 of token 0 for 5 of token 1
        (1, 0): 1,
        (1, 1): 5,  
        # Can infer that no change was made to entry (1, 2) which is still 5.
    }
}

Lookup would be O(NumStateTransitions) if stateIndex were used instead of stateHash.
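
To make that lookup cost concrete, here is a minimal sketch (plain Python, hypothetical names) of resolving a balance under Option 1 by walking backwards from a given state index:

def lookup_balance(accounts, state_index, account_id, token_id):
    # Walk backwards through the state transitions until the most recent
    # entry for (account_id, token_id) is found -- O(NumStateTransitions).
    for index in range(state_index, -1, -1):
        diff = accounts.get(index, {})
        if (account_id, token_id) in diff:
            return diff[(account_id, token_id)]
    return 0  # never touched, so the balance is still 0

accounts = {
    0: {},
    1: {(0, 0): 1, (1, 1): 10, (1, 2): 5},
    2: {(0, 0): 0, (0, 1): 5, (1, 0): 1, (1, 1): 5},
}
assert lookup_balance(accounts, 2, 1, 2) == 5  # unchanged since state 1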

Option 2

accounts = {
    (account_id, token_id): {
        state_index: balance,
        current:        balance OR state_index_last_changed
    }
}

For example, representing the same information as above:

{
    (0, 0): {1: 1, 2: 0, "curr": 0},
    (0, 1): {2: 5, "curr": 5},
    (1, 0): {2: 1, "curr": 1},
    (1, 1): {1: 10, 2: 5, "curr": 5},
    (1, 2): {1: 5, "curr": 5}
}

Lookup here is O(1).

Imagine now that state transition from 2 -> 3 is the result of applying a batch of withdrawals where Account 1 withdraws all of their tokens.

{
    (0, 0): {1: 1, 2: 0, "curr": 0},
    (0, 1): {2: 5, "curr": 5},
    (1, 0): {2: 1, 3: 0, "curr": 0},
    (1, 1): {1: 10, 2: 5, 3: 0, "curr": 0},
    (1, 2): {1: 5, 3: 0, "curr": 0}
}

Note that neither of these proposals requires continual copying of values for each transition.

Note also that truncation/pruning of Option 2 is slightly more complicated than Option 1.

Subscribing @fleupold and @josojo!

bh2smith commented 5 years ago

Unfortunately, after some thought, we won't be able to store the information as proposed above (since MongoDB isn't a key-value store). The best we can do will look something like this:

balance_archive = {
    "account_id": int,
    "token_id": int,
    "state_index": int,
    "value": int    
}

So, for example, the second scenario from option 2 above would be described as:

balance_archive = [
    {
        "account_id": 0,
        "token_id": 0,
        "state_index": 1,
        "value": 1
    },
    {
        "account_id": 0,
        "token_id": 0,
        "state_index": 2,
        "value": 0
    },
    {
        "account_id": 0,
        "token_id": 1,
        "state_index": 2,
        "value": 5
    },
    {
        "account_id": 1,
        "token_id": 0,
        "state_index": 2,
        "value": 1
    },
    {
        "account_id": 1,
        "token_id": 0,
        "state_index": 3,
        "value": 0
    },
    {
        "account_id": 1,
        "token_id": 1,
        "state_index": 1,
        "value": 10
    },
    {
        "account_id": 1,
        "token_id": 1,
        "state_index": 2,
        "value": 5
    },
    {
        "account_id": 1,
        "token_id": 1,
        "state_index": 3,
        "value": 0
    },
    {
        "account_id": 1,
        "token_id": 2,
        "state_index": 1,
        "value": 5
    },
    {
        "account_id": 1,
        "token_id": 2,
        "state_index": 3,
        "value": 0
    },
]
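
With such a collection, the current (or historical) balance of an account/token pair can be resolved with a single indexed query. Here is a sketch in pymongo-style syntax; the db handle is an assumption, while the collection and field names are taken from the proposal above:

def balance_at(db, account_id, token_id, state_index):
    # Most recent archived value at or before state_index; absence means 0.
    record = db.balance_archive.find_one(
        {
            "account_id": account_id,
            "token_id": token_id,
            "state_index": {"$lte": state_index},
        },
        sort=[("state_index", -1)],
    )
    return record["value"] if record else 0
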
bh2smith commented 5 years ago

In the following discussion we will investigate the pros and cons of the following three variations of Merkle-type data structures:

  1. Naive: A complete binary tree with account states as leaf data.

    • - Storage is unnecessarily redundant.
  2. Clipped: A pruned version of the tree whose empty initialization costs as much as hashing and remembering each of the empty naive root hashes for trees of height 0 <= i <= N (= 24). Let us denote the empty tree of height i by E_i. At the time of writing, this seems to be the most appropriate structure to use.

    • ~ Somewhat improved storage (removing redundancy from naive)
    • + Account registration wouldn't require insertion!
    • + Contract initialization requires deterministic, reasonable computation.
  3. Sparse: Achieved by shifting the non-trivial leaves up towards the root, provided they do not interfere with the occupied namespace. We will see that such a data structure will have issues with rehashing on insertion (which affects the way account registration is currently implemented).

    • + Significantly improved storage
    • - Account registration requires insert (i.e. must be processed as state transition)
    • + Contract initialization requires no tree construction.

A very concise and good description of each of these is provided in the Libra white paper here. The following image (also taken from Libra) gives a good pictorial description of each type:

(Image: pictorial comparison of the naive, clipped, and sparse Merkle tree variants, from the Libra white paper.)

For the purpose of this discussion our analysis is primarily focused on the account state transition resulting from auction settlement, since the operations required to update the account state for deposits and withdrawals are analogous to (and less complex than) those of auction settlement. See the brief paragraph describing their essential equivalence below in Processing Deposits & Withdrawals.

Simple Operations

Contract Initialization

The initial account state root, based on the parameters (such as numTokens, numAccounts) emitted by the contract, can be set immediately as the root of E_N. On the database side (at least for the naive and clipped tree structures), contract initialization would amount to computing and storing the following empty tree data:

empty_trees = {i: E_i for i in range(0, N + 1)}

where E_0 is the empty tokenTree (a Merkle tree of height 8 storing balances for each token in its leaves).
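
As a rough sketch of that precomputation (the hash function and the encoding of the empty tokenTree root are assumptions here):

import hashlib

def H(left: bytes, right: bytes) -> bytes:
    # Stand-in for whatever hash the contract actually uses.
    return hashlib.sha256(left + right).digest()

N = 24
E_0 = hashlib.sha256(b"empty tokenTree").digest()  # placeholder for the empty tokenTree root

# E_i is the root of an empty subtree of height i: both children are E_{i-1}.
empty_trees = {0: E_0}
for i in range(1, N + 1):
    empty_trees[i] = H(empty_trees[i - 1], empty_trees[i - 1])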

In the Sparse Merkle Tree implementation, we could set the initial state to E_0 (or anything else). In fact, it seems like we might want to consider the sparse tree as dynamically allocated storage.

Account Registration

Account registration is equivalent to "insertion" of a new (empty) tokenTree into the leaf of our account state tree structure.

In the Naive Merkle tree all the empty leaves would have already been initialized (i.e. already there) and this insertion need not be performed. Similarly, in the clipped structure, the stubs would already represent the hashes of the empty accounts, so no insertion would need to be performed.

In a truly Sparse Merkle Tree, account registration would imply some newly shared branch in the "trie" path and, more often than not, require rearrangement of other accounts lower into the tree's depth. That is to say, account registration would necessarily change the root hash of the current state. Since the smart contract is incapable (due to lack of data) of updating its own state root, this type of tree-like structure would not be possible unless registration requests were collected in a batch and processed via state transition.

Generic State Transition

There are three types of account transition: Deposits, Withdrawals and Auction Settlements, which essentially amount to increment (+), decrement (-) and mixed (+/-) updates respectively. Thus, working through the flow of database interactions and computations involved in auction settlement should cover most complexities involved. Generically speaking, any state transition would consist of the following steps:

Note that the leaves of the account state tree are tokenTrees so hashing involves:

It is also important to note that batchSize (~ 2^10) is significantly less than 2^24, but significantly larger than numTokens (~ 2^5). Thus, it will be important to analyze the max number of hashes necessary across the entire spectrum.

TODO - analyze max num hashes. A rough estimate (without optimization) is 24 * 550 (~ 13K) which is the tree height times the maximum number of distinct accounts to be updated in a single auction.

In terms of physical storage and data structures which can be appropriately queried, consider the following proposal:

Current Account state must be accessible for any state transition to take place. This would be the Merkle tree of choice (hopefully the clipped one) which is continually updated on transition (not duplicated). In the (unlikely) event of a dispute and rollback this current state could be unwound by applying the inverse of a string of archived differences (i.e. deltas). These deltas would be small enough that they could be stored as a list (since the number of relevant accounts is not more than 1000).

deltas = {
    state_ids: {
        account_ids: {
            token_ids: int
        }
    }
}

For example (using the examples as in the previous discussion) we can demonstrate the deltas of each transition type.

deltas = {
    state_1: {  # Deposit event (since all balances were incremented)
        account_0: {
            token_0: 1
        },
        account_1: {
            token_1: 10,
            token_2: 5
        }
    },
    state_2: {  # Auction Settlement (since all balances were both + and -)
        account_0: {
            token_0: -1,
            token_1: 6
        },
        account_1: {
            token_0: 1,
            token_1: -6,
        }  # Notice the sum of values over each token must be 0
    },
    state_3: {  # Withdraw event (since all balances were decremented)
        account_1: {
            token_0: -1,
            token_1: -4,
            token_2: -5
        }
    },
}
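
A rollback to an earlier state would then, as a sketch, amount to replaying the archived deltas with inverted signs (assuming balances is a dict keyed by (account_id, token_id) and deltas is keyed by integer state index):

def roll_back(balances, deltas, from_index, to_index):
    # Undo transitions from_index, from_index - 1, ..., to_index + 1 by
    # subtracting each archived delta, i.e. applying its inverse.
    for state_index in range(from_index, to_index, -1):
        for account_id, token_deltas in deltas.get(state_index, {}).items():
            for token_id, amount in token_deltas.items():
                key = (account_id, token_id)
                balances[key] = balances.get(key, 0) - amount
    return balances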

Auction Settlement

Note that the number of balance updates necessary in a given auction is bounded above by 2 * numOrders (= 2000). That is, for each fulfilled order there is, at most, one subtracted and one increased balance in each tokenTree (i.e. the tree representing all token balances for an individual account).

With an account space of size 2^24 (16M) it is clear that a naive Merkle tree (i.e. a complete binary tree with account data as leaf nodes) would be much too large and redundant for our storage needs. An alternative data structure, commonly referred to as a Sparse Merkle Tree, compresses the storage by first "clipping" out any irrelevant/empty leaves and then shifting the non-empty leaves up to a height of "non-interference".

The most relevant state transition on this exchange is that of Auction Settlement. In essence, this process is executed as follows:

Note that order information is required not only for the aggregation of relevant accountIds, but also so that execution amounts (in the encoded solution data) can be linked to their rightful account owner and token-pair (via slot_index).

Balances (of relevant accounts) are then updated according to the execution amounts contained in the auction results.

In pseudo-code, that is:

orders = db.orders.find({
    "auctionId": AuctionSettlement.auction_id
})  # (len < 1000)

relevant_accounts = db.accounts.find({
    "accountId": {"$in": [order.account_id for order in orders]}
})  # (len < 550)

execution_info = zip(
    [(v.buy_amount, v.sell_amount) for v in AuctionSettlement.prices_and_volumes],
    [(order.buy_token, order.sell_token) for order in orders]
)

This implies account information (i.e. the tokenTree) should be easily accessible via query by ID [easily achieved with any form of Merkle tree].

Processing Deposits & Withdrawals

First note that processing of deposits/withdrawals amounts to a series of account increments/decrements (+/-) respectively. With B representing the batch size, as in the analysis of auction settlement, the number of updated accounts in a single transition is bounded above by B. This implies that the only technical difference between the account updates for deposits/withdrawals and auction settlement is that auction settlements always simultaneously have increments and decrements in balances.

josojo commented 5 years ago

Really nice write-up. I am also in favour of the clipped implementation. The storage improvements of the sparse one are probably not worth the hassle.

bh2smith commented 5 years ago

Hey @josojo, I am glad to see your reply. I have been thinking a bit more about the clipped vs. sparse and came up with the following argument which is somewhat more in favour of the full compression.

It is clear that the fully compressed "Sparse" implementation would force us to process account registrations in batches (as state transitions). At the moment it seems like this would be additional work to refactor the contract code for these purposes, but in the long term, such a refactoring might be more beneficial. I'm not yet sure exactly how to evaluate the long-term cost-benefit of such a refactoring, but it might be worth considering.

fleupold commented 5 years ago

I implemented a POC of an approach that is somewhat mixing the Libra tree representation with the approach of TurboGeth. Let me introduce the two related work items first and then describe how my implementation is different.

Libra

Libra represents its state in a binary Sparse Merkle Tree as described by @bh2smith above. Note that Ethereum, for example, does not use a binary tree (each node has up to 16 children => radix 16).

For simplicity, let's look at the representation of a Merkle tree that is completely filled (since sparseness is just an optimisation that I'll discuss below). On disk, data is stored as key-value pairs in RocksDB. The key is the hash of the node that is currently stored. The value is an array which represents the 4-level subtree (4 address bits) of the current node. Each of the 30 entries (16 + 8 + 4 + 2) in the value array represents a node (hash) in that 4-level subtree. Only the widest level in each record (e.g. the first 16 entries) contains values that are also keys to other db records. Despite storing many values in one record, the hash function is always applied to only two items (binary tree).

Represented visually: Libra Storage

Or as database records:

{
    root_hash: [level_4_0_hash, level_4_1_hash, ... level_4_15_hash, h(level_4_0_hash, level_4_1_hash), ..., [up until level 1]]
    level_4_0_hash: [level_8_0_hash, level_8_1_hash, ... level_8_15_hash, h(level_8_0_hash, level_8_1_hash), ..., [up until level 5]]
    level_4_1_hash: [level_8_16_hash, level_8_17_hash, ... level_8_31_hash, h(level_8_16_hash, level_8_17_hash), ..., [up until level 5]]
    ...
    level_4_15_hash: ...
    level_8_0_hash: ...
    ...
    level_8_255_hash: ...
    ...
}
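
For illustration, a sketch of how the 30 entries of one such record could be derived from the 16 hashes of its widest level (the remaining 8 + 4 + 2 entries are the pairwise hashes of the level below; the hash function h is an assumption):

import hashlib

def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def build_record(children):
    # children: the 16 node hashes of the widest level of the 4-level subtree.
    assert len(children) == 16
    record, level = list(children), list(children)
    while len(level) > 2:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        record.extend(level)
    # 16 + 8 + 4 + 2 = 30 entries; hashing the final pair yields the record's
    # own key (the hash of the subtree's root node).
    return record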

TurboGeth

DevCon Talk

One of the main drawbacks of both Ethereum's Patricia Merkle Tree representation and Libra's representation is that we can only access values by hash. This means if we want to look up a value by address, we actually need to traverse the entire tree, leading to log(n) queries (one for each level). Moreover, these queries are hard to batch into one db roundtrip, since the key of the next level depends on the lookup of the current one.

TurboGeth tries to overcome this drawback by indexing records by address rather than hash.

This allows efficient lookup of the entire Merkle path by querying for all address prefixes in a single query. Lookup by hash is no longer possible efficiently, but at this point I can't foresee a query that would need this. A lookup of a single value by address (without Merkle authentication proof) can be achieved with a single lookup.

Taking the database representation above and adjusting the notion of keys to use addresses instead of hash values gives:

{
    "": [level_4_0_hash, level_4_1_hash, ... level_4_15_hash, h(level_4_0_hash, level_4_1_hash), ..., [up until level 1]]
    "0": [level_8_0_hash, level_8_1_hash, ... level_8_15_hash, h(level_8_0_hash, level_8_1_hash), ..., [up until level 5]]
    "1": [level_8_16_hash, level_8_17_hash, ... level_8_31_hash, h(level_8_16_hash, level_8_17_hash), ..., [up until level 5]]
    ...
    "F": ...
    "00": ...
    ...
    "FF": ...
}
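
To illustrate the single-roundtrip path lookup, a sketch (hypothetical helper, hex-digit keys as in the records above) that derives every prefix of an address so all path records can be fetched in one batched query:

def merkle_path_keys(address_hex):
    # With 4 address bits per record (one hex digit), the path records for
    # node 0xAF96 are keyed by "", "A", "AF" and "AF9".
    return [address_hex[:i] for i in range(len(address_hex))]

keys = merkle_path_keys("AF96")
# e.g. records = db.nodes.find({"_id": {"$in": keys}})  # one roundtrip (schema assumed)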

Our implementation

The implementation proposed in https://github.com/gnosis/dex-services/tree/tree_benchmark implements a Libra-like layout for configurable tree height. The data layout combines subtrees of configurable depth (bits_per_record) into a single record. Records are keyed by their parent's address prefix (e.g. assuming 4-bit subtrees, node 0xAF96 can be found in the record with key 0xAF9).

I think it's crucial to have a radix of 2 (binary tree vs. multi-child tree), since the Merkle path data required to reconstruct the Merkle root, e.g. in an on-chain challenge, grows linearly with the radix. A radix of 2 will thus allow relatively concise on-chain verification of root transitions.
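
For reference, a sketch of a binary Merkle proof check that makes the size argument concrete: with radix 2 only one sibling hash per level is needed (the hash function and encodings are assumptions):

import hashlib

def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def verify(leaf_hash, path_bits, siblings, expected_root):
    # path_bits: address bits from root to leaf (0 = left, 1 = right).
    # siblings: one sibling hash per level, ordered from leaf to root.
    node = leaf_hash
    for bit, sibling in zip(reversed(path_bits), siblings):
        node = h(sibling, node) if bit == 1 else h(node, sibling)
    return node == expected_root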

My implementation doesn't use sparse Merkle trees, since this would require special treatment when trying to verify a Merkle proof (clipped nodes skip a bunch of levels). Since verification will be done in the Smart Contract, it should have rather simple logic. For 24 bits the amount of "wasted" storage is not as severe as for trees with 256 bits.

In order not to store any useless records (e.g. in an empty tree), I assume that the absence of a record means it is filled with zero hashes at the specified level. To achieve this we only need to precompute the zero hashes for each of the height levels beforehand and replace missing db records with a "fresh" record containing the zero hashes of the levels that the record is supposed to contain. As soon as one node inside the record is not a zero hash, it will be written to storage.
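
A sketch of that fallback, assuming the per-level zero hashes have been precomputed (as in the contract initialization sketch earlier in this thread):

def node_hash(db, key, level, zero_hashes):
    # zero_hashes[level]: precomputed hash of an empty subtree of that height.
    # db: a dict-like store holding only non-empty records (assumption).
    # A missing record is read as "still all zero hashes"; nothing is written
    # until one node inside the record becomes a non-zero hash.
    stored = db.get(key)
    return stored if stored is not None else zero_hashes[level]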

Benchmark results

I ran Criterion benchmarks on 2a83bc3 with trees of height 24 (only accounts) & 32 (accounts + tokens in same tree) and different subtree sizes (1, 2, 4, 8). These are the results visualized:

(Chart: benchmark results for tree heights 24 and 32 with subtree sizes 1, 2, 4 and 8.)

As we can see, we should be able to process 1000 updates within 150ms on average (5050 iterations).

Note that the performance can likely be significantly improved if we introduce an in-memory cache for pages that are frequently queried within the uptime of the service.

josojo commented 5 years ago

What I like about Felix's implementation:

To keep it as simple as possible, I would probably keep the subtree size at 1. But I also feel that larger subtrees only add a little complexity. Hence I am fine with any subtree size. This is also something that can be tuned later.

Your write-ups are really excellent. Great! Easy to understand (though I needed several read-throughs due to the new vocabulary).

bh2smith commented 5 years ago

I am not sure I understand to what extent

it keeps the snarks as simple as before

I wasn't aware that there was ever a change to the snarks. I guess the only thing that might have changed them would be the fully compressed Sparse Merkle Tree, which would have had more problems than just different snarks.