alnoki opened 1 year ago
```move
/// A storage-efficient sorted map based on a B+ tree.
///
/// Inner node IDs are unset at bit 63, leaf node IDs are set at bit
/// 63.
struct BPlusMap<V> has store {
    /// Maximum number of children, $m$, to an inner node.
    inner_node_order: u8,
    /// Maximum number of key-value pairs, $l$, in a leaf node.
    leaf_node_order: u8,
    /// `NIL` if empty, otherwise node ID of root node, which may be
    /// an inner node or a leaf node.
    root: u64,
    /// Minimum key in map, ignored if empty.
    min_key: u128,
    /// Maximum key in map, ignored if empty.
    max_key: u128,
    /// Minimum key's leaf node ID, `NIL` if empty. The head of the
    /// doubly linked list of leaf nodes.
    head_node_id: u64,
    /// Maximum key's leaf node ID, `NIL` if empty. The tail of the
    /// doubly linked list of leaf nodes.
    tail_node_id: u64,
    /// Map from inner node ID to inner node.
    inner_nodes: TableWithLength<u64, InnerNode>,
    /// Map from leaf node ID to leaf node.
    leaf_nodes: TableWithLength<u64, LeafNode<V>>,
    /// Inner node ID at top of inactive stack. `NIL` if no inactive
    /// nodes.
    inner_stack_top: u64,
    /// Leaf node ID at top of inactive stack. `NIL` if no inactive
    /// nodes.
    leaf_stack_top: u64
}

/// Inactive if no children, else active. Inactive inner nodes are
/// stored in a stack.
struct InnerNode has store {
    /// | Node status | Meaning if `NIL`   | Meaning if not `NIL` |
    /// |-------------|--------------------|----------------------|
    /// | Active      | Node is root       | Parent inner node ID |
    /// | Inactive    | Is bottom of stack | Next node in stack   |
    parent: u64,
    /// Inactive if empty, else an active node.
    children: vector<Child>
}

/// Indicates a child to an inner node, either another inner node or
/// a leaf node.
struct Child has store {
    /// The minimum key allowed in the subtree with the given node
    /// ID. Ignored if the leftmost child of an inner node.
    min_key: u128,
    /// Node ID of corresponding child.
    node_id: u64
}

/// Inactive if no pairs, else active. Active leaf nodes are stored
/// in a sorted doubly linked list (DLL), and inactive leaf nodes
/// are stored in a stack.
struct LeafNode<V> has store {
    /// | Node status | Meaning if `NIL` | Meaning if not `NIL` |
    /// |-------------|------------------|----------------------|
    /// | Active      | Node is root     | Parent inner node ID |
    /// | Inactive    | Ignored          | Ignored              |
    parent: u64,
    /// | Node status | Meaning if `NIL` | Meaning if not `NIL` |
    /// |-------------|------------------|----------------------|
    /// | Active      | Is DLL list head | Previous DLL node    |
    /// | Inactive    | Ignored          | Ignored              |
    previous: u64,
    /// | Node status | Meaning if `NIL`   | Meaning if not `NIL` |
    /// |-------------|--------------------|----------------------|
    /// | Active      | Is DLL list tail   | Next DLL node        |
    /// | Inactive    | Is bottom of stack | Next node in stack   |
    next: u64,
    /// Key-value pairs sorted by key.
    pairs: vector<Pair<V>>
}

/// A key-value pair.
struct Pair<V> has store {
    /// Inserted key.
    key: u128,
    /// Inserted and optionally mutated value.
    value: V
}
```
Unlike Aptos' `SimpleMap`, which relies on the computationally-intensive `aptos_framework::comparator` module, the above implementation relies only on the primitive integer comparators `<`, `>`, `<=`, `>=`, `!=`, `==`, etc. This approach minimizes external function calls, but disallows generality of the key type: ideally the data structure would support any kind of integer key, but there is presently no Move support for a generic integer type. Hence the most general `u128` is chosen, with the expectation of upcasting for `u8` or `u64` keys.
It is proposed that public APIs mimic those of Rust's BTreeMap.
Ideally, simple helper functions of the form `is_leaf_node_id()` and `to_leaf_node_id()` would support function inlining to reduce function call gas costs.
If inlining is not supported, however, functions can still be written as if they were to be inlined by the compiler, then manually inlined later.
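Such helpers might be sketched as follows, assuming leaf node IDs are distinguished by bit 63 as in the struct documentation above (the constant name and function signatures are assumptions, not a finalized API):

```move
module econia::b_plus_map {
    /// Bit 63, set in leaf node IDs and unset in inner node IDs.
    /// Constant name is an assumption.
    const LEAF_NODE_FLAG: u64 = 0x8000000000000000;

    /// Return `true` if `node_id` refers to a leaf node.
    fun is_leaf_node_id(node_id: u64): bool {
        (node_id & LEAF_NODE_FLAG) == LEAF_NODE_FLAG
    }

    /// Set bit 63 to convert a raw table index to a leaf node ID.
    fun to_leaf_node_id(table_index: u64): u64 {
        table_index | LEAF_NODE_FLAG
    }
}
```

If the compiler will not inline these, call sites can later substitute the bitwise expressions directly, per the manual inlining approach described above.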
Ideally the Aptos Framework would provide getters for the number of free bytes per global storage operation.
Here, public functions for the `BPlusMap` could accept an immutable reference to a `V` type, and calculate the corresponding optimal $l$ value accordingly.
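As a hedged sketch of that calculation — with $b$ denoting free bytes per storage operation, $f_l$ the fixed serialization overhead of a leaf node, and $p$ the byte size of one key-value pair (all symbols assumed here, pending official figures) — the optimal leaf order would be:

$$l = \lfloor (b - f_l) / p \rfloor$$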
As for $m$, this may hinge on gas optimizations, but could potentially be derived from the free bytes amount if testing reveals that the gas cost bottleneck is how compact inner nodes are.
See Beingessner 2022 for more optimality discussion and comparisons with Rust's `BTreeMap`.
Sibling pointers may additionally be helpful for inner nodes, due to sibling adoption operations.
Note that Beingessner 2022 describes a vector of edges as well as a vector of keys, a stack for tracking vector indices at each level of the tree (e.g. going back up a level to update a child pointer), and tracks the height of the tree as well as its length (number of records).
It may be useful to include previous/next pointers in both inner and leaf nodes, for tracking the previous/next node in the doubly linked list at a given level. This could make for less costly iteration if spillover/adoption algorithms are implemented per a B*+ tree.
It is proposed that the structure issue an `Iterator` struct containing a leaf node index and a vector index within, to ease the iteration process by reducing search/borrow costs. Here, the `Iterator` shall contain fields that essentially function as pointers into the B+ map memory. As such, it is suggested that the B+ map structure include an `iterator_issued: bool` flag that prohibits mutations/further iterator issuance when marked `true`. Per this approach, there shall be no mismatch between publicly exposed indices and the intended internal targets, since operations shall be read-only when an iterator is issued:
```move
module econia::b_plus_map {
    struct BPlusMap has store {
        iterator_issued: bool,
        // More fields...
    }

    struct Iterator {
        leaf_node_index: u64,
        vector_index: u64
    }
}
```
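A minimal sketch of the corresponding mutation guard, assuming an error code constant and function name not specified above:

```move
module econia::b_plus_map {
    /// Mutation or further iterator issuance attempted while an
    /// iterator is outstanding. Error code value is an assumption.
    const E_ITERATOR_ISSUED: u64 = 0;

    struct BPlusMap has store {
        iterator_issued: bool,
        // More fields...
    }

    /// Abort if an iterator is outstanding. Each mutating operation
    /// (and iterator issuance itself) would invoke this first.
    fun assert_no_iterator_issued(map_ref: &BPlusMap) {
        assert!(!map_ref.iterator_issued, E_ITERATOR_ISSUED);
    }
}
```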
For simplicity it was considered that iteration should rely on the borrow checker to prevent mutations, but at present it is unclear if this is possible: an initial hypothetical approach involved having `iterator` be a field in the B+ map struct, and having a `borrow_iterator()` function require and return a mutable reference to the B+ map, as well as to an iterator. Here, the intent was that a read-only operation would not violate the borrow checker, but a mutation operation would. Notably, however, this prohibits legitimate iterated mutations, and may prohibit read-only traversals.
```move
module econia::b_plus_map {
    struct BPlusMap has store {
        iterator: Option<Iterator>,
        // More fields...
    }

    struct Iterator {
        leaf_node_index: u64,
        vector_index: u64
    }

    fun example_calls(
        map_ref_mut: &mut BPlusMap
    ) {
        // Borrow iterator at head.
        let iterator_mut: &mut Iterator = get_iterator_at_head(map_ref_mut);
        // Borrow value of next key (may violate borrow checker).
        let next_value_ref = borrow_next(map_ref_mut, iterator_mut);
        // Get value, then iterate to next key (definitely violates
        // borrow checker).
        let value = pop_and_iterate_to_next(map_ref_mut, iterator_mut);
    }
}
```
Alternatively, it may be useful to still store the iterator in the `BPlusMap` and not return a mutable reference to it, but simply provide APIs like `iterate_from_head()`, `iterate_from_tail()`, `stop_iteration()`, etc., that update the state of the `Iterator`. Then when it is some, callers can `iterate_next()`. Indeed this approach may be more appropriately considered a "cursor", with mutation operations flushing the cursor.
```move
module econia::b_plus_map {
    struct BPlusMap has store {
        cursor: Option<Cursor>,
        // More fields...
    }

    struct Cursor {
        key: u128,
        leaf_node_index: u64,
        vector_index: u64
    }
}
```
Here, a search operation for a key updates the cursor, so that it can be borrowed against without multiple table lookups: if an operation needs to check before inserting, it can do one search, then lookup straight from the cursor in $O(1)$ time.
Operations that only alter values from a key-value pair do not need to flush the cursor, so some mutation operations may be exempt from flushing logic.
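For example, a check-then-insert flow against the cursor might look like the following sketch, where `search()`, `cursor_key_found()`, and `insert_at_cursor()` are hypothetical helpers (all names and the error code are assumptions):

```move
module econia::b_plus_map {
    /// Key already present. Error code value is an assumption.
    const E_KEY_EXISTS: u64 = 1;

    /// Hypothetical sketch: insert `value` at `key` only if absent,
    /// using a single tree search. `search()` updates the cursor to
    /// the found or closest key, then insertion borrows the leaf
    /// node straight from the cached cursor indices in O(1), with no
    /// second traversal.
    public fun insert_if_absent<V: store>(
        map_ref_mut: &mut BPlusMap<V>,
        key: u128,
        value: V
    ) {
        search(map_ref_mut, key); // One table walk, updates cursor.
        assert!(!cursor_key_found(map_ref_mut, key), E_KEY_EXISTS);
        insert_at_cursor(map_ref_mut, key, value); // O(1) via cursor.
    }
}
```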
Additionally, to save on function calls like `option::is_some()`, the node index 0 could be reserved for `NIL`, indicating no cursor when `leaf_node_index` is `NIL`:
```move
module econia::b_plus_map {
    struct BPlusMap has store {
        cursor_key: u128,
        /// If `NIL`, cursor is empty.
        cursor_leaf_node_index: u64,
        cursor_vector_index: u64,
        // More fields...
    }
}
```
u256 support

Should Aptos support `u256` natives, it may be most appropriate to support `u256` keys as the most general case.
This may lead to being able to store fewer keys in an inner node, but if vector operation costs are sufficiently high then optimality conditions for inner node order $m$ may be unaltered.
An `upsert()` function is proposed which returns an option for an optionally-evicted value at the given insertion key. When `has()` returns `false`, it may be appropriate to leave the cursor at the closest match. For clarity, it may be useful to use the first structure above, with an option-packed `Cursor` inside the `BPlusMap` struct.
Alternatively, a `Cursor` not packed in an option may reduce additional calls to the `option` module:
```move
module econia::b_plus_map {
    struct BPlusMap has store {
        cursor: Cursor,
        // More fields...
    }

    struct Cursor {
        key: u128,
        leaf_node_index: u64,
        vector_index: u64
    }
}
```
Here, helper functions of the form `is_cursor_active()` could simply inspect the leaf node index to see if it is non-null.
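A sketch of such a helper, assuming the reserved-zero `NIL` convention discussed above:

```move
module econia::b_plus_map {
    /// Reserved node index indicating an empty cursor. Value per the
    /// reserved-zero scheme above.
    const NIL: u64 = 0;

    /// Return `true` if the cursor points at a leaf node, without
    /// any calls into the `option` module.
    fun is_cursor_active(map_ref: &BPlusMap): bool {
        map_ref.cursor.leaf_node_index != NIL
    }
}
```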
Additionally, the cursor could serve as a kind of cache, whereby operations first check the cursor to see if it is pointing at a relevant record to look up. Here, additional fields could be used to record the min/max permissible key inside of a leaf node, the closest match, etc.:
```move
module econia::b_plus_map {
    struct Cursor {
        /// Either found key or closest key from a search.
        key: u128,
        vector_index: u64,
        /// `FOUND` or `CLOSEST`.
        key_type: bool,
        /// `NIL` if cursor inactive.
        leaf_node_index: u64,
        min_key_in_node: u128,
        max_key_in_node: u128,
        /// Minimum permissible key in node.
        node_key_floor: u128,
        /// Maximum permissible key in node.
        node_key_ceiling: u128
    }
}
```
Note that spillover and rebalancing operations may make floor/ceiling fields burdensome to keep current.
Helper functions like `is_cursor_active()` could potentially be inlined per aptos-core #4647, but only if they are inside of the given module: `option` calls reference an external module and may thus present a challenge for inlining.
An official number of free bytes, along with more storage details, is expected to be revealed in the coming months per aptos-core #4625.
AVL queue background
During the time of implementation of Econia's AVL queue, Aptos' gas schedule was subject to regular changes and documentation was practically nonexistent. At the time, available guidance indicated that gas optimality rested on minimizing the number of bytes in a per-item operation, such that AVL queue data structures are bitpacked for a minimized byte footprint.
However, the gas schedule has converged on a byte-wise cost schema that only applies after a critical number of bytes have been operated on, in effect rendering the first 1k bytes "free". This development essentially renders all AVL queue bitpacking operations unnecessary, to the extent that AVL queue gas computation costs are higher than required by the finalized schedule.
Moreover, with the AVL queue offering only 5-bit height tracking and 14-bit access keys, AVL queue height is limited to 18, with a maximum node count of 16383. One implication of this schema is an eviction paradigm whereby the tail of the AVL queue is evicted whenever no more nodes can be inserted.
From an order book perspective, ideally this paradigm would minimize denial-of-service (DoS) attacks that attempt to increase gas costs by inserting excessive bogus orders: only 18 borrows max are required to search a full tree. However, this paradigm enables a form of adversarial eviction whereby an attacker can clear out the order book by placing excessive transactions.
M-ary trees
Motivation
With the first 1k bytes of a global storage operation effectively free, and with computational costs dwarfed by storage costs, a proposed remedy is to replace the AVL queue with an m-ary tree-based structure, having at most $m$ children per node. Notably, an AVL tree is an m-ary tree with $m=2$, but in the proposed implementation, $m$ would be set to include as much data as possible in each node without exceeding 1k bytes.
Compared with a binary search tree with $n$ nodes having height on the order of $\log_2 n$, an m-ary tree has height on the order of $\log_m n$. Here, for example, with $m=5$ (up to 5 children per node) and 16383 nodes (the maximum number contained in an AVL queue), height is on the order of $\log_5 16383$ for an m-ary tree versus $\log_2 16383$ for an AVL queue. Notably, for up to $m=10$ children per node, height is on the order of only 6 for as many as 1 million orders.
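Evaluating the heights above explicitly:

$$\log_2 16383 \approx 14.0, \qquad \log_5 16383 \approx 6.0$$

$$\log_{10} 10^6 = 6$$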
Hence an m-ary tree essentially invalidates the need to implement eviction logic, as a malicious actor would have to place upwards of millions of orders to render order book operations prohibitively costly.
B+ trees
In particular, a B+ tree is proposed as the core m-ary tree data structure since it only contains data at leaf nodes, rendering inner nodes as compact as possible. Moreover, leaf node data in a B+ tree is sorted in a linked list, providing for much simpler indexing. With a B+ tree for both bids and asks, it may be appropriate to implement leaf nodes as a doubly linked list, to enable traversal in both ascending and descending order.
Additionally, several design methodologies are considered:

- Flat: `econia::market::Order` is a value stored directly in leaf nodes. Here, an `Order` can shift in memory due to node splitting.
- Partitioned: leaf nodes contain a vector of structs, each having a key and the table ID of a corresponding order. Orders are partitioned into separate table entries, and can be accessed in $O(1)$ lookup time via a "value access key" (value table ID). Indexing requires traversing the linked list and then borrowing the corresponding order in the separate data structure.

Implementation considerations
Vector operations
Regardless of the approach, vectors will be required for storing multiple keys and "child pointers" (table indices) for each node. For example, consider an inner node storing $[1, 3, 7, 12, 15, 18]$ in a standard `vector`: here, inserting $19$ requires a simple (and inexpensive) push back operation, but inserting $2$ requires an $O(n)$ operation, either by re-ordering essentially the entire vector after a push back via swaps,

$$[1, 3, 7, 12, 15, 18, 2]$$
$$[1, 2, 7, 12, 15, 18, 3]$$
$$[1, 2, 3, 12, 15, 18, 7]$$
$$[1, 2, 3, 7, 15, 18, 12]$$
$$[1, 2, 3, 7, 12, 18, 15]$$
$$[1, 2, 3, 7, 12, 15, 18]$$
or by using a `tmp` variable approach:

$$[1, 3, 7, 12, 15, 18], tmp = 2$$
$$[1, 2, 7, 12, 15, 18], tmp = 3$$
$$[1, 2, 3, 12, 15, 18], tmp = 7$$
$$[1, 2, 3, 7, 15, 18], tmp = 12$$
$$[1, 2, 3, 7, 12, 18], tmp = 15$$
$$[1, 2, 3, 7, 12, 15], tmp = 18$$
$$[1, 2, 3, 7, 12, 15, 18]$$
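The swap-based approach above can be sketched as follows: push the new key to the back, then swap the trailing element against each larger element, which reproduces the exact swap sequence shown earlier. This is a hedged sketch assuming unique keys (as in a map); the function name is an assumption.

```move
module econia::b_plus_map {
    use std::vector;

    /// Insert `key` into ascending-sorted `keys` in O(n): push back,
    /// then swap the trailing element with each larger element in
    /// turn, rotating the new key into its sorted position.
    fun insert_sorted(keys: &mut vector<u128>, key: u128) {
        vector::push_back(keys, key);
        let last = vector::length(keys) - 1;
        let i = 0;
        while (i < last) {
            if (*vector::borrow(keys, i) > *vector::borrow(keys, last)) {
                vector::swap(keys, i, last);
            };
            i = i + 1;
        };
    }
}
```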
One solution is to use a singly linked list, but this approach eliminates the kind of binary lookup made possible during search operations.
Notably, with market fills operating on the head or tail of the leaf node doubly linked list, one potential approach is to use a static vector for inner nodes, and a doubly linked list of data items within each leaf node: in a flat/compact approach, for example, leaf node lookup makes use of a binary search within inner node vectors, and linear search at leaf nodes. Here, popping the head of the entire data structure does not require updating every element within the corresponding leaf node's internal doubly linked list.
Notably, actual gas operations may render a static vector the least expensive, due to the average number of expected operations, previous/next pointer borrows and updates, etc.
In a static vector implementation, insertion operations that require breaking up a tree could implement the sorted insert during the split operation, to reduce vector lookup.
Operations may be based on Aptos' `SimpleMap` implementation.

Iteration
Should internal node indices be required for iteration, it is proposed that an `Iterator` or `Index` struct be used rather than passing table entry IDs, which essentially function as memory pointers.

Nested list style
In a nested approach, where price levels are generally static over time and order sequence numbers are constantly churning over, it may be practical to use a doubly linked list inside nodes for the sequence number dimension, and a static vector inside nodes for the price level dimension. With operations inside each price level typically either popping from the front or appending to the tail of the resultant price-time priority queue, operations tend toward $O(1)$ within the sequence number dimension, and $O(\log n)$ within the price level dimension (due to binary search within a price level node).
Sample implementation
See below for an analysis of a flat/partitioned sample implementation.
Structs

Height

For a `Node` (assuming no more than 127 vector elements), the fixed size of the node is $f$, and each `Item` has size $i$. For a maximum `Node` size (in bytes) $s$, the max number of children $m$ is:

$$f + im = s$$
$$im = s - f$$
$$m = \lfloor (s - f) / i \rfloor$$
Gas

Storage gas costs are defined as follows:
Conclusions

It is proposed that the data structure be implemented in a new module (tentatively `econia::b_plus_deque`, stylized as `bpdq`). For clarity, and since B+ operations are sorted by default, however, "B+ map" may be more appropriate, to indicate a key-value collection (sorted operations are simply a bonus).

@Arinerron @chad-mcdonald @chen-robert @fcremo