Open ccollie opened 1 week ago
I might be wrong but just summing the number o children from each node would give the wrong result, imagine a case where we have N4 (Leaf, N4 (Leaf, Leaf))
, summing the number o children of each node would result in 4, but it's clear that the number of children is 3.
Adding a new field to header would increase memory consumption, since the header already uses 22 bytes with no padding between fields. So adding a new u32
would mean a 18.18% increase in memory consumption of the header, and also would make insertions/deletions harder, since this value is recursively accumulated up to the root.
Also given how the new prefix iterator works, we basically don't iterate over the tree it self, we just find the min and max leafs with the prefix and do a range iterator over it, which is basically a linked list iteration.
Also iterating over the tree it self would probably be slower than over the leafs with the next
pointer, since we need to keep a vec as a stack.
Whoops, I meant to post a comment here a day ago and lost it.
I agree with @Gab-Menezes that the most likely implementation of this would be to add a new field to the header of internal nodes to track the number of leaf node descendants. This would imply some changes to the insert and delete operations, so that the counter would be incremented or decremented on a successful operation.
On the pros side, then we would have the ability to cheaply implement ExactSizeIterator
for a larger number of iterators, like for fuzzy_*
, prefix_*
, or range_*
. I think this would be beneficial for cases where:
Vec::with_capacity
)The other performance related reason is that it allows me to decide on the most efficient way to iterate the collection. In my case, I run matching functions (including regexes) against items matching a prefix.
Above a certain count, it's worth it to switch to optimized predicates, but those require a compilation step which can then be amortized over the length of longer iterators.
IDK how I feel about this, this will definitely make insertions/deletions way slower, since right now we just find the writing point and apply the changes. With this we somehow need to find a way to keep track of all the parents while inserting/deleting (basically creating a stack, or doing recursion and both have performance implications) to increment this number if the insertion was successful.
If this makes general case insertions/deletions measurably slower, I'd be Ok without it.
I'd like to be able to get a count of entries starting with a given prefix. Currently its possible iterating through a
prefix
iterator and counting elements, however we can optimize this by visiting nodes and summing children.