Explore Skip List Algorithm

Firionus commented 1 year ago

See https://stackoverflow.com/a/10696252

Is there literature on that? Could it be faster? Does it allow for arbitrary percentiles like SortFilters?

Firionus commented 1 year ago

Preliminary results:

A quick implementation based on https://github.com/kernelmethod/SkipLists.jl was around 6x slower than baseline (FastRunningMedian 0.2). But it already shows to be easily extendable to arbitrary percentiles and scales much better than SortFilters. This is worth investigating.
A custom-fit and mostly un-optimized SkipList was around 3.5x slower than baseline. I hypothesize with some profiling evidence that slow performance is due to many cache-inefficient pointer chases while looking for the insertion point of a new value. It seems generally hard to write a cache-optimized mutable SkipList. It quickly looks like a B-Tree, so let's go for that instead.

A fast algorithm generally will use a linear/circular buffer for the input values with pointers/indices into the sorted structure and a cache-optimized sorted structure with support for fast O(log n) insertion. Further, the data structure should support quickly finding the next or previous element of the current median, so that the new median can be quickly determined based on insertion position of the new value and removal position of the old value.

Based on this, I suggest to use a cache-optimized B-Tree as data structure. A Julian implementation might look like this:

In a B-Tree of order d, each node contains between d and 2d elements (inclusive) (see https://doi.org/10.1145/342009.335449 for a similar approach for a B+-Tree. For my case, I think B-Tree is better and modifications need to be made because Julia doesn't allow inlining mutable data in memory)
Each level of the tree is an Array, where the 1-based level L is also the dimensionality of the Array
In L=1, the index corresponds to the nodes. In the following levels, the first L-1 indices indicate the node and the last index indicates the position in the node.
The first level has 2d elements. The consecutive levels with their additional dimensions each have a size of 2d+1 to accomodate the fact that child nodes fit between as well as before and after each parent element.
Each node has a maximum length, behind which the data in the Array is left uninitialized or filled with invalid data. The maximum lengths for each level have to be kept in separate Arrays (because of different data type) of dimensionality L-1, one for each level. The size would be 2d+1 in every dimension. In L=1, the dimensionality of the node length array is 0, hence there is only 1 value.
Insertion/Deletion follows the normal rules of a B-Tree. This means that most of the time insertions/removals should be cheap, but changes in data distribution might often cause expensive rebalancing.

It is hoped that by moving to Arrays, linear searches inside nodes can be made fast due to high memory locality. Keeping the data structure in such a memory local state requires additional overhead (splitting/merging operations), but it is hoped there is enough flexibility in node length to avoid such rebalances most of the time with typical data distributions. A quick implementation will have to be made, followed by benchmarks and profiles against baseline to determine the validity of these assumptions.

Firionus commented 9 months ago

I created a prototype BTree implementation. After removing type instabilities but not optimizing the algorithm it performs about 4x slower than baseline (FastRunningMedian) while scaling worse. The main slowdown comes from:

shifting in nodes during delete operations,
keeping the linear buffer updated with the current position of nodes as they are shifted in the tree and
many small operations during balancing.

Searching in the tree is less of a problem, whereas it was the main problem with SkipLists. The implementation is currently private on my PC due to its prototype quality.

It is known that BTrees are relatively good at reading and worse at writing than some other data structures. For possible improvements with modified data structures, I plan to read this paper: http://supertech.csail.mit.edu/papers/BenderFaJa15.pdf I will pick this optimization up again at a later point in time.

EDIT: I have a new idea for optimizing: Let's make insertion and deletion lazy by letting leaf nodes be unsorted. When we delete, just mark it with a specific linear buffer index. This avoids shifting and updates in the linear buffer, massively reducing deletion cost at the cost of tree balance. We need to figure out a way to keep the tree balanced. When inserting, just insert at first gap or end. When looking for a next/previous/min/max element in a node, do a linear scan.

Firionus / FastRunningMedian.jl

Explore Skip List Algorithm #19