Closed brurucy closed 4 months ago
While the operations are called the same and do a similar operation in theory, I specifically only implemented and tested rank/select data structures on bit vectors.
Those can be heavily optimized by exploiting bitwise operations on the x86 architecture and exploiting the fact that only ones and zeros are part of the data structure. So comparing this library would be pretty pointless, as indexset
is a non-duplicate set (I think?), and supports all types of ordered elements.
The rank of an element in indexset is its position in the sorted set, while the rank of a bit in a bit vector is the number of equal bits before that element.
There might be a way to implement a bit vector using your fenwick tree, but I never worked with those, so I am not sure.
Hmm.
In this case, would it not be possible to generalise the indexset container/Node to support that?
I had this idea for a while. To make indexset
generic on the Node
.
Maybe? I don't exactly know how deep the exclusivity of equal elements is rooted in the datastructure and if it could support unsorted sets.
Keep in mind though, that the rank implementation of BitVectors takes an index as parameter and returns the amount of 1 before that index for rank1 and the amount of zeros before that index for rank0, regardless of what element is at that index. So there are possibly multiple indices with the same rankX.
If we look at Node
:
struct Node<T>
where
T: PartialOrd + Clone,
{
pub inner: Vec<T>,
pub max: Option<T>,
pub iterations: usize,
}
impl<T: PartialOrd + Clone> PartialOrd for Node<T> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
self.max.partial_cmp(&other.max)
}
}
impl<T: PartialOrd + Clone> Default for Node<T> {
fn default() -> Self {
Self {
inner: Vec::with_capacity(DEFAULT_INNER_SIZE),
max: None,
iterations: 10,
}
}
}
fn search<T: PartialOrd>(haystack: &[T], needle: &T, iterations: usize) -> Result<usize, usize> {
let mut left = 0;
let mut right = haystack.len();
for _ in 0..iterations {
if left >= right {
break;
}
let mid = left + (right - left) / 2;
let mid_value = unsafe { haystack.get_unchecked(mid) };
if mid_value < needle {
left = mid + 1;
} else if mid_value > needle {
right = mid;
} else {
return Ok(mid);
}
}
Err(left)
}
impl<T: Ord + Clone> Node<T> {
pub fn new(capacity: usize) -> Self {
Self {
inner: Vec::with_capacity(capacity),
iterations: capacity.ilog2() as usize,
..Default::default()
}
}
pub fn get(&self, index: usize) -> Option<&T> {
self.inner.get(index)
}
pub fn split_off(&mut self, cutoff: usize) -> Self {
let latter_inner = self.inner.split_off(cutoff);
self.max = self.inner.last().cloned();
let latter_inner_max = latter_inner.last().cloned();
Self {
inner: latter_inner,
max: latter_inner_max,
iterations: self.iterations,
}
}
pub fn halve(&mut self) -> Self {
self.split_off(DEFAULT_CUTOFF)
}
pub fn len(&self) -> usize {
self.inner.len()
}
pub fn insert(&mut self, value: T) -> bool {
match search(&self.inner, &value, self.iterations) {
Ok(_) => return false,
Err(idx) => {
let some_value = Some(&value);
if some_value > self.max.as_ref() {
self.max = some_value.cloned()
}
self.inner.insert(idx, value);
}
}
true
}
pub fn delete(&mut self, index: usize) -> T {
self.inner.remove(index)
}
}
It could be a trait requiring delete
, insert
, get
and halve
.
But wouldn't functions like search
get extremely slow if you gave up sorted order within nodes?
Hey there!
Thanks a lot for the PRs that made range iteration work better.
How does
indexset
's rank and select fare against what you have here?