Open amari opened 5 years ago
Nice work! I'm quite busy this week and won't have time to perform a full review, however I will try to find some time next for it.
@amari Are you planning on continuing work on this? One of the major concerns (doubly vs singly linked list) is now resolved thanks to #46.
I would like to make one major change regarding the Array
trait. I think we should only have a single Array
trait, with these methods:
trait Array<T>: DerefMut<[T]> {
/// Hint to the array that the hash table is about to insert additional elements.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn reserve(&mut self, len: usize) -> Option<Self>;
/// Hint to shrink the capacity of the array if possible.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn shrink_to_fit(&mut self, len: usize) -> Option<Self>;
}
The key change here is that the Array
is now responsible for tracking the load factor of the table and choosing when to rehash.
The implementations of Array
for [T; N]
will have no-op implementations for both of these methods: a fixed-size array will never re-hash and doesn't care about the load factor.
For dynamic arrays which track the load factor we will provide a DynamicArray<T>
type (we can't just use Vec<T>
or Box<[T]>
because they don't have a load factor). This type will control the load factor and resize as needed.
Yes, I plan on finishing this later this week.
I've been ruminating about the design and came to the conclusion that it is much cleaner to treat HashTable
as a "N+1 degree" data structure, whereas SinglyLinkedList
would be a "N degree" data structure.
This means that HashTableAdapter
does not implement Adapter
because a HashTable
never deals with link operations directly, and instead delegates to the bucket type.
I wanted to get the current design reviewed. Also, test coverage is 93.2%, so check the tests for example usage.
HashOps
implementations must be generic over their value argument, otherwise the compilation fails (the compiler claims that HashOps<A::KeyOps::Key>
is not implemented for IntegerIdentityHashOps
when A::KeyOps::Key = u32
). This is due to the HRTB A::HashOps: for<'b> HashOps<<A::KeyOps as KeyOps<'b, <A::PointerOps as PointerOps>::Value>>::Key>
. The solution is to remove the HRTB, but I am debating whether to add a lifetime parameter to the impl or to remove the lifetime from KeyOps
.Array
or HashTable
. The array would still decide when to grow/shrink, if at all, but a use-case may require optimized time (more buckets, fewer elements per bucket, faster search within bucket) or optimized space.BucketOps
to support custom bucket emptiness tracking.hash_table
module into a single file.BucketCursor
and Cursor
.Originally there was a single Cursor
where move_next
would advance to the next element within the bucket or the first element of the next non-empty bucket.
However the effects of move_next
were dependent on the internal state of the hash table, making the meaning of this code ambiguous:
// ...
let mut cursor = hash_table.cursor_mut();
cursor.move_next();
cursor.insert_after(value);
Into which bucket is the value inserted? There's no way to tell without exposing the current index. Even that wouldn't enable the same degree of control offered by the cursor APIs of the other intrusive collections. move_next_bucket
and move_prev_bucket
would move the cursor to a hybrid "null" state where get
returns None
, but that now means that there is no longer a single "null object", but a multitude. What would is_null
return if the cursor was pointing to the null object of a bucket? (E.g. When inserting a value into an empty bucket.)
KeyOps
and HashOps
Since HashTableAdapter
does not implement Adapter
, we cannot implement KeyAdapter
.
/// Key operations.
pub trait KeyOps<'a, T: ?Sized> {
/// The key type.
type Key;
/// Returns the key of `value`.
fn key(&self, value: &'a T) -> Self::Key;
}
/// Trait for hash operations.
pub trait HashOps<T: ?Sized> {
/// Returns the hash value of `value`.
fn hash(&self, value: &T) -> u64;
}
/// The default implementation of `HashOps`.
#[derive(Default)]
pub struct DefaultHashOps<S: BuildHasher> {
hasher_builder: S,
}
impl<S: BuildHasher> DefaultHashOps<S> {
/// Returns a new hash ops instance.
#[inline]
pub fn from_hasher_builder(hasher_builder: S) -> DefaultHashOps<S> {
DefaultHashOps { hasher_builder }
}
}
impl<S: BuildHasher, T: Hash> HashOps<T> for DefaultHashOps<S> {
#[inline(never)]
fn hash(&self, value: &T) -> u64 {
let mut hasher = self.hasher_builder.build_hasher();
value.hash(&mut hasher);
hasher.finish()
}
}
cursor_mut_from_ptr(ptr).remove()
to have constant time complexity.However, a singly linked list is sufficient for many use cases.
I've added a BucketOps
trait for the following reasons:
Adapter
implementation of the bucket vs HashTableAdapter
SinglyLinkedList
vs LinkedList
)Also, BucketOps::Cursor
is not the same as a collection's Cursor
. It is usually Option<NonNull<A::PointerOps::Value>>
.
pub unsafe trait BucketOps {
/// Collection-specific pointer conversions which allow an object to
/// be inserted in an intrusive collection.
type PointerOps: PointerOps;
/// The hash table bucket type.
type Bucket;
/// The cursor over elements within a bucket.
type Cursor: Clone;
/// Returns `true` if the bucket is empty.
unsafe fn is_empty(&self, bucket: &Self::Bucket) -> bool;
/// Removes all elements from the bucket.
///
/// This will unlink all objects currently in the bucket, which requires iterating through
/// all elements in the bucket. Each element is converted back into an owned pointer and then dropped.
unsafe fn clear(&mut self, bucket: &mut Self::Bucket);
/// Empties the bucket without unlinking or freeing the objects within it.
///
/// Since this does not unlink any objects, any attempts to link these objects into another intrusive collection will fail,
/// but will not cause any memory unsafety.
/// To unlink those objects manually, you must call the `force_unlink` function on them.
unsafe fn fast_clear(&mut self, bucket: &mut Self::Bucket);
/// Returns `true` if the cursor is pointing to the null object.
unsafe fn is_null(&self, bucket: &Self::Bucket, cursor: &Self::Cursor) -> bool;
/// Returns a reference to the object that the cursor is currently pointing to.
///
/// This returns `None` if the cursor is pointing to the null object.
unsafe fn get(
&self,
bucket: &Self::Bucket,
cursor: &Self::Cursor,
) -> Option<&<Self::PointerOps as PointerOps>::Value>;
/// Returns a null cursor for this bucket.
unsafe fn cursor(&self, bucket: &Self::Bucket) -> Self::Cursor;
/// Creates a cursor from a pointer to an element.
///
/// # Safety
/// `ptr` must be a pointer to an object that is a member of this bucket.
unsafe fn cursor_from_ptr(
&self,
bucket: &Self::Bucket,
ptr: *const <Self::PointerOps as PointerOps>::Value,
) -> Self::Cursor;
/// Searches the bucket for the first element where `is_match` returns `true`.
///
/// Returns `None` if no match was found, or the bucket was empty.
unsafe fn find_prev<F>(&self, bucket: &Self::Bucket, is_match: F) -> Option<Self::Cursor>
where
for<'b> F: FnMut(&'b <Self::PointerOps as PointerOps>::Value) -> bool;
/// Returns a cursor that points to the next element of the bucket.
///
/// If `self` points to the null object,
/// then it will return a cursor that points to the first element of the bucket.
///
/// If `self` points to the last element of the bucket,
/// then it will return a cursor that points to the null object.
unsafe fn next(&self, bucket: &Self::Bucket, prev: &Self::Cursor) -> Self::Cursor;
/// Inserts a new element into the bucket after the current one.
///
/// If the cursor currently points to the null object,
/// then the new element is inserted at the front of the bucket.
///
/// # Panics
/// Panics if the new element is already linked to a different intrusive collection.
unsafe fn insert_after(
&mut self,
bucket: &mut Self::Bucket,
prev: &Self::Cursor,
value: <Self::PointerOps as PointerOps>::Pointer,
);
/// Removes the next element from the bucket.
///
/// Returns a pointer to the element that was removed.
/// The cursor is not moved.
///
/// If the cursor currently points to the last element of the bucket, then no element is removed and `None` is returned.
unsafe fn remove_next(
&mut self,
bucket: &mut Self::Bucket,
prev: &Self::Cursor,
) -> Option<<Self::PointerOps as PointerOps>::Pointer>;
/// Removes the current element from the bucket.
///
/// Returns a pointer to the element that was removed.
/// The cursor is advanced to the next element of the bucket.
///
/// If the cursor currently points to the null object, then no element is removed and `None` is returned.
unsafe fn remove(
&mut self,
bucket: &mut Self::Bucket,
current: &mut Self::Cursor,
) -> Option<<Self::PointerOps as PointerOps>::Pointer>;
/// Removes the next element from the bucket, and inserts another object in its place.
///
/// Returns a pointer to the element that was removed.
/// The cursor is not moved.
///
/// If the cursor currently points to the last element of the bucket, then `Err(value)` is returned.
///
/// # Panics
/// Panics if the new element is already linked to a different intrusive collection.
unsafe fn replace_next_with(
&mut self,
bucket: &mut Self::Bucket,
prev: &Self::Cursor,
value: <Self::PointerOps as PointerOps>::Pointer,
) -> Result<<Self::PointerOps as PointerOps>::Pointer, <Self::PointerOps as PointerOps>::Pointer>;
/// Removes the current element from the bucket, and inserts another object in its place.
///
/// Returns a pointer to the element that was removed.
/// The cursor points to the newly added element.
///
/// If the cursor currently points to the null object, then `Err(value)` is returned.
///
/// # Panics
/// Panics if the new element is already linked to a different intrusive collection.
unsafe fn replace_with(
&mut self,
bucket: &mut Self::Bucket,
current: &mut Self::Cursor,
value: <Self::PointerOps as PointerOps>::Pointer,
) -> Result<<Self::PointerOps as PointerOps>::Pointer, <Self::PointerOps as PointerOps>::Pointer>;
}
Intrusive hash table implementation that supports:
Adapter<Link = LinkedListLink>
)bucket usage bitmapEntry
APIThere are a few utility traits:
Array
for implementing the bucket index and the metadata.DynamicArray: Array
for implementing dynamic allocation / resizing.Bits
BitsArray
Dynamic allocation is entirely optional and can be implemented via the
DynamicArray
trait.