JesperAxelsson / rust-intmap

Specialized hashmap for u64 keys
MIT License
30 stars 10 forks source link

Support more integer types #63

Open jakoschiko opened 3 weeks ago

jakoschiko commented 3 weeks ago

Add support for more integer types. Actually it was pretty easy.

I introduce a new sealed trait that is used by IntMap:

pub trait Int: int::SealedInt {}

The most complicated part was implementing the optional serde support. I found an ugly solution, but it's just an implementation detail that we could change later.

I need to change the type of mod_mask from u64 to usize. But I'm think that's okay? We need to be careful here.

I think (but I'm not 100% sure) that this PR does not introduce any breaking changes. IntMap has now a default type parameter and new constructors:

pub struct IntMap<V, I = u64> { ... }

impl<V> for IntMap<V, u64> {
    pub const fn new() -> Self { ... }
    pub fn with_capacity(capacity: usize) -> Self { ... }
}

impl<V, I> for IntMap<V, I> {
    pub const fn new_with() -> Self { ... }
}

impl<V, I: Int> for IntMap<V, I> {
    // We should discuss the names...
    pub fn with_capacity_with(capacity: usize) -> Self { ... }
}

Vec from stdlib does something similar:

pub struct Vec<T, A: Allocator = Global> { ... }

impl<T> for Vec<T, Global> {
    pub const fn new() -> Self { ... }
    pub fn with_capacity(capacity: usize) -> Self { ... }
}

impl<T, A: Allocator> for Vec<T, A> {
    pub const fn new_in(alloc: A) -> Self { ... }
    pub fn with_capacity_in(capacity: usize, alloc: A) -> Self { ... }
}

However, if you prefer a breaking change we could remove the default, swap the type parameters and use the more common names K and V:

pub struct IntMap<K, V> { ... }

impl<K, V> for IntMap<K, V> {
    pub const fn new() -> Self { ... }
}

impl<K: Int, V> for IntMap<K, V> {
    pub fn with_capacity(capacity: usize) -> Self { ... }
}

Closes #61

jakoschiko commented 3 weeks ago

In case you are not familiar with sealed traits: https://predr.ag/blog/definitive-guide-to-sealed-traits-in-rust/

jakoschiko commented 3 weeks ago

Urgh, my ugly solution for optional serde support would require a higher Rust version. I think we need to be more creative here.

Edit: Found a good solution.

JesperAxelsson commented 2 weeks ago

This is pretty neat! One thought I had was is that we only have a handful of int sizes to implement. So another approach would be to have a private IntMap<k, v> and expose IntMapu32<v> and IntMapu64<v> as well as an default IntMap<v> that is just an alias for IntMapu64<v>. The iterators could still expose the generic key value. The danger with using usize is that it's variable sized. While desktop 32bit system is rare they can still be found when doing embedded work. And we would be overallocating when the new 128 bit systems comes ;)

Another way forward might be https://doc.rust-lang.org/reference/items/associated-items.html#associated-types-container-example or https://doc.rust-lang.org/reference/items/generics.html to have a loose function for index calculations.

People have started comming back from the holidays now at work so have less time to might take some time to answer. Thank you for your patience!

jakoschiko commented 2 weeks ago

One thought I had was is that we only have a handful of int sizes to implement.

For demonstration purposes I only added u32, but it would be nice to have all of them (u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize).

So another approach would be to have a private IntMap<k, v> and expose IntMapu32<v> and IntMapu64<v> as well as an default IntMap<v> that is just an alias for IntMapu64<v>.

That would drastically increase the visible API. On implementation side we can handle this using macros. But still, the API would be huge. And I think it's also worse for compilation speed.

stdlib did the same with with non-zero integers, e.g. NonZeroU8, NonZeroU16, etc. Later they added a generic NonZero which also uses a sealed trait as type bound.

The danger with using usize is that it's variable sized. While desktop 32bit system is rare they can still be found when doing embedded work. And we would be overallocating when the new 128 bit systems comes ;)

We could use conditional compilation:

#[cfg(target_pointer_width = "32")]
impl_sealed_int! {
    usize,
    4294967291usize // Largest prime for u32
}

#[cfg(target_pointer_width = "64")]
impl_sealed_int! {
    usize,
    11400714819323198549usize // Largest prime for u64
}

Another way forward might be https://doc.rust-lang.org/reference/items/associated-items.html#associated-types-container-example

This might reduce the impact of having different types IntMapU32<V>, IntMapU64<V>. But the user would need to import the trait.

or https://doc.rust-lang.org/reference/items/generics.html to have a loose function for index calculations.

I don't understand this one.

People have started comming back from the holidays now at work so have less time to might take some time to answer. Thank you for your patience!

No problem and thanks for the feedback!

How do you think we should proceed with this topic?

Implementation wise I'm happy with the current PR (but some polishing is still necessary). From an API perspective I would prefer IntMap<K, V> because then one can replace HashMap<u32, String> with IntMap<u32, String>. We could also start with IntMap<V, I = u64> and refactor it to IntMap<K, V> once we need to make a breaking change for another reason.

If you have the strong feeling that another approach is worth to be investigated, then I'll try it. This PR is not really urgent for me, so I would proceed with the performance topic and continue this PR as soon as you made a decision.

JesperAxelsson commented 2 weeks ago

or https://doc.rust-lang.org/reference/items/generics.html to have a loose function for index calculations.

I don't understand this one.

My bad, missed the anchor in the link. Meant const generics: https://doc.rust-lang.org/reference/items/generics.html#const-generics

jakoschiko commented 1 day ago

This feature would be nice for us https://github.com/rust-lang/rfcs/pull/3686 :p