Chriscbr opened 7 months ago
Are you talking about the number of elements in the set, or the size of the set's elements?
**Indices smaller than `usize`, if the size of the set's elements is also smaller than `usize`**

In this case, using key-sized indices in `sparse` seems like a good idea. All keys are unique, so, if I'm not mistaken, it's impossible to store more than $2^{32}-1$ key-value pairs in a map of `u32`s (or keys of any other size smaller than `usize`). Thus, storing `u32` indices in `sparse` is guaranteed to be sufficient to store an entry for each key. If this is the case you meant, I think locking in to a smaller keyspace is also not too bad, since that's already the case due to the key-uniqueness guarantee.
**Indices smaller than `usize`, if the number of elements in the set is expected to be small**

Using indices in `sparse` that are even smaller than the key size would also be an interesting idea. However, it might create a pretty severe lock-in later on when inserting more entries. In the worst case, at some point the entire `sparse` array would need to be copied to use indices of a bigger size. For example, if `sparse` starts out as a `Vec<u16>`, it's conceivable that it would need to be resized to a `Vec<u32>` later on. I imagine doing so without changing the type of the set whose `sparse` is resized would be very difficult. I'm not sure this increase in complexity is worth it, given that we already resize `sparse` automatically depending on the number of entries.
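As a rough illustration of that widening cost (the function name here is hypothetical), converting a `Vec<u16>` `sparse` array into a `Vec<u32>` means allocating a new array and copying every element -- there is no in-place conversion between the two:

```rust
// Growing `sparse` from u16 indices to u32 indices is an O(n) copy;
// for a large sparse array this is a noticeable pause at the moment
// the set outgrows the u16 index space.
fn widen_sparse(sparse: Vec<u16>) -> Vec<u32> {
    sparse.into_iter().map(u32::from).collect()
}

fn main() {
    let small: Vec<u16> = vec![0, 1, 2, u16::MAX];
    let wide = widen_sparse(small);
    assert_eq!(wide, vec![0u32, 1, 2, 65_535]);
    // The harder problem is the type change itself: a set whose field is
    // declared as `sparse: Vec<u16>` cannot start holding a `Vec<u32>`
    // without an enum or generics wrapping both representations.
}
```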
Thanks for clarifying. Yeah -- the first case (using smaller indices than `usize` if the size of the set's elements is also smaller than `usize`) was the one I had in mind, basically relying on the key uniqueness as you described quite well.
An option that keeps things simpler (without generics, etc.) at the sacrifice of some flexibility could be to have only `u32` and `u64` implementations -- they could be named `SparseSet` and `BigSparseSet`, for example, with corresponding map types. `SparseSet`'s capacity would be restricted to `u32::MAX`, and `BigSparseSet`'s to `u64::MAX`. Macros could be used to reduce code duplication.
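A rough sketch of what the macro approach might look like (the macro and all names here are hypothetical, not a proposed final design): one `macro_rules!` invocation per index width stamps out a concrete type, avoiding generics at the cost of a fixed set of variants.

```rust
// Hypothetical macro generating one sparse-set type per index width.
macro_rules! define_sparse_set {
    ($name:ident, $idx:ty) => {
        pub struct $name {
            sparse: Vec<$idx>,
            dense: Vec<$idx>,
        }

        impl $name {
            pub fn new(universe: usize) -> Self {
                // Capacity is hard-bounded by the index type's range.
                assert!(universe <= <$idx>::MAX as usize);
                Self { sparse: vec![0; universe], dense: Vec::new() }
            }

            pub fn capacity_limit() -> usize {
                <$idx>::MAX as usize
            }
        }
    };
}

// SparseSet is capped at u32::MAX entries, BigSparseSet at u64::MAX.
define_sparse_set!(SparseSet, u32);
define_sparse_set!(BigSparseSet, u64);

fn main() {
    assert_eq!(SparseSet::capacity_limit(), u32::MAX as usize);
    assert!(BigSparseSet::capacity_limit() >= SparseSet::capacity_limit());
    let _s = SparseSet::new(1024);
}
```

The map types could be generated the same way, with the macro also taking a value type parameter.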
If you know the elements in your set will be "small" -- say, always between 0 and 2^32 -- the current implementation of `SparseMap`/`SparseSet` is space-inefficient, since the `sparse` array stores a `Vec<usize>` when a `Vec<u32>` would be sufficient. One might even argue that 2^32 is more space than sufficient for most use cases (like storing sets of IPv4 addresses), so this should be the default. The main downside is that once you commit to a smaller keyspace, like `u32`, that's a hard bound on the map's maximum capacity.
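A quick back-of-the-envelope check of the savings, assuming a 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // On a 64-bit target, every slot of a usize-based `sparse` array costs
    // 8 bytes; u32 slots cost 4, halving the array's footprint.
    let slots = 1usize << 20; // e.g. a sparse array with 2^20 slots
    let usize_bytes = slots * size_of::<usize>();
    let u32_bytes = slots * size_of::<u32>();
    println!("usize: {} bytes, u32: {} bytes", usize_bytes, u32_bytes);
    assert!(u32_bytes < usize_bytes); // holds on 64-bit targets
}
```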
It's also possible you might want to use the even smaller keyspaces of 2^16 or 2^8 (to store `u16` or `u8` values).
What's the best way to support this?