boostorg / unordered

Boost.org unordered module
http://boost.org/libs/unordered
Boost Software License 1.0

[Feature request] boost::unordered_flat_map variant with key and value stored separately #209

Open holenno1 opened 1 year ago

holenno1 commented 1 year ago

Hello!

It would be great if we could have another variant of boost::unordered_flat_map where the keys and values are stored in separate arrays - "split" storage. The main motivation for this is to minimise memory wastage due to padding.

e.g. key=uint64_t, value=float means 4 bytes of padding per key-value pair, because the pair is padded up to the 8-byte alignment of uint64_t.

Obviously this will be a further divergence from std::unordered_map's interface. The main difference is that iterators would return std::pair<const Key&, Value&> instead of std::pair<const Key, Value>&.

However, we believe this will provide a tonne of value to fans of boost::unordered_flat_map.

Thanks!

cmazakas commented 1 year ago

> However, we believe this will provide a tonne of value to fans of boost::unordered_flat_map.

Hmm, do you have any concrete use-cases or data for this?

We'd like to consider the idea but before dedicating the resources, we'd like to get a gauge on the effectiveness of including such a map.

ktprime commented 1 year ago

@holenno1 This design needs at least three arrays. Even if it saves some memory in certain use-cases, its performance is not as good as you might expect in most cases.

holenno1 commented 1 year ago

Hi, sorry was busy all week.

I've collected some data.


A bit of context:

We have an in-house open-addressing hash map implementation that is heavily based on LLVM's llvm::DenseMap, with one key difference:

llvm::DenseMap relies on 2 predefined sentinel key values to indicate whether a slot is "empty" or "erased" (a tombstone). Our implementation instead stores an extra bitset, where each slot in the buffer is represented by 2 bits for the aforementioned tags.

In addition, we have another hash map variant that stores the keys and values in separate buffers - "split storage". As mentioned in our feature request, this is primarily a memory optimisation.


Here's a memory usage experiment:

A real-world usage scenario for this setup is object-to-ID mapping: we have a large number of heap-allocated objects, and we want to assign an ID to each of them.

In fact, 100,000 entries is a little on the low side. We typically deal with 100s of millions or low-billions of entries, which is the typical size of a large netlist or graph problem.

Here are the results:

boost::unordered_flat_map                    : 4,194,304 Bytes
llvm::DenseMap derivative                    : 8,388,608 Bytes
llvm::DenseMap derivative w/ split key-value : 6,291,456 Bytes

*Those numbers were measured by querying /proc/self/stat (virtual memory size, in bytes)

The "split storage" variant sees a 25% reduction in memory usage compared to the "pair storage" variant.


Let's look at runtime. For this, I extended the Unordered benchmark suite.

Yes @ktprime you are right, the "split storage" variant will have a performance disadvantage for insertion and erasure workloads.

However, "split storage" does have a slight edge when it comes to unsuccessful lookups, as we don't have to visit the value buffer.

[benchmark results image]
*Those numbers were measured on an isolated Xeon W-2155 (Skylake) server with gcc 12.2, boost 1.81 at -O3

martinus commented 1 year ago

Couldn't you just create a custom key type (say with std::array<std::byte, 8>, or two uint32_t types) so that the alignment doesn't cause padding? Then access the data with memcpy. I don't think it would cause much of a slowdown on most systems.

holenno1 commented 1 year ago

~I think you mean use unordered_flat_set (not map) + type punning + const_cast?~ EDIT: Ok, I see what you meant:

struct key   { std::byte raw[8]; };  // holds a uint64_t, but with 1-byte alignment
struct value { std::byte raw[4]; };  // holds a float

// key_hash/key_eq hash and compare the raw bytes
using my_map = boost::unordered_flat_map<key, value, key_hash, key_eq>;

We can certainly do that, but I think the spirit of the feature request is so we don't have to actually do that :).