foonathan / memory

STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.
https://memory.foonathan.net
zlib License

Strange exception from unordered_set with memory_stack, on MSVC #144

Closed: jwdevel closed this issue 2 years ago

jwdevel commented 2 years ago

With the Microsoft compiler (I'm using VC19, but I imagine it's the same for others), and with fairly simple usage of std::unordered_set with a memory_stack allocator, I get an exception coming from inside the STL.

I believe it is related to the extra implementation_offset reserved by foonathan.

I'm not sure if this represents a bug in the MSVC STL, a bug in foonathan::memory, or if this is some expected limitation.

Below is a simple failing test. The insert() line will throw std::length_error with the message "invalid hash bucket count".

TEST_CASE("std unordered_set problem on MSVC") {
    using memory_stack_t = foonathan::memory::memory_stack<>;
    memory_stack_t myAlloc(1024);
    using MySet = foonathan::memory::unordered_set<int, memory_stack_t>;
    MySet mySet(myAlloc);
    for (int i = 0; i < 1000; ++i) {
        mySet.insert(i);
    }
}

This is the line in the STL that generates the error: https://github.com/microsoft/STL/blob/2f8342a3a57fb157d881c6a2d42a917d20e413f8/stl/inc/xhash#L1717

Feel free to step through it yourself, but my brief summary is that the failure comes down to an invariant the MSVC STL assumes. In particular, there is this comment in the MSVC source:

// Don't violate power of 2, fits in half the bucket vector invariant:
// (we assume because vector must use single allocations; as a result, its max_size fits in a size_t)

I am not sure if this invariant is somehow guaranteed by the C++ standard, or is just an assumption made in the MSVC STL. It all works fine with GCC's libstdc++, for what that's worth.

I wonder if the implementation_offset amount should be added on top of the basic grow-by-doubling calculation, rather than slightly eating into it?

foonathan commented 2 years ago

I wonder if the implementation_offset amount should be something in addition to the basic grow-by-doubling calculation, rather than eating into it, slightly?

The issue with that approach is that it can waste memory. As it is now, the arena allocates 4096 bytes, then 8192, then 16384, all multiples of the page size, so each block can be served by allocating whole pages using a page allocator. If we then add the offset on top, we need to start an entire new page just to store the extra pointer.
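
As a worked illustration, assuming 4 KiB pages and a hypothetical 16-byte offset (the actual implementation_offset is an internal detail of the library):

4096 bytes        -> exactly one page
4096 + 16 = 4112  -> two pages (8192 bytes), nearly half of it wasted
8192 + 16 = 8208  -> three pages (12288 bytes), a third of it wasted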

I think the correct fix here is to allocate the extra capacity in your stack up front.

jwdevel commented 2 years ago

I think the correct fix here is to allocate the extra capacity in your stack up front.

What does that look like, exactly? You mean in client code, have memory_stack_t myAlloc(1024 + X); where X accounts for implementation_offset?

foonathan commented 2 years ago

That should work, yes. You can use memory_stack::min_block_size(1024) to do the calculation for you.
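
Applied to the test case above, that would look something like this (a sketch based on that suggestion; min_block_size adds the internal bookkeeping overhead so the full 1024 bytes are usable):

using memory_stack_t = foonathan::memory::memory_stack<>;

// Reserve 1024 usable bytes plus the library's internal
// implementation_offset, instead of a raw 1024-byte block.
memory_stack_t myAlloc(memory_stack_t::min_block_size(1024));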

jwdevel commented 2 years ago

Thanks; that does seem to fix the issue for the case I posted.

However, when trying to apply the fix locally, I noticed another related case, using a node pool allocator:

using MyPool = memory_pool<node_pool, growing_block_allocator<heap_allocator, 2, 1>>;
const auto kNodeSize = unordered_map_node_size<std::pair<int, char>>::value;
MyPool myAlloc(
    kNodeSize,
    1024); // number of nodes

using MyMap = unordered_map<int, char, MyPool>;

MyMap map(myAlloc);

for (int i = 0; i < 1000; ++i) {
    map[i] = (char)i;
}

This gets the same invalid hash bucket count error, but it's not clear to me how to fix this case. Here, the caller is not specifying a number of bytes for the memory blocks.

Does this represent a bug inside memory_pool, then?

foonathan commented 2 years ago

Here, the caller is not specifying a number of bytes for the memory blocks.

They do; the second parameter is the number of bytes, not the number of nodes. You can likewise use min_block_size() to get a byte size that includes the implementation offset.

jwdevel commented 2 years ago

Ah, sorry for the confusion. I am still seeing a crash though (see end).

I was thrown off by the fact that MyPool::min_block_size(nodeSize, numberOfNodes) takes a number of nodes. And presumably that is the function that would be called to generate the value (which is a number of bytes, as you say).

So I guess you mean a user should do something like this:

MyPool myAlloc(
    kNodeSize,
    MyPool::min_block_size(kNodeSize, 1000));  // 1000 is the number of nodes, who knows how many bytes you'll get

That does indeed work, for me.

However, if a user wanted to grow by (approximately) some number of bytes rather than nodes, it seems they would need to do their own math, perhaps like:

size_t approxNumNodesInBlock = 1024 / kNodeSize;
MyPool myAlloc(
    kNodeSize,
    MyPool::min_block_size(kNodeSize, approxNumNodesInBlock));

That case still crashes for me. The math inside min_block_size happens to work out to exactly 1024 bytes, which is the same value as in the earlier, wrongly-commented crash.

So, working backwards from that, here is another crash, which doesn't try to do any weird math in client code:

MyPool myAlloc(
    kNodeSize,
    MyPool::min_block_size(kNodeSize, 42));

The number 42 is chosen there to make the math work out to an even 1024 again, so it is the same crash, just achieved in a different way. Previously, with the number 1000, it was working by accident, it seems.

Or is there some other min_block_size you were referring to, which I should use for a node pool?

foonathan commented 2 years ago

I don't think there is anything I can do on the library side about it. MSVC wants to allocate a big power of two, which the pool isn't designed to handle. It also doesn't need to: allocating big memory blocks is perfectly fine for the default allocator.

The fundamental issue is that unordered_map is a hybrid container: it allocates both arrays and individual nodes, and the pool should only be used for the latter.

jwdevel commented 2 years ago

Ok, fair enough; thank you for sticking with me through the explanation (:

So it seems like a node pool allocator is just a bad fit for the use-case.

I was just thrown off by the fact that MSVC fails where others did not. But I guess in theory gcc and others would have the same problem, given the right circumstances...

MiguelCompany commented 2 years ago

@jwdevel I've just seen this. In Fast DDS we are using a foonathan::memory::binary_segregator for some of our unordered_map instances. See here for example code.
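
For reference, a rough sketch of that approach, assuming the threshold() and make_segregator() helpers from the library's segregator.hpp (the sizes and names here are illustrative, not the actual Fast DDS code):

#include <foonathan/memory/container.hpp>
#include <foonathan/memory/heap_allocator.hpp>
#include <foonathan/memory/memory_pool.hpp>
#include <foonathan/memory/segregator.hpp>
#include <utility>

namespace memory = foonathan::memory;

int main() {
    const auto kNodeSize = memory::unordered_map_node_size<std::pair<int, char>>::value;

    // Requests up to kNodeSize bytes (the individual map nodes) go to the
    // node pool; anything larger (the bucket array) falls through to the
    // heap allocator, which handles big power-of-two sizes without trouble.
    auto seg = memory::make_segregator(
        memory::threshold(kNodeSize,
                          memory::memory_pool<>(kNodeSize,
                                                memory::memory_pool<>::min_block_size(kNodeSize, 1000))),
        memory::heap_allocator{});

    memory::unordered_map<int, char, decltype(seg)> map(seg);
    for (int i = 0; i < 1000; ++i)
        map[i] = static_cast<char>(i);
}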