abseil / abseil-cpp

Abseil Common Libraries (C++)
https://abseil.io
Apache License 2.0

Using size feedback in the Allocator interface for flat_hash_map #808

Open rigtorp opened 3 years ago

rigtorp commented 3 years ago

Consider implementing P0401R3 "Providing size feedback in the Allocator interface" for flat_hash_map and friends. I've been using flat_hash_map with huge pages, and P0401R3 can prevent wasting memory. I added support to my own library SPSCQueue; it can be done unobtrusively in C++17: https://github.com/rigtorp/SPSCQueue/blob/master/include/rigtorp/SPSCQueue.h#L37
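For context, the P0401R3 pattern can be adopted in C++17 along these lines: detect whether the allocator offers an `allocate_at_least`-style member and, if so, keep the surplus as extra capacity. This is a hedged sketch, not the SPSCQueue or Abseil implementation; the names `has_allocate_at_least` and `buffer` are illustrative.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Detect an allocator member allocate_at_least(n) returning {ptr, count},
// as specified by P0401R3 (std::allocation_result in C++23).
template <typename Alloc, typename = void>
struct has_allocate_at_least : std::false_type {};
template <typename Alloc>
struct has_allocate_at_least<
    Alloc, std::void_t<decltype(std::declval<Alloc &>().allocate_at_least(
               std::size_t{}))>> : std::true_type {};

template <typename T, typename Alloc = std::allocator<T>>
struct buffer {
  explicit buffer(std::size_t n, Alloc alloc = {}) : alloc_(alloc) {
    if constexpr (has_allocate_at_least<Alloc>::value) {
      auto r = alloc_.allocate_at_least(n);
      data_ = r.ptr;
      capacity_ = r.count;  // may exceed n; use the whole allocation
    } else {
      data_ = alloc_.allocate(n);  // fallback: no size feedback
      capacity_ = n;
    }
  }
  ~buffer() { alloc_.deallocate(data_, capacity_); }
  std::size_t capacity() const { return capacity_; }

 private:
  Alloc alloc_;
  T *data_;
  std::size_t capacity_;
};
```

With a plain `std::allocator` under C++17 the fallback branch is taken and capacity equals the request; under C++23 (where `std::allocator` gained `allocate_at_least`) the feedback branch is used automatically.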

sbenzaquen commented 3 years ago

It is unlikely that flat_hash_map can take advantage of this API effectively. It does not have much flexibility in allocation sizes the way a vector or queue does. It has very specific sizes it can use, derived from power-of-two requirements on the number of slots. If the allocator gives us a little more than we ask for, it would be ignored. The allocator would have to give us enough for double the capacity, and that is unlikely.

Do you have evidence that P0401R3 can help here?
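The constraint described above can be made concrete with a small back-of-the-envelope helper. This is an illustration of the power-of-two argument, not Abseil internals; `usable_slots` is a hypothetical name.

```cpp
#include <cstddef>

// With power-of-two slot counts, allocator surplus is only usable if it
// covers the *next* power of two; anything less is wasted. Round a byte
// count down to the largest power-of-two slot count that fits.
std::size_t usable_slots(std::size_t bytes, std::size_t slot_size) {
  std::size_t slots = 1;
  while (slots * 2 * slot_size <= bytes) slots *= 2;
  return slots;
}
```

For example, a request for 1024 slots of 8 bytes needs 8192 bytes; an allocator that hands back 9000 bytes still yields only 1024 usable slots, while one that hands back 16384 bytes doubles the capacity to 2048.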

rigtorp commented 3 years ago

I agree that with the default heap allocator it's unlikely to help. It becomes useful when you are using a custom allocator that returns huge pages. It's quite nice when you want to place a huge hash table on a 1 GiB page. Of course, you can reach inside the implementation and calculate the appropriate argument to reserve(). Keep it in mind and let me know if you end up implementing it. Any industry use helps it get through the standards committee.
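The "reach inside the implementation" step might look like the sketch below: picking a reserve() argument so the backing array fills one huge page. The per-slot cost and the 7/8 maximum load factor (which Abseil documents for its Swiss tables) are assumptions here, and `reserve_for_page` is a hypothetical helper, not part of any real API.

```cpp
#include <cstddef>

// Back-of-the-envelope: largest power-of-two slot count fitting in one
// page, scaled down by an assumed 7/8 max load factor to get the number
// of elements to pass to reserve(). per_slot = sizeof(value) + control
// byte overhead (an assumption about the table's layout).
std::size_t reserve_for_page(std::size_t page_bytes, std::size_t per_slot) {
  std::size_t slots = 1;
  while (slots * 2 * per_slot <= page_bytes) slots *= 2;
  return slots - slots / 8;  // usable elements at 7/8 load factor
}
```

For a 1 GiB page and 9 bytes per slot this yields 2^26 slots and about 58.7 million reservable elements.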

sbenzaquen commented 3 years ago

I see. It is a special allocator that will give very large allocations to begin with. I agree that it makes a lot of sense in that case. We can jump directly to a large capacity to use the huge page that the allocator gave us.

ckennelly commented 3 years ago

I'm one of the authors of P0401. When drafting this (and P0901), I had primarily anticipated a use case for types like std::vector, std::string, etc.--basically, array-like types.

So that I can better understand the use case: If you were to call reserve() with a size that'd be 2.1MBs (ordinarily), just over a single x86_64 huge page, would you anticipate your allocator to return 1GB of memory (a full huge-huge page) so that it can reside on a single TLB entry?

rigtorp commented 3 years ago

@ckennelly I think it's a great proposal. I believe there are additional more niche use cases. I spoke to Jonathan Wakely about how I use it for concurrent ring buffers (https://github.com/rigtorp/SPSCQueue/blob/master/include/rigtorp/SPSCQueue.h#L37).

> So that I can better understand the use case: If you were to call reserve() with a size that'd be 2.1MBs (ordinarily), just over a single x86_64 huge page, would you anticipate your allocator to return 1GB of memory (a full huge-huge page) so that it can reside on a single TLB entry?

Yes, exactly. In my specific real-time-type workload I would also have further allocations fail and handle that as a failed insert() instead (not sure yet whether this would actually work as-is in flat_hash_map). The allocator would look something like this, but for 1 GiB pages and with size feedback (https://rigtorp.se/hugepages/):

#include <sys/mman.h> // mmap, munmap
#include <cstddef>    // std::size_t
#include <limits>     // std::numeric_limits
#include <new>        // std::bad_alloc

template <typename T> struct huge_page_allocator {
  constexpr static std::size_t huge_page_size = 1 << 21; // 2 MiB
  using value_type = T;

  huge_page_allocator() = default;
  template <class U>
  constexpr huge_page_allocator(const huge_page_allocator<U> &) noexcept {}

  std::size_t round_to_huge_page_size(std::size_t n) {
    // Round the byte count up to the next multiple of the huge page size.
    return (((n - 1) / huge_page_size) + 1) * huge_page_size;
  }

  T *allocate(std::size_t n) {
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
      throw std::bad_alloc();
    }
    auto p = static_cast<T *>(mmap(
        nullptr, round_to_huge_page_size(n * sizeof(T)), PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0));
    if (p == MAP_FAILED) {
      throw std::bad_alloc();
    }
    return p;
  }

  void deallocate(T *p, std::size_t n) {
    munmap(p, round_to_huge_page_size(n * sizeof(T)));
  }
};
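To connect the allocator above with P0401R3: a size-feedback variant could report the rounded-up huge-page allocation back to the container instead of silently over-allocating. This is a hypothetical extension sketch (the name `sized_huge_page_allocator` and its nested `allocation_result` mirror C++23's `std::allocation_result`, but nothing here is from Abseil or SPSCQueue).

```cpp
#include <sys/mman.h> // mmap, munmap
#include <cstddef>    // std::size_t
#include <new>        // std::bad_alloc

template <typename T> struct sized_huge_page_allocator {
  constexpr static std::size_t huge_page_size = 1 << 21; // 2 MiB
  using value_type = T;

  struct allocation_result { // mirrors C++23 std::allocation_result
    T *ptr;
    std::size_t count;
  };

  static std::size_t round_up(std::size_t bytes) {
    return ((bytes + huge_page_size - 1) / huge_page_size) * huge_page_size;
  }

  // P0401R3-style entry point: return the pointer *and* how many
  // elements the rounded-up huge-page allocation actually holds.
  allocation_result allocate_at_least(std::size_t n) {
    std::size_t bytes = round_up(n * sizeof(T));
    auto p = static_cast<T *>(mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                                   -1, 0));
    if (p == MAP_FAILED) throw std::bad_alloc();
    return {p, bytes / sizeof(T)}; // surplus fed back to the container
  }

  T *allocate(std::size_t n) { return allocate_at_least(n).ptr; }

  void deallocate(T *p, std::size_t n) {
    munmap(p, round_up(n * sizeof(T)));
  }
};
```

A container aware of this interface could then size itself to `count` rather than `n`, using the whole huge page instead of the fraction it asked for.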
ckennelly commented 2 years ago

I think the good news here is that P0401 is now part of C++23.

That said, I don't think this is currently practical in the implementation of flat_hash_map, since it typically works with a power-of-2 number of elements. Changing that is a bit tricky, since integer division (and various alternatives) tend to slow down an otherwise very fast lookup path.
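The cost trade-off behind the power-of-2 constraint can be shown in two lines. This is a generic illustration of hash-to-slot mapping, not flat_hash_map's actual probing code.

```cpp
#include <cstddef>
#include <cstdint>

// With a power-of-two capacity, mapping a hash to a slot is one AND.
std::size_t slot_pow2(std::uint64_t hash, std::size_t capacity) {
  return hash & (capacity - 1); // requires capacity == 2^k
}

// An arbitrary capacity needs an integer division (or an alternative
// like Lemire's fastrange), which is slower on a hot lookup path.
std::size_t slot_any(std::uint64_t hash, std::size_t capacity) {
  return hash % capacity;
}
```

The two agree whenever the capacity is a power of two; giving that up to exploit arbitrary allocator feedback would put the division on every lookup.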

rigtorp commented 2 years ago

That's fantastic news!

Well, maybe there is a case where, during a 2x growth, the allocator actually gives you enough for a 4x growth. Or if the allocator always gives you 1 GiB pages, it still makes sense to use all the returned memory?

Anyway, this is a nice improvement to the allocator API.
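The "2x growth becomes 4x" idea above stays compatible with power-of-two capacities: round whatever the allocator returned *down* to a power of two, which may be a larger power than the one requested. A hedged sketch (`grown_capacity` is a made-up helper):

```cpp
#include <cstddef>

// requested is the 2x-growth target (a power of two); returned_slots is
// what allocate_at_least actually provided. Keep doubling while the
// returned allocation covers the next power of two.
std::size_t grown_capacity(std::size_t requested, std::size_t returned_slots) {
  std::size_t cap = requested;
  while (cap * 2 <= returned_slots) cap *= 2;
  return cap;
}
```

So a grow to 2048 slots backed by an 8192-slot allocation becomes a 4x jump to 8192, while a small surplus (say 3000 slots) leaves the capacity at 2048, exactly as sbenzaquen described.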
