pXXXX - Fallible Allocators for Algorithms

ThePhD commented 5 years ago

std::scratch_space is a type that is meant to be passed to algorithms which currently internally and opaquely allocate to gain their complexity guarantees. Three such algorithms in the standard are known:

std::stable_sort
std::stable_partition
std::inplace_merge

For -ffreestanding and similar, we should provide overloads which take this scratch space bucket and work solely within it, to avoid paying the cost of opaque dynamic allocations inside of the algorithms.

ThePhD commented 5 years ago

Despite people telling me it exists, I cannot find the overloads of Alexander Stepanov's STL algorithms with anything like a std::scratch_space: http://stepanovpapers.com/STL/DOC.PDF

Morwenn commented 5 years ago

Don't forget that if you provide pre-allocated buffers, people will ask to support allocators too. It is tightly linked to std::temporary_buffer where the decisions was: either augment it to work with allocators or get rid of it. And they got rid of it.

ThePhD commented 5 years ago

Some insight: MSVC's STL already has a basically-allocator plus optimistic-stack-allocation internals for this. We'd likely just be standardizing an existing library practice: need to look at __stable_partition in more libraries.

Morwenn commented 5 years ago

IIRC both libc++ and libstdc++ rely on global operator new to allocate the memory (with varying support for sized deallocation and over-alignment), so it's most likely done on the heap.

Also related: Stepanov mentioned long ago that std::get_temporary_buffer was "currently" inefficient (the good ol' split the buffer size in half until you can allocate something) but that having dedicated low-level memory allocation functions could make it a very interesting solution. As far as I know, such a function was never proposed to WG14 anyway, and thus std::get_temporary_buffer remained forever unoptimized (there is a note about it in either Elements of Programming or From Mathematics to Generic Programming, I would need to find it again, it doesn't mention Howard's proposal).

There could be ways to optimize std::get_temporary_buffer if more low-level memory allocation functions existed:

First of all, maybe some memory allocation functions know at a given point how much memory they have left to allocate. I don't know whether it is the case or not, but it would allow the temporary buffer functions to make O(1) allocations instead of O(log n) with the current unoptimized method.
Failing that, it could still take advantage of realloc-like functions for regrowth, but let me take that to another comment to make more points.

Morwenn commented 5 years ago

In the previous comment I started to mention regrowth for std::scratch_space: to me it is a major use case of such buffers. In cpp-sort I use it in algorithms such as merge_sort where I don't know how much memory I will need to allocate depending on the pattern, so I tend to reallocate as needed, which was more sensible in my benchmarks than always allocating memory equivalent to half of the collection.

As a practical example to discuss further, here is my temporary_buffer implementation in cpp-sort (unfortunately without allocator support). And here is its try_grow function:

auto try_grow(std::ptrdiff_t count) noexcept
    -> bool
{
    auto tmp = get_temporary_buffer<T>(count, buffer_size);
    if (not tmp.first) {
        // If the allocated buffer isn't bigger, keep the old one
        return_temporary_buffer<T>(tmp.first, tmp.second);
        return false;
    }
    // If the allocated buffer is big enough, replace the previous one
    return_temporary_buffer(buffer, buffer_size);
    buffer = tmp.first;
    buffer_size = tmp.second;
    return true;
}

Most of my algorithms - just like those you are trying to improve - don't actually need a buffer at all to work, but the bigger the buffer, the simpler the algorithm and the faster it should run. This function answers to those specific needs by performing the following operation: it tries to grow the buffer up to a given size, and doesn't allocate anything if it can't. In order to fo this, get_temporary_buffer is actually a reimplementation of the standard library one, except that it takes an additional min_count parameter: it gives up if it can't allocate more than min_count objects.

If better low-level memory allocation functions were provided, then we could most likely improve upon the status quo:

It could take advantage of the proposed size feedback from operator new (see P0901R4), which makes sense when you know that you might need to make your buffer grow because it might make your buffer big enough to not regrow at all in some cases.
Second, realloc-like functions could try to grow the buffer in-place. I know that a reallocation mechanism is currently proposed for standardization in P0894 (see later comment). Such a function was also proposed to WG14 for standardization by Howard Hinnant in N1085 - Proposal to augment the interface of malloc/free/realloc/calloc under the name negotiate_alloc but it never made it into the C standard, and thus not into the C++ standard either. The proposal was rejected by lack of interest (see next comment). These functions are actually very similar to my try_grow function, except that they are meant for in-place growth.
Third, when you want to reallocate but don't care about it happening in-place, I guess that if you've got a memory pattern such as [ free memory | scratch_space current buffer | free memory ], then a smart memory allocator might be able to claim and collapse those in a single region of storage, while with my current implementation I don't deallocate the current buffer before trying to allocate a new one, because in a threaded program I might end up with less memory if some other thread claims memory right after I've deallocated it. It's maybe an issue that a smart memory allocation function might be able to solve?

Also it's worth noting that it's really all about allocating and deallocating memory. There is an implicit contract that when the destructor or try_grow are called, there isn't an object left to destroy in the storage, so we don't care about moving copying or moving elements when a potential reallocation occurs.

It's a bit unstructured, but you now have my thoughts and ruminations about the use cases, potential optimization and assumptions about a potential std::scratch_space design. What's interesting is that the whole gamut of possible memory allocation optimizations can be used to improve these algorithms. My use case basically corresponds to reimplementing inplace_merge and stable_sort, so it should closely match what you would want from a feature designed to implement those algorithms.

Morwenn commented 5 years ago

Concerning N1085, here is an answer by Howard himself:

I did not attend WG14 personally to present this paper. The feedback I got was that there was insufficient interest to move forward with it. I.e. no one present was excited enough about it to do the hard work, and even I, the author, wasn’t willing to shell out the vacation time and travel expense to make it happen.

I.e. good ideas are often not enough by themselves. Someone has to be willing to champion it in person. It takes time and money (the money for travel, nothing nefarious).

On the positive side it means that such memory allocation proposals weren't shot down because of implementability concerns or design issues, but purely out of lack of interest.

Morwenn commented 5 years ago

And I found the realloc paper for C++: P0894. Here is the motivation part:

As we can see, there is no counterpart for realloc(). And there is a solid reason for that: realloc() copies the memory block if it cannot be just expanded or shortened. It is unacceptable behaviour because objects in C++ are not trivially copyable in general. Therefore realloc() cannot be used in generic code. However the ability to resize already allocated memory block can be very useful. We could potentially get performance and safety gain: no need to move the existing data (performance) and consequently no risk to cause an exception by throwing move-operation (safety). Yes, it cannot be guaranteed that every resize request will be satisfied (it won't usually in fact). And what do we do in such case? All we do today! Just fall back to the current technics.

Interestingly enough the way I use the assumption that I mentioned a few comments ago - about not having to copy/move objects and just reallocating storage - might apparently be a good use case for realloc in C++ since I assume that I only manipulate memory which is just raw storage without objects allocated in it.

I think that I'm done commenting and throwing ideas and remarks for now. Sorry for the spam ^^'

ThePhD commented 5 years ago

Nothing so educational could ever be labeled with a word like "spam".

Morwenn commented 5 years ago

I updated my cpp-sort library to take advantage of what I mentioned in my big comment, and restructured said comment a bit to make it clearer and less redundant :)

ThePhD commented 3 years ago

This issue has been subsumed by the need for a fallible allocator, which is separate from a "more complete" allocator (though should receive the same improvements).

ThePhD commented 3 years ago

Depends on #34/p2256.

h-vetinari commented 1 year ago

Why not use a verb? The ranges name spaces is full of them, so why not also std::allocate? I mean, yes, people would probably get mad (because consistency, bikesheds, etc.), but perhaps it's still a less-bad trade-off overall (and might be less painful to keep looking at while you work on it).

Bonus: std::allocate pairs nicely with std::try_allocate; though C++ people would probably blow a gasket because it sounds like it may have been inspired/influenced by rust (the horror!).

ThePhD commented 1 year ago

I could try that. I haven't come across any name that really helps.

Right now it's not too important, though, because one of the things I'm realizing is that this is going to have to coexist with the initial allocator's infrastructure anyways. Might as well just make all the function names be different and let the existence of said functions be the detection mechanism on any given allocator.

ThePhD / future_cxx

pXXXX - Fallible Allocators for Algorithms #14