dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Unnecessary Slow Aligned Memory Reallocation (Multiple of System Page Size) #110225

Open PavelCibulka opened 5 days ago

PavelCibulka commented 5 days ago

I've been experimenting with resizing allocated aligned memory. I believe that increasing or decreasing memory by multiples of the system page size should be almost instantaneous.

The system seems capable of this when tested with NativeMemory.Realloc, which completes in around 1ms. However, NativeMemory.Realloc doesn't guarantee alignment preservation.

    public unsafe void Alloc() {
        long size = 4L * 1024 * 1024 * 1024;
        void* mem = NativeMemory.Alloc((nuint)size);
        void* mem2 = NativeMemory.Realloc(mem, (nuint)(size + Environment.SystemPageSize));
        void* mem3 = NativeMemory.Realloc(mem2, (nuint)(size));
    }

When I perform the same test with NativeMemory.AlignedRealloc, it takes several seconds to complete. It should be as fast as NativeMemory.Realloc when the requested alignment remains unchanged and the memory is resized by multiples of the system page size.

    public unsafe void AlignedAlloc() {
        long size = 4L * 1024 * 1024 * 1024;
        void* mem = NativeMemory.AlignedAlloc((nuint)size, 64);
        void* mem2 = NativeMemory.AlignedRealloc(mem, (nuint)(size + Environment.SystemPageSize), 64);
        void* mem3 = NativeMemory.AlignedRealloc(mem2, (nuint)(size), 64);
    }

I'm unsure whether this issue should be reported to the .NET team or the operating system developers. I'm using Ubuntu 24.04 with kernel 6.10.14.

dotnet-policy-service[bot] commented 5 days ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

dotnet-policy-service[bot] commented 5 days ago

Tagging subscribers to this area: @dotnet/interop-contrib See info in area-owners.md if you want to be subscribed.

tannergooding commented 5 days ago

For the most part, the NativeMemory APIs are just thin wrappers over the underlying C runtime.

For example:

It's a little bit more flexible with the aligned variants, as C didn't standardize them until more recently:

NativeMemory.AlignedRealloc then ends up deferring to _aligned_realloc on Win32. There is no equivalent C API, however. If the memory was allocated with the C API you can use realloc, but it doesn't allow changing the alignment (it should preserve it, however, as it's meant to be aware of that scenario). It's also worth noting that while realloc can theoretically grow an existing allocation and avoid the copy, that's fairly uncommon in practice and depends on many other factors. In many cases it is functionally malloc+copy+free, and that is correspondingly what the underlying fallback implementation does on systems where aligned_alloc is available.
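To make the allocate+copy+free fallback described above concrete, here is a minimal sketch in C. The function name aligned_realloc_fallback and its explicit old_size parameter are assumptions made for illustration; this is not the .NET runtime's actual code. It shows why the cost grows with the size of the block: every resize copies the surviving bytes, however small the change in size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only): resize an aligned block by
   allocating a fresh aligned block, copying, and freeing the old one. */
void *aligned_realloc_fallback(void *old, size_t old_size,
                               size_t new_size, size_t alignment)
{
    /* The alignment must be a nonzero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    size_t rounded = (new_size + alignment - 1) & ~(alignment - 1);
    if (rounded < new_size)
        return NULL;            /* overflow while rounding up */
    if (rounded == 0)
        rounded = alignment;    /* avoid a zero-sized request */

    void *fresh = aligned_alloc(alignment, rounded);
    if (fresh == NULL)
        return NULL;            /* as with realloc, the old block stays valid */

    if (old != NULL) {
        /* This copy is the expensive part for multi-gigabyte blocks. */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

With a 4 GiB block, as in the original report, the memcpy alone touches 4 GiB each time, which would be consistent with the multi-second timings observed.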

Notably, "changing" the alignment is technically undefined behavior for the C API, and it is technically possible for us to ignore the input and use realloc for the underlying implementation in that scenario, which might improve performance on non-Windows systems. But such a change likely needs deeper discussion.

PavelCibulka commented 3 days ago

Thank you for the very detailed information.

If I understand correctly:

If so, what is the maximum alignment that NativeMemory.Realloc would maintain? Would it be 64 bytes, system page size, or another value?

Can we include this information in the NativeMemory.Realloc documentation?

Is the only purpose of NativeMemory.AlignedRealloc for situations when you want to change alignment?

tannergooding commented 3 days ago

Change the size using NativeMemory.Realloc (alignment will be preserved even if allocated with NativeMemory.Alloc).

From a general public contract standpoint for .NET, the NativeMemory APIs Alloc/Realloc/Free are guaranteed to work together, and AlignedAlloc/AlignedRealloc/AlignedFree are guaranteed to work together. It is not guaranteed that other combinations, such as Realloc/Free with AlignedAlloc, will work. Mixing APIs can therefore lead to undefined behavior.

This nuance exists because in some scenarios, such as if we used certain POSIX APIs, or on Windows where we need to defer to _aligned_malloc, the allocations are strictly incompatible with the C runtime APIs realloc/free and can only be used with the corresponding native APIs (such as _aligned_realloc/_aligned_free).

While we currently use the underlying C runtime API on systems that provide it (currently all officially supported Linux systems), we don't surface that detail publicly, so there's no way to query it. If such a detail were surfaced (or you were willing to rely on a point-in-time implementation detail), then mixing NativeMemory.Realloc + NativeMemory.AlignedAlloc would be safe in those specific scenarios due to the underlying guarantees of the C runtime itself, which is that aligned_alloc is paired with free and realloc (there is no aligned_free prior to C23, or aligned_realloc in general). The C runtime in particular remembers the original user-specified alignment passed into aligned_alloc and preserves that if it needs to allocate a new buffer as part of realloc. For alloc+realloc, it only preserves the system default alignment (typically 16 bytes on 64-bit systems).
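The last point, that plain malloc+realloc keeps only the default alignment, can be checked with a small C sketch. The helper names is_aligned and realloc_keeps_default_alignment are hypothetical, and the sketch assumes a C11 toolchain where max_align_t is declared in <stddef.h>. It says nothing about larger alignments requested through aligned_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is aligned to the given power-of-two alignment, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Grows a malloc'd block with realloc and reports whether the result still
   has the default (fundamental) alignment. Returns -1 if allocation fails. */
int realloc_keeps_default_alignment(size_t initial, size_t grown)
{
    void *p = malloc(initial);
    if (p == NULL)
        return -1;

    void *q = realloc(p, grown);
    if (q == NULL) {
        free(p);                /* realloc failed, so p is still owned by us */
        return -1;
    }

    int ok = is_aligned(q, _Alignof(max_align_t));
    free(q);
    return ok;
}
```

On a typical 64-bit Linux system _Alignof(max_align_t) is 16, matching the "typically 16 bytes" figure above.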

If so, what is the maximum alignment that NativeMemory.Realloc would maintain?

It depends on the underlying system. The C runtime doesn't guarantee a range of values that aligned_alloc must support, only that it must support all fundamental alignments (typically this will be all powers of 2 up to sizeof(void*)). In practice most support at least up to the size of a page and many support larger alignments as well.
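Since the supported range is implementation-specific, one way to find out what a given C runtime accepts is simply to try it. The sketch below uses a hypothetical helper, probe_aligned_alloc, that requests one block per alignment and reports the outcome. A result from one machine is only an observation about that runtime, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical probe (illustration only).
   Returns  1 if aligned_alloc succeeded and the pointer is aligned,
            0 if the runtime rejected the request (returned NULL),
           -1 if it returned a pointer that is not aligned as requested. */
int probe_aligned_alloc(size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Request exactly one alignment unit so the size is a multiple of it. */
    void *p = aligned_alloc(alignment, alignment);
    if (p == NULL)
        return 0;

    int aligned = ((uintptr_t)p & (alignment - 1)) == 0;
    free(p);
    return aligned ? 1 : -1;
}
```

Calling it in a loop over powers of two (for example from _Alignof(max_align_t) up to a few megabytes) gives a quick picture of what the local allocator accepts.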

Is the only purpose of NativeMemory.AlignedRealloc for situations when you want to change alignment?

Changing alignment isn't strictly guaranteed to work as some underlying realloc functions, such as _aligned_realloc on Windows, require it to match the original alignment passed into the aligned allocation function. It exists to pair with AlignedAlloc and provide a function that will definitively work.

One bit I was trying to say in my previous message was that the .NET team could, with a bit more discussion, simplify our own implementation and just call realloc on Linux, rather than manually doing an aligned_alloc+memcpy+free chain. This would fix the performance issue you're seeing without needing users to rely on implementation details.

Can we include this information in the NativeMemory.Realloc documentation?

I think there are a few clarifying remarks we can add to improve things here, yes. Particularly in terms of what may be undefined behavior across platforms.

jkotas commented 1 day ago

The C runtime in particular remembers the original user specified alignment passed into aligned_alloc and preserves that if it needs to allocate a new buffer as part of realloc.

Is that documented somewhere? I do not see it mentioned in any documentation, and it does not appear to be the case based on my ad-hoc testing. For example, this is going to reliably show that realloc does not preserve 64kB alignment on Ubuntu 24.04:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main()
{
   int blockSize = 65536;

   void* p = aligned_alloc(blockSize, blockSize);
   printf("%p %s\n", p, (((uintptr_t)p % blockSize) == 0) ? "aligned" : "NOT ALIGNED!");

   void* p2 = realloc(p, 2*blockSize);
   printf("%p %s\n", p2, (((uintptr_t)p2 % blockSize) == 0) ? "aligned" : "NOT ALIGNED!");
}

The exact conditions where realloc happens to preserve alignment vary between C runtime flavors (e.g. glibc vs. musl). I do not think that it is something one can reasonably depend on.
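If code still wants realloc's fast path without depending on alignment preservation, one option is to check the result and repair it when needed. The following is a minimal sketch under that assumption; resize_keep_alignment is a hypothetical helper, not an existing API, and it relies on the C11 rule that a pointer from aligned_alloc may be passed to realloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only). Resizes *block with realloc and,
   if the result is not aligned, moves it into a fresh aligned block.
   Returns 0 on success with *block aligned. Returns -1 on failure; *block
   then still points at valid data, but it may no longer be aligned. */
int resize_keep_alignment(void **block, size_t old_size,
                          size_t new_size, size_t alignment)
{
    assert(block != NULL && *block != NULL && new_size > 0);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    void *p = realloc(*block, new_size);
    if (p == NULL)
        return -1;              /* *block is untouched and still valid */
    *block = p;

    if (((uintptr_t)p & (alignment - 1)) == 0)
        return 0;               /* fast path: realloc kept the alignment */

    /* Slow path: realloc moved the data to a misaligned address. */
    size_t rounded = (new_size + alignment - 1) & ~(alignment - 1);
    if (rounded < new_size)
        return -1;              /* overflow while rounding up */
    void *fresh = aligned_alloc(alignment, rounded);
    if (fresh == NULL)
        return -1;

    memcpy(fresh, p, old_size < new_size ? old_size : new_size);
    free(p);
    *block = fresh;
    return 0;
}
```

The trade-off is that the slow path can copy the data twice (once inside realloc, once here), so this only pays off when realloc usually resizes in place or happens to return an aligned address.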

tannergooding commented 1 day ago

Is that documented somewhere?

Hmmm, I thought it had been in the C17 or C23 spec; but after having re-read the relevant portions it isn't explicitly called out.

It would indeed be dependent on the underlying implementation given that, which may not preserve it in all cases.