Open PavelCibulka opened 5 days ago
For the most part, the NativeMemory APIs are just thin wrappers over the underlying C runtime. For example:
- NativeMemory.Alloc - thin wrapper over malloc
- NativeMemory.AllocZeroed - thin wrapper over calloc
- NativeMemory.Free - thin wrapper over free
- NativeMemory.Realloc - thin wrapper over realloc
It's a little bit more flexible with the aligned variants, as C didn't standardize them until more recently:
- NativeMemory.AlignedAlloc - thin wrapper over aligned_alloc (if available) or _aligned_malloc (Win32) -or- equivalent API on non-Win32 systems without the C API
- NativeMemory.AlignedFree - thin wrapper over free (if using aligned_alloc) or _aligned_free (Win32) -or- equivalent API that pairs with the aligned alloc API on non-Win32 systems without the C API
NativeMemory.AlignedRealloc then ends up deferring to _aligned_realloc on Win32. But there is no equivalent C API, and while you can use realloc if the memory was allocated with the C API, it doesn't allow changing the alignment (it should preserve it however, as it's meant to be aware of that scenario). It's also worth noting that while realloc can theoretically grow an existing allocation and avoid the copy, that's fairly uncommon in practice and is dependent on many other factors. In many cases it is functionally malloc+copy+free, and that is correspondingly what the underlying fallback implementation does on systems where aligned_alloc is available.
Notably, "changing" the alignment is technically undefined behavior for the C API, and it is technically possible for us to ignore the input and use realloc for the underlying implementation in that scenario, which might improve the performance on non-Windows systems. But such a change likely needs deeper discussion.
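The malloc+copy+free fallback described above can be sketched in C roughly as follows. This is only an illustration, not the actual .NET runtime code: the helper names `round_up` and `aligned_realloc_fallback` are hypothetical, and the sketch assumes the caller knows the old size, which a real implementation would have to obtain some other way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round size up to a multiple of alignment (a power of two).
   Overflow is not handled in this sketch. */
static size_t round_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: resize an aligned block by allocating a new
   aligned block, copying, and freeing the old one. */
static void *aligned_realloc_fallback(void *ptr, size_t old_size,
                                      size_t new_size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Older C11 wording requires size to be a multiple of alignment,
       so the request is padded to be safe. */
    void *fresh = aligned_alloc(alignment, round_up(new_size, alignment));
    if (fresh == NULL)
        return NULL;                /* the original block is left untouched */

    if (ptr != NULL) {
        /* Copy whichever of the two sizes is smaller. */
        memcpy(fresh, ptr, old_size < new_size ? old_size : new_size);
        free(ptr);                  /* aligned_alloc memory pairs with free */
    }
    return fresh;
}
```

Every call pays for a full copy of the old contents, which is why this path can be much slower than a realloc that manages to grow the block in place.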
Thank you for the very detailed information.
If I understand correctly:
Change the size using NativeMemory.Realloc (alignment will be preserved even if allocated with NativeMemory.Alloc).
If so, what is the maximum alignment that NativeMemory.Realloc would maintain? Would it be 64 bytes, system page size, or another value?
Can we include this information in the NativeMemory.Realloc documentation?
Is the only purpose of NativeMemory.AlignedRealloc for situations when you want to change alignment?
From a general public contract point of view for .NET, the NativeMemory APIs Alloc/Realloc/Free are guaranteed to work together and AlignedAlloc/AlignedRealloc/AlignedFree are guaranteed to work together. It is not guaranteed that other mixes, such as Realloc/Free with AlignedAlloc, will work. Mixing APIs can therefore lead to undefined behavior.
The reason this nuance exists is that in some scenarios, such as when we use certain POSIX APIs or on Windows where we need to defer to _aligned_malloc, the allocations are strictly incompatible with the C runtime APIs realloc/free and can only be used with the corresponding native APIs (such as _aligned_realloc/_aligned_free).
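The pairing requirement can be illustrated with a small portable wrapper in C. This is a sketch under assumptions, not how NativeMemory is implemented: the names `demo_aligned_alloc` and `demo_aligned_free` are hypothetical, and the non-Windows branch assumes a C11 runtime that provides aligned_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>                 /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper: allocate size bytes at a power-of-two alignment. */
static void *demo_aligned_alloc(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _WIN32
    /* Note the argument order: size first, then alignment. */
    return _aligned_malloc(size, alignment);
#else
    /* Pad the size to a multiple of the alignment for older C11 runtimes. */
    size_t padded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, padded);
#endif
}

/* Hypothetical wrapper: release memory from demo_aligned_alloc with the
   function that matches the allocator used above. */
static void demo_aligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);               /* must not be passed to free() */
#else
    free(p);                        /* aligned_alloc memory pairs with free */
#endif
}
```

Callers that always go through the matching pair never need to know which branch was compiled in, which is the same reason the Aligned* APIs are only guaranteed to work with each other.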
While we currently use the underlying C runtime API on systems that provide it (currently all officially supported Linux systems), we don't surface that detail publicly and so there's no way to query it. If such a detail were surfaced (or you were willing to rely on a point-in-time implementation detail), then mixing NativeMemory.Realloc + NativeMemory.AlignedAlloc is safe in those specific scenarios due to the underlying guarantees of the C runtime itself, which is that aligned_alloc is paired with free and realloc (there is no aligned_free prior to C23 or aligned_realloc in general). The C runtime in particular remembers the original user-specified alignment passed into aligned_alloc and preserves that if it needs to allocate a new buffer as part of realloc. For malloc+realloc, it only preserves the system default alignment (typically 16 bytes on 64-bit systems).
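The "system default alignment" mentioned here can be queried in C11 as the alignment of max_align_t. The snippet below is a minimal sketch for checking it on a given machine; the helper names `is_aligned` and `default_malloc_alignment` are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is a multiple of alignment, otherwise 0. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* The strictest fundamental alignment, which malloc and realloc are
   expected to satisfy for ordinary allocations. */
static size_t default_malloc_alignment(void)
{
    return alignof(max_align_t);
}
```

A pointer returned by malloc or realloc can be passed to `is_aligned` together with `default_malloc_alignment()` to confirm the baseline guarantee; any larger alignment is outside what plain malloc+realloc promises.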
If so, what is the maximum alignment that NativeMemory.Realloc would maintain?
It depends on the underlying system. The C runtime doesn't guarantee a range of values that aligned_alloc must support, only that it must support all fundamental alignments (typically this will be all powers of 2 up to sizeof(void*)). In practice most support at least up to the size of a page and many support larger alignments as well.
Is the only purpose of NativeMemory.AlignedRealloc for situations when you want to change alignment?
Changing alignment isn't strictly guaranteed to work, as some underlying realloc functions, such as _aligned_realloc on Windows, require it to match the original alignment passed into the aligned allocation function. It exists to pair with AlignedAlloc and provide a function that will definitively work.
One bit I was trying to say in my previous message was that the .NET team could, with a bit more discussion, simplify our own implementation and just call realloc on Linux, rather than manually doing an aligned_alloc+memcpy+free chain. This would fix the performance issue you're seeing without needing users to rely on implementation details.
Can we include this information in the NativeMemory.Realloc documentation?
I think there are a few clarifying remarks we can add to improve things here, yes. Particularly in terms of what may be undefined behavior across platforms.
The C runtime in particular remembers the original user specified alignment passed into aligned_alloc and preserves that if it needs to allocate a new buffer as part of realloc.
Is that documented somewhere? I do not see it mentioned in any documentation and it does not appear to be the case based on my ad-hoc testing. For example, this is going to reliably show that realloc does not preserve 64 kB alignment on Ubuntu 24.04:
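#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    size_t blockSize = 65536;
    /* Allocate one 64 kB block aligned to 64 kB. */
    void* p = aligned_alloc(blockSize, blockSize);
    if (p == NULL) return 1;
    printf("%p %s\n", p, (((uintptr_t)p % blockSize) == 0) ? "aligned" : "NOT ALIGNED!");
    /* Grow the block and check whether the alignment survived. */
    void* p2 = realloc(p, 2*blockSize);
    if (p2 == NULL) { free(p); return 1; }
    printf("%p %s\n", p2, (((uintptr_t)p2 % blockSize) == 0) ? "aligned" : "NOT ALIGNED!");
    free(p2);
    return 0;
}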
The exact conditions where realloc happens to preserve alignment vary between C runtime flavors (e.g. glibc vs. musl). I do not think that it is something one can reasonably depend on.
Is that documented somewhere?
Hmmm, I thought it had been in the C17 or C23 spec; but after having re-read the relevant portions it isn't explicitly called out.
It would indeed be dependent on the underlying implementation given that, which may not preserve it in all cases.
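Since the alignment after realloc is implementation-dependent, one defensive pattern is to try realloc first, verify the result, and only fall back to an aligned copy when the alignment was lost. The sketch below is an illustration of that idea, not something proposed in this thread or taken from the .NET runtime. The name `resize_keep_alignment` is hypothetical, and it assumes the block came from aligned_alloc on a runtime where such memory may be passed to realloc and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize *block to new_size bytes while keeping it
   aligned to a power-of-two alignment. Returns 0 on success and -1 on
   failure. In every case *block remains a valid pointer, although after a
   failure it may no longer satisfy the requested alignment. */
static int resize_keep_alignment(void **block, size_t new_size, size_t alignment)
{
    assert(block != NULL && new_size != 0);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    void *moved = realloc(*block, new_size);
    if (moved == NULL)
        return -1;                  /* *block is unchanged and still valid */
    *block = moved;

    if (((uintptr_t)moved % alignment) == 0)
        return 0;                   /* fast path: alignment happened to survive */

    /* Slow path: alignment was lost, so copy into a fresh aligned block. */
    size_t padded = (new_size + alignment - 1) & ~(alignment - 1);
    void *fresh = aligned_alloc(alignment, padded);
    if (fresh == NULL)
        return -1;                  /* *block is valid but misaligned */

    memcpy(fresh, moved, new_size);
    free(moved);
    *block = fresh;
    return 0;
}
```

In the worst case this copies the data twice (once inside realloc and once in the fallback), so it only pays off when the runtime preserves alignment often enough for the fast path to dominate.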
I've been experimenting with resizing allocated aligned memory. I believe that increasing or decreasing memory by multiples of the system page size should be almost instantaneous.
The system seems capable of this when tested with NativeMemory.Realloc, which completes in around 1ms. However, NativeMemory.Realloc doesn't guarantee alignment preservation.
When I perform the same test with NativeMemory.AlignedRealloc, it takes several seconds to complete. It should be as fast as NativeMemory.Realloc when the requested alignment remains unchanged and the memory is resized by multiples of the system page size.
I'm unsure whether this issue should be reported to the .NET team or the operating system developers. I'm using Ubuntu 24.04 with kernel 6.10.14.