dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

[API Proposal]: GC.RegisterObjectAssociatedMemory #83721

Open DaZombieKiller opened 1 year ago

DaZombieKiller commented 1 year ago

Background and motivation

When working with buffers today, it can be difficult to abstract away unmanaged memory due to the inherent complexity in managing its lifetime.

For example, consider the simplified API surface of the following type:

public sealed class LargeArray<T>
{
    public LargeArray(nuint length);
    public nuint Length { get; }
    public ref T this[nuint index] { get; }
    public Span<T> AsSpan(nuint start, int length);
    public bool TryGetArray([NotNullWhen(true)] out T[]? array);
}

When length is greater than Array.MaxLength and IsReferenceOrContainsReferences<T> is false, the constructor will allocate some native memory using a SafeBuffer. Otherwise, a T[] will be allocated.

There is currently no way for ref T this[nuint index] or AsSpan to be implemented safely for the SafeBuffer case, as the refs will not keep the buffer alive. Without a manual GC.KeepAlive on the LargeArray<T> object, the memory can be freed while it is still in use.

GC.RegisterObjectAssociatedMemory would allow the SafeBuffer to be associated with the memory it owns, so that a ref to that region of memory will keep it alive similarly to the behavior of a ref pointing to managed memory. It may also allow some of the safety issues associated with MemoryManager<T>-backed Memory<T> instances to be solved gracefully.

API Proposal

namespace System;

public static partial class GC
{
    public static unsafe void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length);
}

API Usage

var span = new SafeNativeMemoryBuffer(100).AsSpan();
// the span will keep the SafeNativeMemoryBuffer alive
internal sealed unsafe class SafeNativeMemoryBuffer : SafeBuffer
{
    public SafeNativeMemoryBuffer(int length)
        : base(ownsHandle: true)
    {
        Initialize((uint)length);
        SetHandle((nint)NativeMemory.Alloc((uint)length));
        GC.RegisterObjectAssociatedMemory(this, (void*)handle, (uint)length);
    }

    public Span<byte> AsSpan()
    {
        var span = new Span<byte>((void*)handle, (int)(uint)ByteLength);

        // Ensure we don't get collected before the span is created
        GC.KeepAlive(this);

        return span;
    }

    protected override bool ReleaseHandle()
    {
        NativeMemory.Free((void*)handle);
        return true;
    }
}

Alternative Designs

It may make more sense to expose the method on RuntimeHelpers rather than GC:

namespace System.Runtime.CompilerServices;

public static partial class RuntimeHelpers
{
    public static unsafe void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length);
}

Risks

Depending on the work required to implement this feature, it could represent a significant complication in the GC's reference tracking implementation that outweighs the benefits offered by it.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation When working with buffers today, it can be difficult to abstract away unmanaged memory due to the inherent complexity in managing its lifetime. For example, consider the simplified API surface of the following type: ```cs public sealed class LargeArray { public LargeArray(nuint length); public nuint Length { get; } public ref T this[nuint index] { get; } public Span AsSpan(nuint start, int length); public bool TryGetArray([NotNullWhen(true)] out T[]? array); } ``` When `length` is greater than `Array.MaxLength` and `IsReferenceOrContainsReferences` is `false`, the constructor will allocate some native memory using a `SafeBuffer`. Otherwise, a `T[]` will be allocated. There is currently no way for `ref T this[nuint index]` or `AsSpan` to be implemented safely for the `SafeBuffer` case, as the `ref`s will not keep the buffer alive. Without a manual `GC.KeepAlive` on the `LargeArray` object, the memory can be freed while it is still in use. `GC.RegisterObjectAssociatedMemory` would allow the `SafeBuffer` to be associated with the memory it owns, so that a `ref` to that region of memory will keep it alive similarly to the behavior of a `ref` pointing to managed memory. It may also allow some of the safety issues associated with `MemoryManager`-backed `Memory` instances to be solved gracefully. ### API Proposal ```csharp namespace System; public static partial class GC { public static void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length); } ``` ### API Usage ```csharp var span = new SafeNativeMemoryBuffer(100).AsSpan(); // the span will keep the SafeNativeMemoryBuffer alive ``` ```csharp internal sealed unsafe class SafeNativeMemoryBuffer : SafeBuffer { public SafeNativeMemoryBuffer(int length) : base(ownsHandle: true) { _length = length; SetHandle((nint)NativeMemory.Alloc((uint)length)); GC.RegisterObjectAssociatedMemory(this, (void*)handle, (uint)length); } public Span AsSpan() { var span = new Span((void*)handle, (int)(uint)ByteLength); // Ensure we don't get collected before the span is created GC.KeepAlive(this); return span; } protected override bool ReleaseHandle() { NativeMemory.Free((void*)handle); return true; } } ``` ### Alternative Designs It may make more sense to expose the method on `RuntimeHelpers` rather than `GC`: ```csharp namespace System.Runtime.CompilerServices; public static partial class RuntimeHelpers { public static void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length); } ``` ### Risks Depending on the work required to implement this feature, it could represent a significant complication in the GC's reference tracking implementation that outweighs the benefits offered by it.
Author: DaZombieKiller
Assignees: -
Labels: `api-suggestion`, `area-System.Memory`, `untriaged`
Milestone: -
ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation When working with buffers today, it can be difficult to abstract away unmanaged memory due to the inherent complexity in managing its lifetime. For example, consider the simplified API surface of the following type: ```cs public sealed class LargeArray { public LargeArray(nuint length); public nuint Length { get; } public ref T this[nuint index] { get; } public Span AsSpan(nuint start, int length); public bool TryGetArray([NotNullWhen(true)] out T[]? array); } ``` When `length` is greater than `Array.MaxLength` and `IsReferenceOrContainsReferences` is `false`, the constructor will allocate some native memory using a `SafeBuffer`. Otherwise, a `T[]` will be allocated. There is currently no way for `ref T this[nuint index]` or `AsSpan` to be implemented safely for the `SafeBuffer` case, as the `ref`s will not keep the buffer alive. Without a manual `GC.KeepAlive` on the `LargeArray` object, the memory can be freed while it is still in use. `GC.RegisterObjectAssociatedMemory` would allow the `SafeBuffer` to be associated with the memory it owns, so that a `ref` to that region of memory will keep it alive similarly to the behavior of a `ref` pointing to managed memory. It may also allow some of the safety issues associated with `MemoryManager`-backed `Memory` instances to be solved gracefully. ### API Proposal ```csharp namespace System; public static partial class GC { public static void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length); } ``` ### API Usage ```csharp var span = new SafeNativeMemoryBuffer(100).AsSpan(); // the span will keep the SafeNativeMemoryBuffer alive ``` ```csharp internal sealed unsafe class SafeNativeMemoryBuffer : SafeBuffer { public SafeNativeMemoryBuffer(int length) : base(ownsHandle: true) { _length = length; SetHandle((nint)NativeMemory.Alloc((uint)length)); GC.RegisterObjectAssociatedMemory(this, (void*)handle, (uint)length); } public Span AsSpan() { var span = new Span((void*)handle, (int)(uint)ByteLength); // Ensure we don't get collected before the span is created GC.KeepAlive(this); return span; } protected override bool ReleaseHandle() { NativeMemory.Free((void*)handle); return true; } } ``` ### Alternative Designs It may make more sense to expose the method on `RuntimeHelpers` rather than `GC`: ```csharp namespace System.Runtime.CompilerServices; public static partial class RuntimeHelpers { public static void RegisterObjectAssociatedMemory(object obj, void* pointer, nuint length); } ``` ### Risks Depending on the work required to implement this feature, it could represent a significant complication in the GC's reference tracking implementation that outweighs the benefits offered by it.
Author: DaZombieKiller
Assignees: -
Labels: `api-suggestion`, `area-System.Runtime`, `untriaged`
Milestone: -
tannergooding commented 1 year ago

CC. @dotnet/gc since the proposal is for System.GC

Sergio0694 commented 1 year ago

"GC.RegisterObjectAssociatedMemory would allow the SafeBuffer to be associated with the memory it owns, so that a ref to that region of memory will keep it alive similarly to the behavior of a ref pointing to managed memory. It may also allow some of the safety issues associated with MemoryManager-backed Memory instances to be solved gracefully."

That's something that I personally find extremely interesting about this proposal. We've discussed at length before (in the #allow-unsafe-blocks channel in the C# Discord) how Memory<T> is fundamentally an unsafe-equivalent type which due to its unfortunate API design makes it so that it's deceivingly safe for the consumer, as the second you get a value backed by a MemoryManager<T> with a finalizer, things are not great. So much so that the docs outright say not to have a finalizable memory manager, with the guidance being "better to leak memory than to AV". Which, like, maybe, but neither is great.

With an API like this, we could finally make the following pattern safe:

void ConsumeMemory(Memory<byte> memory)
{
    Span<byte> span = memory.Span;

    ConsumeSpan(span);
}

This is currently not safe in the case of a finalizable memory manager, because the GC might collect the memory and the memory manager after the user has retrieved a span, which would cause the native memory the span is working on to be swept from under its feet, and cause an AV. You could instead have a finalizable MemoryManager<T> use RegisterObjectAssociatedMemory so that the GC can keep it alive even if the code is only using a span that's pointing to that native buffer being returned from it. This would essentially allow us to potentially remove the "don't have a finalizable memory manager" restriction entirely.

Seems pretty good on paper? 😄

Disclaimer: I have no idea how feasible and/or complicated this would be to implement for the GC.

Sergio0694 commented 1 year ago

Related to the above, this could also be used by WinRT wrappers for types such as IBuffer to provide safe APIs to get a Span<byte> to their underlying storage which would also ensure the RCW is kept alive as long as they're used 👀

That is, you could have a safe IBuffer.AsSpan() extension (eg. in CsWinRT) that after getting the IBufferByteAccess interface called RegisterObjectAssociatedMemory on the RCW before returning the span, making that much safer than today 🙂

jkotas commented 1 year ago

GC.RegisterObjectAssociatedMemory would allow the SafeBuffer to be associated with the memory it owns, so that a ref to that region of memory will keep it alive similarly to the behavior of a ref pointing to managed memory

SafeBuffer has ref-counted lifetime. What is going to prevent some other thread calling Release on the SafeBuffer and releasing the memory while the current thread is operating on the buffer?

It seems that the proposed pattern would be safe only if the lifetime of the buffer is exclusively managed by finalizers (ie the only place that can release the buffer is the object finalizer, the buffer can never be released explicitly e.g. by Dispose pattern). Managing lifetimes by depending on finalizers exclusively is very high-overhead pattern. It is only acceptable for expensive entities, for example it is used to manage lifetimes of DynamicMethod.

jkotas commented 1 year ago

For buffers specifically, allocating managed regular arrays and depending on GC collecting it is cheaper than allocating unmanaged memory and depending on finalizers to release it.

DaZombieKiller commented 1 year ago

SafeBuffer has ref-counted lifetime. What is going to prevent some other thread calling Release on the SafeBuffer and releasing the memory while the current thread is operating on the buffer?

This is a good point, with no way to prematurely unregister the memory, you could end up in a scenario where you have a ref to freed memory that is still keeping the object alive (which might now be unrelated, if that memory has been reclaimed by another allocation).

If RegisterObjectAssociatedMemory returned some kind of weak GC handle-like type, that would mitigate the issue:

internal sealed unsafe class SafeNativeMemoryBuffer : SafeBuffer
{
+   private ObjectAssociatedMemoryHandle _memoryHandle;

    public SafeNativeMemoryBuffer(int length)
        : base(ownsHandle: true)
    {
        Initialize((uint)length);
        SetHandle((nint)NativeMemory.Alloc((uint)length));
-       GC.RegisterObjectAssociatedMemory(this, (void*)handle, (uint)length);
+       _memoryHandle = GC.RegisterObjectAssociatedMemory(this, (void*)handle, (uint)length);
    }

    public Span<byte> AsSpan()
    {
        var span = new Span<byte>((void*)handle, (int)(uint)ByteLength);

        // Ensure we don't get collected before the span is created
        GC.KeepAlive(this);

        return span;
    }

    protected override bool ReleaseHandle()
    {
+       _memoryHandle.Dispose();
        NativeMemory.Free((void*)handle);
        return true;
    }
}
Sergio0694 commented 1 year ago

Wouldn't that add even more complexity to an API that would still leave your type to not completely be "safe" still? 🥲

DaZombieKiller commented 1 year ago

Wouldn't that add even more complexity to an API that would still leave your type to not completely be "safe" still? 🥲

If the consumer is able to manually free the memory I'm not sure if it would be possible to make it completely safe (since you could always have a dangling reference), though it would still prevent premature collection (and therefore also prevent premature frees). It'd be an overall improvement in safety, though depending on the complexity of the feature's implementation it may not be worth it.

Sergio0694 commented 1 year ago

Right. Yeah I can see how that'd be a "net improvement in safety" here though not perfect. To me it seems like the API would be nice to have mostly for cases like Jan mentioned where you already have an object whose lifetime is 100% determined by the finalizer (eg. in my example, an IBuffer RCW), though I guess that's also a bit of a niche scenario and probably wouldn't justify the cost given in those cases you can still manually just add GC.KeepAlive.

The API would be "nice to have" but I get the feeling it'd be way too complicated and expensive to have than it's worth 😄

jkotas commented 1 year ago

RegisterObjectAssociatedMemory returned some kind of weak GC handle-like type, that would mitigate the issue:

I do not see how this would help. What would __memoryHandle.Dispose do when there is active Span on some other thread? It is not really possible for the thread that is calling __memoryHandle.Dispose to efficiently synchronize with the thread that has active Span.

FWIW, the only known way to solve the unmanaged (non-GC collected) memory lifetime issues is teaching the language about lifetimes, ala Rust borrow checker.

DaZombieKiller commented 1 year ago

_memoryHandle.Dispose would only remove the association between the memory and the object, so it would only prevent the object from being kept alive after the memory has been freed. It wouldn't prevent the span from being invalidated (solving that would require something akin to lifetimes as you mentioned). This proposal is moreso about preventing a finalizer from prematurely freeing the memory rather than accounting for a manual Dispose.