dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.43k stars 4.76k forks source link

[API Proposal]: RuntimeHelpers.Box to create a box around a dynamically-typed byref #97341

Closed jkoritzinsky closed 7 months ago

jkoritzinsky commented 10 months ago

Background and motivation

In interop scenarios where we have unbounded generic APIs that can take blittable or non-blittable types, we can end up in a scenario where we have a generic type T that has a corresponding type TAbi. However, there is no way to represent TAbi as a generic argument, so we have a mechanism to retrieve it as a System.Type instance. This allows us to still reason about the type.

If we want to do anything with the type though (like copy values around, go from a pointer into a buffer to a managed representation of it), we end up needing to use the APIs on System.Runtime.InteropServices.Marshal that take a System.Type, which are slow and not AOT-compatible.

We hit this scenario in CsWinRT when we are trying to marshal from WinRT a T[] where T is non-blittable. Today CsWinRT uses the APIs on Marshal even though the type will always be blittable as there's no alternative APIs.

API Proposal

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
+    public object? Box(ref byte target, RuntimeTypeHandle type);
}

API Usage

int length = ...;
IntPtr data = ...;
if (data == IntPtr.Zero)
{
    return null;
}
var array = new T[length];
var data = (byte*)data.ToPointer();
var abi_element_size = Marshal.SizeOf(AbiType); // #97344 API would go here to get rid of the remaining usage of Marshal.SizeOf
for (int i = 0; i < abi.length; i++)
{
    array[i] = Marshaler<T>.FromAbi(RuntimeHelpers.Box(ref *data, AbiType.TypeHandle));
    data += abi_element_size;
}
return array;

Alternative Designs

Mentioned below in the collapsable section.

Risks

Minimal risk.

Previous Proposal ### API Proposal ```diff namespace System; public ref struct TypedReference { + public TypedReference(ref byte target, RuntimeTypeHandle type); } ``` ### API Usage ```csharp int length = ...; IntPtr data = ...; if (data == IntPtr.Zero) { return null; } var array = new T[length]; var data = (byte*)data.ToPointer(); var abi_element_size = Marshal.SizeOf(AbiType); // #97344 API would go here to get rid of the remaining usage of Marshal.SizeOf for (int i = 0; i < abi.length; i++) { var abi_element_ref = new TypedReference(ref *data, AbiType.TypeHandle); array[i] = Marshaler.FromAbi(abi_element_ref.ToObject()); data += abi_element_size; } return array; ``` ### Alternative Designs We could provide another API on `RuntimeHelpers` to read and box an unmanaged value-type from a buffer, or even provide a higher level API over a `ReadOnlySpan`, like a `MemoryMarshal.Read(ref ReadOnlySpan, System.Type)` method. We could make this method a static method with an `Unsafe` suffix to further describe its unsafeness. ### Risks `System.TypedReference` is a rarely used type. There may be issues in the runtime implementations around supporting it.
ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation In interop scenarios where we have unbounded generic APIs that can take blittable or non-blittable types, we can end up in a scenario where we have a generic type `T` that has a corresponding type `TAbi`. However, there is no way to represent `TAbi` as a generic argument, so we have a mechanism to retrieve it as a `System.Type` instance. This allows us to still reason about the type. If we want to do any thing with the type though (like copy values around, go from a pointer into a buffer to a managed representation of it), we end up needing to use the APIs on `System.Runtime.InteropServices.Marshal` that take a `System.Type`, which are slow and not AOT-compatible. We hit this scenario [in CsWinRT](https://github.com/microsoft/CsWinRT/blob/149203556fd257dff6e7b66616bd3e29541345d7/src/WinRT.Runtime/Marshalers.cs#L681-L719) when we are trying to marshal from WinRT a `T[]` where `T` is non-blittable. Today CsWinRT uses the APIs on Marshal even though the type will always be blittable as there's no alternative APIs. .NET already has a mechanism to represent a byref with a dynamically-determined type: `System.TypedReference`. However, there's no way to construct one currently from a byref and a type. This API proposal provides a constructor to create such a `TypedReference`. ### API Proposal ```diff namespace System; public ref struct TypedReference { + public TypedReference(ref byte target, RuntimeTypeHandle type); } ``` ### API Usage ```csharp int length = ...; IntPtr data = ...; if (data == IntPtr.Zero) { return null; } var array = new T[length]; var data = (byte*)data.ToPointer(); var abi_element_size = Marshal.SizeOf(AbiType); // TODO: another API proposal to remove this usage of Marshal.SizeOf for (int i = 0; i < abi.length; i++) { var abi_element_ref = new TypedReference(ref *data, AbiType.TypeHandle); array[i] = Marshaler.FromAbi(abi_element_ref.ToObject()); data += abi_element_size; } return array; ``` ### Alternative Designs We could provide another API on `RuntimeHelpers` to read and box an unmanaged value-type from a buffer, or even provide a higher level API over a `ReadOnlySpan`, like a `MemoryMarshal.Read(ref ReadOnlySpan, System.Type)` method. We could make this method a static method with an `Unsafe` suffix to further describe its unsafeness. ### Risks `System.TypedReference` is a rarely used type. There may be issues in the runtime implementations around supporting it.
Author: jkoritzinsky
Assignees: -
Labels: `api-suggestion`, `area-System.Runtime`
Milestone: -
jkotas commented 10 months ago

We could make this method a static method with an Unsafe suffix to further describe its unsafeness.

We typically deal with this problem by adding the method somewhere under System.Runtime.InteropServices or System.Runtime.CompilerServices namespace (for example, MemoryMarshal.CreateReadOnlySpan).

Should we introduce unsafe getter for the byref as well while we are on it?

Sergio0694 commented 10 months ago

"Should we introduce unsafe getter for the byref as well while we are on it?"

Would that be the same as __refvalue? As in, would the idea just be that it'd be documented (but no new functionality)?

"We typically deal with this problem by adding the method somewhere under System.Runtime.InteropServices or System.Runtime.CompilerServices namespace (for example, MemoryMarshal.CreateReadOnlySpan)."

Something like this?

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static TypedReference CreateTypedReferenceUnsafe(ref byte target, RuntimeTypeHandle type);
    public static ref byte GetValueRefUnsafe(TypedReference reference);
}
Sergio0694 commented 10 months ago

Random idea: would it perhaps make sense to add a TypedReferenceMarshal class instead? To kinda group the methods and do what we already did for eg. CollectionsMarshal, ImmutableCollectionsMarshal?

Ie.

namespace System.Runtime.InteropServices;

public static class TypedReferenceMarshal
{
    public static TypedReference CreateTypedReference(ref byte target, RuntimeTypeHandle type);
    public static ref byte GetValueRef(TypedReference reference);
}
jkotas commented 10 months ago

"Should we introduce unsafe getter for the byref as well while we are on it?"

Would that be the same as __refvalue? As in, would the idea just be that it'd be documented (but no new functionality)?

It would not be the same. __refvalue is safe. It takes the target type statically and validates it. We have #26186 about adding safe ref accessors on TypedRefence. We may want to look at them together with this one to ensure consistency.

public static TypedReference CreateTypedReferenceUnsafe(ref byte target, RuntimeTypeHandle type); public static ref byte GetValueRefUnsafe(TypedReference reference);

We do not use Unsafe suffixes in unsafe namespaces.

would it perhaps make sense to add a TypedReferenceMarshal class instead?

This is a question for API review discussion. When there is a choice, we typically avoid introducing a new type with just a few members and add the methods to the existing type instead. For example, MemoryMarshal would be a fine place for these two methods. There are two overloads of MemoryMarshal.GetArrayDataReference and these methods are very similar.

MichalStrehovsky commented 10 months ago

Unfortunately this API will require Roslyn work first - we cannot have APIs that return TypedReference due to the type's special snowflake status within the compiler (any method that returns TypedReference runs into CS1599).

Sergio0694 commented 10 months ago

Well that's quite unfortunate and seems like it'd make this feature much less cheap than expected 🥲 Could we just sidestep the whole issue and just add an API for what we actually need to do? That is:

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static object? CreateObjectUnsafe(ref byte target, RuntimeTypeHandle type);
}

Like, the whole reason we needed a TypedReference was only so we could then use TypedReference.ToObject on it, but especially if that complicates things and needs language support too, perhaps we could just not surface it in our API shape at all?

AaronRobinsonMSFT commented 10 months ago

Could we just sidestep the whole issue and just add an API for what we actually need to do?

Is the expectation here that target is an existing block of memory that contains a previously allocated managed object? The example above makes it seem like one has a blob of raw bytes and wants it massaged into a managed object (class or value type), is that the case?

Sergio0694 commented 10 months ago

"The example above makes it seem like one has a blob of raw bytes and wants it massaged into a managed object (class or value type), is that the case?"

Yup that's correct! The pointer points to some memory (managed or native, could be either) holding the data for a given ABI-type-for-T value. We need to read that and create a boxed object of type ABI-type-for-T, which we can then pass to that Marshaler<T>.FromAbi method, which will convert that to the right T value. Those FromAbi will usually just do something like (ABI-type-for-T)box and then create a T instance from it with the right (generated) marshalling between the two,

So eg. in the example Jeremy posted, we'd use the API like this:

for (int i = 0; i < abi.length; i++)
{
    array[i] = Marshaler<T>.FromAbi(RuntimeHelpers.CreateObjectUnsafe(ref *data, AbiType.TypeHandle);
    data += abi_element_size;
}
AaronRobinsonMSFT commented 10 months ago

olding the data for a given ABI-type-for-T value.

Alright. Based on this statement then, my next assumption would be the memory itself could either be unmanaged (on the native heap) or managed (on the GC heap, pinned of course), but contained no managed references. For example, the following would be an ABI type:

// Unmanaged according to C# language rules
struct ABIType
{
    public int* Data;
    public int Len;
}

Where as the following would not:

// Managed according to C# language rules
struct NotABIType
{
    public int[] Data;
}
jkotas commented 10 months ago

I would expect RuntimeHelpers.CreateObjectUnsafe to behave exactly same as the IL box instruction, except that the type is specified as an argument instead of statically. It has very similar shape as #97344. #97344 is about providing API that behaves exactly same as the IL sizeof instruction, except that the type is specified as an argument instead of statically.

Sergio0694 commented 10 months ago

"my next assumption would be the memory itself could either be unmanaged (on the native heap) or managed (on the GC heap, pinned of course), but contained no managed references."

Yup! All the ABI types are either handcrafted in CsWinRT (eg. for Type, TimeSpan, DateTimeOffset, etc.) or produced by the CsWinRT generator, and they'll always be blittable. We just need to create an object of this ABI type given a pointer to its raw data, in an AOT-safe way.

Sergio0694 commented 10 months ago

"It has very similar shape as https://github.com/dotnet/runtime/issues/97344."

Exactly. Ultimatley we'd like to end up with something like this in CsWinRT, being fully AOT-safe:

var array = new T[length];
var data = (byte*)data.ToPointer();
var abi_element_size = RuntimeHelpers.SizeOf(AbiType.TypeHandle); // 97344
for (int i = 0; i < abi.length; i++)
{
    array[i] = Marshaler<T>.FromAbi(RuntimeHelpers.CreateObjectUnsafe(ref *data, AbiType.TypeHandle); // 97341
    data += abi_element_size;
}
Sergio0694 commented 10 months ago

I've opened https://github.com/microsoft/CsWinRT/pull/1463 to enable IsAotCompatible in CsWinRT and fix all AOT warnings, and for now I've just had to always throw NotSupportedException in all of these array marshalling methods for non blittable types. The idea would be that with both this and #97344, we'd be able to make those paths functional again in .NET 9.

jkotas commented 10 months ago

It should be called Box since that's what it is doing.

Sergio0694 commented 10 months ago

Makes sense. And I assume the unsafe is already implied like you said.

So, updated API proposal:

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static object? Box(ref byte target, RuntimeTypeHandle type);
}
AaronRobinsonMSFT commented 10 months ago

ref byte target

I would also like to see if we can add a max length. It seems we should be able to compute that is most cases, no?

Sergio0694 commented 10 months ago

I would assume once we also got #97344, callers would have to ensure they do have that size available at the memory area pointed at by target? 🤔

Also: should we make that ref readonly byte?

jkotas commented 10 months ago

I would also like to see if we can add a max length. It seems we should be able to compute that is most cases, no?

In most case, the callers would just to call RuntimeHelpers.SizeOf proposed in the other issue to make the API argument validation happy. I do not think that max length argument makes sense.

AaronRobinsonMSFT commented 10 months ago

I would also like to see if we can add a max length. It seems we should be able to compute that is most cases, no?

In most case, the callers would just to call RuntimeHelpers.SizeOf proposed in the other issue to make the API argument validation happy. I do not think that max length argument makes sense.

That is a different problem. I am looking at this from the interop perspective. In the proposed use case the input is coming from a native context via a raw pointer. My assumption here is that pointer, when passed to native code, may also come with a length or perhaps some other byte size component. My desire here would be to pass that along not necessarily as a "this is the correct size" but instead as a "the max length to consider is X". This is similar to some native APIs where a max length to consider is provided with a pointer. It doesn't mean the end will be at that length, but it does provide the notion of "if you read past this, you've gone too far".

Sergio0694 commented 10 months ago

This kinda makes me wonder: if the implementation of the method is going to have to validate the number of bytes actually being read against that additional "max number of bytes" parameter, would it make sense to have this method also return some out int bytesRead value, which would also mean (at least for this scenario) we wouldn't even need #97344 anymore? 🤔

AaronRobinsonMSFT commented 10 months ago

would it make sense to have this method also return some out int bytesRead value

That seems reasonable to me and aligns with the desired validation from both the caller and callee I would appreciate.

The managed object size is an community request though so I don't think this API can replaced that.

Sergio0694 commented 10 months ago

"The managed object size is an community request though"

Just to clarify, are you referring to some other proposal other than #97344? Because if not, Jeremy opened that one as part of the same scenario in CsWinRT, so if eg. we made the API from this proposal also return the number of bytes read, we would have no need for the other one anymore.

jkotas commented 10 months ago

That is a different problem. I am looking at this from the interop perspective.

This API is not interop specific. I am looking at this API as a primitive building block that you can use to build interpreters. The CsWinRT use case is replacing a static generic code specialization by dynamic code specialization. It is effectively manually written universal shared generic code - it is slower, but one instance of the code can handle all specializations.

AaronRobinsonMSFT commented 10 months ago

Just to clarify, are you referring to some other proposal other than https://github.com/dotnet/runtime/issues/97344?

The notion of managed object size has been around for a long time. See issue https://github.com/dotnet/runtime/issues/24200. If I recall, there was a COM interface in .NET Framework that I think gave you details on instance size. I would consider https://github.com/dotnet/runtime/issues/97344 as a special case of the idea.

I am looking at this API as a primitive building block that you can use to build interpreters.

You and your interpreter examples :) Yeah, I get that. Okay, I accept the proposed shape is a better fit in that case. I retract the size argument suggestion.

jkoritzinsky commented 10 months ago

My general concern with using object? here is that it makes doing the opposite operation, "Given an object that is boxed around some dynamic type T, write it to a ref", not as simple as "implement TypedReference.SetObjectReference.

However, I talked with Aaron and he allayed my concerns. We could implement this with a new "UnboxInto" API like the new Box API or just wait until we need the API and revisit it.

I'll change the API proposal to the proposed Box API.

lambdageek commented 10 months ago

If the method is going to take a pointer and a length, should the API then be:


namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
+    public static object? Box(ReadOnlySpan<byte> target, RuntimeTypeHandle type);
}
AaronRobinsonMSFT commented 10 months ago

If the method is going to take a pointer and a length, should the API then be:

Yep, but I wanted to discuss the length first prior to going down the span abstraction. I think the length argument has been settled so it isn't needed here.

lambdageek commented 10 months ago

Yep, but I wanted to discuss the length first prior to going down the span abstraction. I think the length argument has been settled so it isn't needed here.

I'm unclear on why the length argument is settled - if anything the interpreter use case is another reason to have extra validation. The API is bringing together two unrelated arguments: a chunk of memory and a type descriptor and essentially doing a memcpy of a number of bytes (determined by the type) from the chunk of memory. There's no reason to believe the chunk of memory is big enough.

jkotas commented 10 months ago

This is very unsafe low-level API. Validating the length is not going to make this API safe, but it is going to make it slower and more cumbersome to use.

For example, consider how something like TypedReference.ToObject would be implemented using this API. TypedReference has a reference to the memory block and the type. The memory block is assumed to be valid for the given type. Validating the length would not provide any benefit in this case.

If you want to validate the memory size for your case, you can do that by building a helper method that checks the length using API proposed in https://github.com/dotnet/runtime/issues/97344 and then calls this one.

tannergooding commented 9 months ago

Is this ready to be marked api-ready-for-review? It looks like it, but I just want to confirm based on the discussion above.

bartonjs commented 7 months ago

Video

namespace System.Runtime.CompilerServices;

public static partial class RuntimeHelpers
{
     public object? Box(ref byte target, RuntimeTypeHandle type);
}