dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.97k stars 4.66k forks source link

Proposal: RuntimeHelpers.GetRawData #28001

Open MichalStrehovsky opened 5 years ago

MichalStrehovsky commented 5 years ago

Proposal

I suggest we add the following method to System.Runtime.CompilerServices.RuntimeHelpers:

namespace System.Runtime.CompilerServices
{
    public static class RuntimeHelpers
    {
        public static ref byte GetRawData(object o);
    }
}

Given an object instance o, this would return a managed reference to the first field of the type (or first element of an array if o is an array, or first character of a string if o is string). This can be implemented the same way JitHelpers.GetPinningHelper is implemented within the CLR and would have similar use.

Alternatives considered

We could place the helper in System.Runtime.CompilerServices.Unsafe, but that would be troublesome for variable-length types (arrays and string). If we ever add new ones (such as the Utf8string prototype), the API might not do what the user expects. This API needs to be versioned with the underlying runtime.

Example Usage

Create a strongly typed reflection-free field getter and setter:

abstract class FastFieldAccessor
{
    public abstract object GetValue(object o);
    public abstract void SetValue(object o);
}

class FastFieldAccessor<T> : FastFieldAccessor
{
    private IntPtr _offset;

    public FastFieldAccessor(object o, ref T field)
    {
        _offset = Unsafe.ByteOffset(ref RuntimeHelpers.GetRawData(o), ref Unsafe.As<T, byte>(ref field));
    }

    public override object GetValue(object o)
    {
        return Unsafe.As<byte, T>(ref Unsafe.Add(ref RuntimeHelpers.GetRawData(o), _offset));
    }

    public override void SetValue(object o)
    {
        Unsafe.As<byte, T>(ref Unsafe.Add(ref RuntimeHelpers.GetRawData(o), _offset)) = (T)o;
    }
}
xoofx commented 5 years ago

Would love to have this actually. Typically, the fact that we don't have access today to JitHelpers.GetRawSzArrayData(this Array array) is annoying because there are many optimizations that CoreCLR can do internally on arrays that we can't do on our own.....

@MichalStrehovsky, what would be the story to calculate the offset field? I'm not a fan of the convoluted code we have to go through today like this:

    class MyClass
    {
        public string field;
        public int field2;
    }

    class MyClassOffset
    {
        private static readonly MyClass Temp = new MyClass(); 
        public static readonly int OffsetField2 = Unsafe.ByteOffset(ref Unsafe.As<string, int>(ref Temp.field), ref Temp.field2).ToInt32();
    }

Wondering if as part of the CLR change required for interface static methods, if we should not introduce more IL opcodes for these kind of scenarios... 🤔

xoofx commented 5 years ago

Wondering if as part of the CLR change required for interface static methods, if we should not introduce more IL opcodes for these kind of scenarios...

Thinking more seriously about this, I'm worried that we could miss an opportunity here (we don't have the occasion to have breaking changes to the IL), so should we think more broadly about what would be relevant to bring to these CLR changes? @jkotas what do you think?

jkotas commented 5 years ago

I think that GetRawData(object o) is too low-level as public API with unclear semantics around arrays, etc. I would be ok with ref char GetRawData(string s) or ref T GetRawData<T>(T[] s).

The fast field accessors that are shown in example usage should be first class public APIs exposed by the runtime. I believe we have some related issues about this. I do not think we want everybody to be implementing their own version of them from low-level building blocks. We want them to be owned by the runtime so that we can optimize the heck out of them, without compromising type safety.

breaking changes to the IL

I do not think this proposal requires any changes to the IL. Also, note that interface static methods are not a breaking change. It would be a breaking change if existing programs stopped working which is not the case. The fact that new features require new runtimes is not a breaking change.

xoofx commented 5 years ago

I do not think this proposal requires any changes to the IL.

Yeah, I was not asking about this issue in particular. For example, calculating the offset of a field from a root object is "quite common" for low level access so a fld.offset opcode would be actually useful, while today we have to go through lots of convoluted unsafe API to get something like this....

jkotas commented 5 years ago

For example, calculating the offset of a field from a root object is quite common

What are the scenarios you need the offsets for? I would rather look at what you are trying to do and see what is the best way to address it. Offsets are way too low-level.

xoofx commented 5 years ago

What are the scenarios you need the offsets for?

For example: an animation engine, instead of having full codegen for accessing fields of structured data (which incurs an indirect call and hurts instruction cache in the end with too many of them) you can use instead field offsets to manipulate the data efficiently. Today, we need to use the code I showed above to calculate the offset, which incurs to allocate a temporary object to calculate this offset and calling multiple methods from Unsafe, while this offset is accessible from the JIT/AOT compiler relatively easily. In C# this could translate to something like offsetof(MyStruct.field2) with a direct opcode for it...

jkotas commented 5 years ago

an animation engine, instead of having full codegen for accessing fields of structured data

Would the type safe field accessors work for this? I would like to have a type safe default that would be used in most cases, and have unsafe escape hatch for cases where the type safety check is too expensive.

xoofx commented 5 years ago

Would the type safe field accessors work for this?

Not sure I know this, do you have an example how does it work?

jkotas commented 5 years ago

E.g. We can introduce a fully type safe field accessor like this:

struct FieldAccessor<TType, TField> where TType : class
{
    int _offset;

    public ref TField GetValue(TType obj)
    {
        return ref *(obj + _offset); // pseudo code 
    }
}
xoofx commented 5 years ago

E.g. We can introduce a fully type safe field accessor like this:

Oh right, this is what I would use in the end once I have the offset.

My concern is more for the calculation of _offset. Today, we don't have an IL instruction for calculating it. We have to go through something like Unsafe.ByteOffset(ref Unsafe.As<string, int>(ref Temp.field), ref Temp.field2).ToInt32(); (and create a temp instance of MyStruct for example) while I'm looking for something like offsetof(MyStruct.field2)

jkotas commented 5 years ago

There would be a factory method to get the FieldAccessor from the FieldInfo or RuntimeFieldHandle. The factory would be a slower method - it would validate the types.

xoofx commented 5 years ago

There would be a factory method to get the FieldAccessor from the FieldInfo or RuntimeFieldHandle. The factory would be a slower method - it would validate the types.

As long as we can avoid using reflection (and allocating lots of managed memory) to fetch this token, I'm fine with this approach. I assume that we would have proper support in Roslyn to do a ldtoken via something like handleof(xxx) as described in the intrinsics support proposal?

jkotas commented 5 years ago

Right.

MichalStrehovsky commented 5 years ago

I assume that we would have proper support in Roslyn to do a ldtoken

And for a cross-cutting feature like this, we can look forward to having it available in .NET Core 5 or 6 :).

I'm proposing to expose a low level building block like this to address the needs people have right now. It's similar to the function pointer support that C# is adding - they could have waited for static delegates, but they want to address problems people are having now, as opposed to selling them on how great things will be 3 years from now.

ML.net is using Reflection.Emit because of what appears to be a lack of building blocks: https://github.com/dotnet/machinelearning/issues/1736

xoofx commented 5 years ago

And for a cross-cutting feature like this, we can look forward to having it available in .NET Core 5 or 6 :).

But I thought that handleof is so simple (and similar to nameof) that it should not be much trouble to bring this to Roslyn. What is the alternative if we don't have support baked into Roslyn? (considering Reflection.Emit is a no go)

jkotas commented 5 years ago

I'm proposing to expose a low level building block like this to address the needs people have right now

So what would ML.NET do with the low-level GetRawData(object o) ? They would still need to use hacky unsafe code to compute the field offset or hardcode it. If they were ok with doing that, they can also write a 5 lines more of it and add their own our hacky local implementation of GetRawData(object o) too. We do not need to do anything. They have all the low-level unsafe tools they need.

We would not need to wait for handleof to expose the type-safe field accessory. Getting the reflection object to create the type-safe field accessor from would not be as efficient or straightforward as handleof, but that's fine. It would still be a lot more efficient and straightforward compared to Reflection.Emit solutions frequently used to solve this problem today.

they could have waited for static delegates

Not really. It would be ugly for static delegates to solve all the problems (e.g. around interop) that the function pointers are solving.

GrabYourPitchforks commented 5 years ago

I opened https://github.com/dotnet/corefx/issues/36133 to track the SzArray-specific helper method.

mjsabby commented 5 years ago

@MichalStrehovsky We can get this in S.P.C first right? This and also GetObjectSize and HasReferences.

MichalStrehovsky commented 5 years ago

@MichalStrehovsky We can get this in S.P.C first right? This and also GetObjectSize and HasReferences.

Yes, exposing this as a private API on RuntimeHelpers for frozen objects serializer should work. It feels like a better place for it than JitHelpers.

Sergio0694 commented 4 years ago

Following up this conversation with @MichalStrehovsky, it'd be nice if we also had an API just like GetArrayDataReference, but for 2D arrays. I mean, ideally it'd be nice to have a built-in way to do that for ND arrays in general, but having that API for at least 2D arrays would be very useful 👍

I know multidimensional arrays in general are not used that often, but I do think they could use some more love from the new APIs (Span<T> and Memory<T> in particular).

Partially related: it'd be nice if MemoryExtensions also had an AsSpan API for 2D arrays. At least that can be worked around already by using MemoryMarshal.CreateSpan(ref T, int), but having a built-in API to achieve the same would make it easier to do for developers not familiar with that class.

I can create issues for these various points, but I thought quickly running this by you guys to get some early feedback could be useful to avoid creating many (potentially unnecessary) issues 😄

BreyerW commented 4 years ago

Just wanted to make sure this wont work with struct s at least not in-place correct? Im quite interested in reflection free field acessor but it seems with current impl of GetRawData reflection will be still necesssary when mutating struct in-place right?

Or do you expect some trick/api/pattern to work well in this specific case when mutating struct in-place-like?

MichalStrehovsky commented 4 years ago

Or do you expect some trick/api/pattern to work well in this specific case when mutating struct in-place-like?

If you want to modify struct in-place, there's TypedReference.MakeTypedReference that allows you to pass an object and FieldInfos that specify a "path" to a field. Once you have a TypedReference, you would do __refvalue(myRef, X) = value; to set it (X is the type of the field, and value is the value to assign. __refvalue is an undocumented C# keyword to manipulatedTypedReference.

BreyerW commented 4 years ago

@MichalStrehovsky according to documentation on Typed it wont work with non-primitive structs eg any struct that contain array or reference type in general at least if i understand docu correctly. We are still screwed in such case right?

Setting that aside i wonder if it would be possible to introduce an overload like public static ref byte GetRawData<T>(ref T object) where T: struct or something similar to make it smoother to work with struct (it could use typedreference under the hood maybe? But then unmanaged constraint would likely be required and that wouldnt cover 100% cases)

ericstj commented 4 years ago

Given the timeline for 5.0 and the state of this issue, I don't see it making in. Moving to 6.0.

steveharter commented 3 years ago

Closing; assuming https://github.com/dotnet/runtime/issues/45152 supersedes this.

MichalStrehovsky commented 10 months ago

Reopening per https://github.com/dotnet/runtime/issues/45152#issuecomment-1781439989.

steveharter commented 10 months ago

The suggestion at https://github.com/dotnet/runtime/issues/45152#issuecomment-1779969381 also had a way to return an object. Assuming we want to do that with this PR, I suggest we'll also want to:

MichalStrehovsky commented 10 months ago

The suggestion at #45152 (comment) also had a way to return an object. Assuming we want to do that with this PR, I suggest we'll also want to:

I don't know what's the use case for such API - do we have a use for something like this in CoreLib? We certainly have uses for GetRawData. I don't know a place where we'd do inverse. Unless there's a good use case, I'd not muddy this proposal with that.

steveharter commented 10 months ago

A reference implementation of a GetObject() exists in TypedReference today.

MichalStrehovsky commented 10 months ago

A reference implementation of a GetObject() exists in TypedReference today.

Looking at this again, https://github.com/dotnet/runtime/issues/45152#issuecomment-1779969381 and https://github.com/dotnet/runtime/issues/28001#issuecomment-1782981372 each propose a very different API.

How does https://github.com/dotnet/runtime/issues/28001#issuecomment-1782981372 differ from Unsafe.As? Is there a difference between these two?

Foo x = Unsafe.GetObject<Foo>(ref myRef, 123);

Foo x = Unsafe.As<byte, Foo>(ref Unsafe.AddByteOffset(ref myRef, 123));