Open · steveharter opened this issue 3 years ago
I feel like adding a property called Offset on RuntimeFieldHandle would make more sense here; the issue, though, is that the language doesn't expose fieldof today.
Actually, we could instead add UnsafeAccessor kinds for field and method handles to make up for the lack of language support for fieldof and methodof, and just expose Offset as an intrinsic.
We were talking the other day about how IL allows overloading fields, and hence how you couldn't have an unsafe accessor returning e.g. a ref readonly T to a type T having a base conversion from TField, because then you wouldn't be able to disambiguate which field exactly you're looking for. If you had a field accessor just returning a RuntimeFieldHandle, wouldn't you hit the same problem if multiple fields of different types with the same name were present in IL? 🤔
It could be solved by adding a fake parameter for the return type, just like there is one for the declaring type.
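For comparison, the UnsafeAccessorKind.Field accessor that shipped in .NET 8 already carries the field type: as far as I know, the runtime matches the field by Name and checks it against the accessor's ref return type, so same-named fields of different types can be told apart. An offset accessor returning a plain nint loses that information, which is where a fake parameter would come in. A minimal sketch, assuming .NET 8; Counter and _count are illustrative, and the FieldOffset form appears only as a comment because it is the proposal under discussion, not a shipped API.

```csharp
using System.Runtime.CompilerServices;

// Illustrative type with a private field we want to reach.
public sealed class Counter
{
    private int _count = 41;
    public int Value => _count;
}

public static class CounterAccessors
{
    // Shipped in .NET 8: the field is looked up by Name, and the ref return
    // type (ref int) describes the field's type.
    [UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_count")]
    public static extern ref int Count(Counter counter);

    // Proposed shape only (UnsafeAccessorKind.FieldOffset is not a shipped API).
    // Returning nint drops the field type, so a hypothetical unused parameter
    // would have to carry it, like the dummy parameter used for declaring types:
    // [UnsafeAccessor(UnsafeAccessorKind.FieldOffset, Name = "_count")]
    // public static extern nint CountOffset(Counter? declaringType, int fieldType);
}
```

Because the shipped accessor returns a ref, writes go straight to the private field, e.g. CounterAccessors.Count(counter)++.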
Shouldn't RuntimeHelpers.GetRawData be made public alongside the addition of UnsafeAccessorKind.FieldOffset? Otherwise there is no safe way to get the base reference for an arbitrary object. You'd have no way to actually use the offset in that scenario...
Yes, this would need to be thought through if the raw offsets are enabled for classes, and not just structs. Another problem with classes is that field offset is not always available. https://github.com/dotnet/runtime/issues/28001 has related discussion.
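To make the GetRawData point concrete, the sketch below measures a field's offset relative to the start of an object's field data, which is the base an offset from UnsafeAccessorKind.FieldOffset would be relative to. RuntimeHelpers.GetRawData is internal, so RawObjectData and GetRawData here are hypothetical stand-ins that, as far as I know, mirror how CoreCLR implements its internal helper (reinterpret the object as a class whose single byte field sits at the start of the data). This relies on runtime layout details and is an illustration, not a supported technique; Unsafe.As, Unsafe.ByteOffset and the Sample type are the only other pieces.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the internal RuntimeHelpers.GetRawData: the first
// instance field of any object starts where RawObjectData.Data would be.
public sealed class RawObjectData
{
    public byte Data;
}

public static class RawDataHelpers
{
    public static ref byte GetRawData(object o) => ref Unsafe.As<RawObjectData>(o).Data;
}

// Illustrative class; classes use auto layout, so the runtime may reorder A and B.
public sealed class Sample
{
    public int A;
    public long B;
}

public static class SampleOffsets
{
    // Byte distance from the object's raw data to field A.
    public static nint OffsetOfA(Sample s) =>
        Unsafe.ByteOffset(ref RawDataHelpers.GetRawData(s), ref Unsafe.As<int, byte>(ref s.A));

    // Byte distance from the object's raw data to field B.
    public static nint OffsetOfB(Sample s) =>
        Unsafe.ByteOffset(ref RawDataHelpers.GetRawData(s), ref Unsafe.As<long, byte>(ref s.B));
}
```

Without some public equivalent of GetRawData, a caller holding such an offset has no supported base reference to add it to, which is the gap described above.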
Couldn't we instead add a new intrinsic to the Unsafe class that would make using the offset more straightforward:
public static ref T AddByteOffset<T>(object obj, nint offset); // ldarg.0, ldarg.1, add, ret
I would think it couldn't be implemented like that, since it's invalid IL, but if there were a way to get the reference to field 0 from an object, and we did that after ldarg.0, it should work.
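The fix suggested here (get a byref to field 0 first, then offset the byref) can be sketched with existing Unsafe APIs. The method keeps the shape of the proposed Unsafe.AddByteOffset<T>(object, nint), but it is a hypothetical helper, and RawObjectData is an assumed stand-in for getting field 0 that relies on CoreCLR's object layout. The key difference from the IL above is that add is applied to a byref and a native int rather than to an object reference.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical stand-in: reinterpreting any object as RawObjectData yields a
// ref to the first byte of its field data (CoreCLR layout assumption).
public sealed class RawObjectData
{
    public byte Data;
}

public static class ObjectOffsets
{
    // Same shape as the proposed Unsafe.AddByteOffset<T>(object, nint), but the
    // object reference is first turned into a byref to its field data; only that
    // byref is offset, so no integer is ever added to an object reference.
    public static ref T AddByteOffset<T>(object obj, nint offset) =>
        ref Unsafe.As<byte, T>(ref Unsafe.AddByteOffset(ref Unsafe.As<RawObjectData>(obj).Data, offset));
}

// Illustrative class with a single field, which therefore starts at raw offset 0.
public sealed class Box
{
    public int Value;
}
```

For example, ObjectOffsets.AddByteOffset<int>(box, 0) = 5 writes to box.Value.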
If we got this API (which I personally would like to get), it would be great if we also got the corresponding inverse API (should both of these be nuint? I think so personally, but it doesn't really matter to me; it just makes sense, since a negative offset doesn't make sense for either of these):
public static object? GetObject<T>(ref T reference, out nuint offset);
I just tested it, and calling the following code works with the .NET 7 JIT, but I haven't checked the spec for such an operation in a while. One unknown I have with CoreCLR JIT/GC is what is seen after the add: is it an object ref, or is it a ref T? If it is the former, that could create GC corruption.
.method public hidebysig static !!T& AddByteOffset<T>(object source, native int byteOffset) cil managed aggressiveinlining
{
.custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
.maxstack 2
ldarg.0
ldarg.1
add
ret
} // end of method Unsafe::AddByteOffset
"One unknown I have with CoreCLR JIT/GC is what is seen after the add: is it an object ref, or is it a ref T."
As far as I know, the type you declare in C#/IL doesn't matter. As far as the GC is concerned, that is simply "some GC pointer". In this case, it'll fall inside the data of an object, hence it's an interior pointer, so during mark the GC will consider the entire object as reachable, that's it. Shouldn't really cause any problems. In fact, reinterpreting the type of interior pointers (or GC refs in general) is quite common for various reasons.
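A small illustration of the reinterpretation described here, using only existing APIs (Unsafe.As<TFrom, TTo>): a ref into an int[] is retyped as a ref uint. InteriorRefs and ReinterpretElement are illustrative names. Per the explanation above, the declared element type of the ref is irrelevant to the GC, which only sees an interior pointer into the array.

```csharp
using System.Runtime.CompilerServices;

public static class InteriorRefs
{
    // Returns a ref into the middle of the array, retyped from int to uint.
    // The GC treats it as an interior pointer into the array object whatever
    // its static type is, so the whole array stays reachable while it is live.
    public static ref uint ReinterpretElement(int[] values, int index) =>
        ref Unsafe.As<int, uint>(ref values[index]);
}
```

Writing uint.MaxValue through the retyped ref leaves all 32 bits set, so the int element reads back as -1.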
An additional benefit of providing ref T Unsafe.AddByteOffset<T>(object obj, nint offset) is that it makes object field offsetting work much like struct field offsetting, the difference being that struct field offsetting requires an additional ref cast.
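The struct side of that comparison works today with existing Unsafe APIs; the "additional ref cast" is the Unsafe.As<Pair, byte> step that turns a ref to the struct into a ref byte base. Pair, StructOffsets, OffsetOfSecond and SecondAt are illustrative names, and the offset is measured rather than assumed.

```csharp
using System.Runtime.CompilerServices;

// Illustrative struct; structs default to sequential layout.
public struct Pair
{
    public int First;
    public long Second;
}

public static class StructOffsets
{
    // Byte offset of Second from the start of the struct.
    public static nint OffsetOfSecond()
    {
        Pair p = default;
        return Unsafe.ByteOffset(ref Unsafe.As<Pair, byte>(ref p), ref Unsafe.As<long, byte>(ref p.Second));
    }

    // Struct case: the extra ref cast (Pair -> byte) provides the base that an
    // object overload would derive from the object reference internally.
    public static ref long SecondAt(ref Pair pair, nint offset) =>
        ref Unsafe.As<byte, long>(ref Unsafe.AddByteOffset(ref Unsafe.As<Pair, byte>(ref pair), offset));
}
```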
This is invalid IL. If you run it on a checked JIT, you will see all sorts of asserts. The first one is:
Assertion failed 'genActualType(op1->TypeGet()) != TYP_REF && genActualType(op2->TypeGet()) != TYP_REF : Possibly bad IL with CEE_add at offset 0002h (op1=ref op2=long stkDepth=0)' in 'Test:AddByteOffset[ubyte](System.Object,long):byref' during 'Importation' (IL size 4; hash 0xcdd16edf; Tier0)
It is likely that the JIT can either crash or produce bad code when this gets inlined into a callsite of a particular shape.
Fair enough. We definitely don't want invalid IL 😅
So, exposing RuntimeHelpers.GetRawData would be the way, and it would require adjusting the offset to account for the method table pointer in the UnsafeAccessor code in my PR. Or could we provide a ref byte Unsafe.AsByteRef(object) { ldarg.0; ret; }?
I'm OK with RuntimeHelpers.GetRawData if it is the preferred way.
My opinion is that the following set of APIs makes the most sense:
namespace System.Runtime.CompilerServices
{
public static class RuntimeHelpers
{
public static ref byte GetRawData(object? o);
public static object? GetObject(ref byte reference, out nuint offset);
}
}
This provides the forward and reverse API to convert between byrefs and objects.
The lack of <T> reduces the number of generic instantiations we will get (if that still matters on some platforms? I can't recall).
Note the object?: since we will need to check for null anyway, I think it makes sense for GetRawData to check for null and return a null byref instead of throwing, since it should be able to emit better code for that. For GetObject, it makes sense because reference could point to unmanaged or stack memory (in which case offset would equal the pointer).
When the documentation is written for this pair of APIs, we will need to consider their behaviour with strings and arrays: currently they would probably return a ref to the start of the length field. If this is how we want it to work (which probably makes sense), then we should document the offset from the length field to the first element in these cases so people can use it correctly. Alternatively, we could document that using these APIs to determine which index a reference points at is undefined (this would allow us to change to 64-bit lengths in the future if needed), which could make sense since there are other APIs for working with these special cases in the "intended" way when indexing is desired.
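The length-field-to-first-element distance mentioned here can be measured at runtime rather than hardcoded. This sketch assumes the same kind of hypothetical GetRawData stand-in (RawObjectData, relying on CoreCLR layout, not a supported API) and compares it with the element references returned by the existing MemoryMarshal.GetArrayDataReference and MemoryMarshal.GetReference. The helper names are illustrative, and the exact values are platform dependent, so none are asserted.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the internal RuntimeHelpers.GetRawData.
public sealed class RawObjectData
{
    public byte Data;
}

public static class LengthPrefixOffsets
{
    static ref byte GetRawData(object o) => ref Unsafe.As<RawObjectData>(o).Data;

    // Distance from the raw data start (the length field) to element 0.
    public static nint ArrayElementOffset<T>(T[] array) =>
        Unsafe.ByteOffset(ref GetRawData(array),
                          ref Unsafe.As<T, byte>(ref MemoryMarshal.GetArrayDataReference(array)));

    // Distance from the raw data start (the length field) to the first char.
    public static nint StringCharOffset(string s) =>
        Unsafe.ByteOffset(ref GetRawData(s),
                          ref Unsafe.As<char, byte>(ref MemoryMarshal.GetReference(s.AsSpan())));
}
```

Measuring it this way is roughly what a caller would have to do if the offset were left undocumented; documenting it would let callers skip this step.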
Linking Roslyn issue https://github.com/dotnet/roslyn/issues/68000, where you can't get an offset of a ref field.
We now have the PR at https://github.com/dotnet/runtime/pull/93946, which adds UnsafeAccessorKind.FieldOffset (API issue pending).
But we need a champion to create the API issue for the RuntimeHelpers.GetRawData() and .GetObject() proposal above so we can move the discussion there.
I simply reopened #28001 since it already has a bunch of discussion and it's exactly that proposal. I cleared the milestone and marked untriaged.
Support ref struct and "fast invoke" by adding new reflection APIs. The new APIs will be faster than the current object[] boxing approach by leveraging the existing System.TypedReference. TypedReference is a special type and is super-fast because it is a ref struct with its own opcodes. Extending it with this feature provides alloc-free, stack-based “boxing” with support for all argument types (reference types, value types, pointers and ref structs [pending]) along with all modifiers (byval, in, out, ref, ref return). Currently reflection does not support passing or invoking a ref struct since it can’t be boxed to object; the new APIs are to support ref struct with new language features currently being investigated.
Example syntax (actual TBD):
Dependencies
The Roslyn and runtime dependencies below are required for the programming model above. These are listed in the order in which they need to be implemented.
- ref struct as a generic argument. This would be used when adding the static factory method public static TypedReference CreateFromRefStruct<T>(ref T myRefStruct) where T : ref struct, which basically wraps __makeref(myRefStruct). It could also be used to enable Span<T> where T : ref struct, which is the ideal stack-based collection implementation that can contain TypedReferences.
- params. The new invoke APIs require a collection of TypedReferences. There are several possible implementations; the most normalized solution would be supporting Span<T> where T : ref struct, thus enabling any ref struct (not just TypedReference) to be used in a container. UPDATE: done in prototype per https://github.com/dotnet/runtime/issues/75349
- TypedReference. (Roslyn link TBD). Ideally, TypedReference is a normal ref struct. Note that TypedReference currently has a ByReference<byte> to contain an interior pointer to the value, so this will likely need a different type. Also, TypedReference has several compile-time limitations, including not being able to be passed to another method, that should be removed, leaving only standard ref struct semantics.
- [ ] Add reflection support for Span and other ref struct types (this is the library/API issue and requires all of the above work items)
- ref fields.
Motivation
Reflection is ~20x slower than a Delegate call for a typical method. Many users including our own libraries use IL Emit instead which is non-trivial and error-prone. The expected gains are ~10x faster with no allocs; verified with a prototype. Internally, IL Emit is used but with a proposed slow-path fallback for AOT (non-emit) cases. The existing reflection invoke APIs may also layer on this.
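For context, TypedReference can already be paired with reflection today for field access, which gives a feel for the stack-based "boxing" described above. The sketch below is not the proposed invoke API: it uses the existing FieldInfo.GetValueDirect and FieldInfo.SetValueDirect together with the undocumented C# keywords __makeref and __refvalue, and Point and BumpX are illustrative names.

```csharp
using System;
using System.Reflection;

// Illustrative value type.
public struct Point
{
    public int X;
    public int Y;
}

public static class TypedRefDemo
{
    // Increments p.X through reflection without boxing the Point itself.
    public static int BumpX(ref Point p)
    {
        // __makeref captures a managed pointer to p plus its type; p is not boxed.
        TypedReference tr = __makeref(p);
        FieldInfo fx = typeof(Point).GetField(nameof(Point.X))!;

        // GetValueDirect/SetValueDirect read and write through the TypedReference,
        // so the caller's struct is updated in place. The int value itself is
        // still boxed on the way in and out of these object-typed APIs.
        int current = (int)fx.GetValueDirect(tr)!;
        fx.SetValueDirect(tr, current + 1);

        // __refvalue turns the TypedReference back into a strongly typed variable.
        return __refvalue(tr, Point).X;
    }
}
```

The Point is reached only through the TypedReference; the remaining boxing of the int is the kind of cost the proposed invoke APIs aim to remove for method arguments.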
In Scope
- APIs to invoke methods using TypedReference, including passing a TypedReference collection.
- TypedReference must be treated as a normal ref struct (today it has nuances and special cases).
- Support ref struct (passing and invoking).
- Performance on par with existing ref emit scenarios:
To scope this feature, the minimum functionality that results in a win by allowing System.Text.Json to remove its dependency on System.Reflection.Emit for inbox scenarios.
Out of Scope
This issue is an incremental improvement of reflection that adds new Invoke APIs and leverages the existing TypedReference, while requiring some runtime/Roslyn changes. Longer-term, we should consider more holistic runtime and Roslyn support for reflection, including JIT intrinsics and/or new "dynamic invoke" opcodes for performance, along with perhaps C# auto-stack-boxing to/from a ref TypedReference.
Implementation
A design doc is forthcoming.
The implementation will likely cache the generated method on the corresponding MethodBase and MemberInfo objects.
100% backwards compatibility with the existing object[]-based Invoke APIs is not necessary, but the design will keep layering in mind (e.g. parameter validation, special types like System.Reflection.Pointer, the Binder pattern, CultureInfo for culture-aware methods) so that in theory the existing object[]-based Invoke APIs could layer on this new work.
This issue supersedes other reflection performance issues that overlap: