dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.45k stars 4.76k forks source link

Allow ref struct to contain ref fields #32060

Closed Sergio0694 closed 2 years ago

Sergio0694 commented 4 years ago

The current plan is to add ref fields as first class construct to the type system. See proposal at https://github.com/dotnet/csharplang/pull/3936 for more details.


Original proposal

Overview

Make the System.ByReference<T> type (here) public instead of internal.

Rationale

It's already possible to achieve the same behavior of ByReference<T>, namely to store a ref T to some variable as a field in a ref struct, by leveraging the Span<T> and MemoryMarshal types, like so:

public readonly ref struct FakeByReference<T>
{
    private readonly Span<T> span;

    public ByReference(ref T value)
    {
        span = MemoryMarshal.CreateSpan(ref value, 1);
    }

    public ref T Value => ref MemoryMarshal.GetReference(span);
}

This can either be used as is, as a field in some other ref struct type, or the same technique (storing a 1-length Span<T> in place of the missing ByReference<T> type) can be used in other ref struct types to reproduce this same behavior: storing a managed reference as a field in a ref struct type.

Given that this is already possible with existing APIs, as shown above, but it's less performant than the original ByReference<T>, more verbose and more error prone, why not just make the original ByReference<T> type public so everyone can use it with no need to reinvent the wheel?

It'd be a neat API to have, and at basically no cost, since it's already in the .NET Core BCL anyway. πŸ˜„

jkotas commented 4 years ago

We would prefer this to be exposed as ref fields in C# language: https://github.com/dotnet/csharplang/issues/1147 . This would be even faster, leaner, and less error prone than ByReference<T>.

Sergio0694 commented 4 years ago

Hi @jkotas - thanks for linking that issue, I had missed that! 😊

Is there an ETA for that though? By the looks of it, and the fact it has been opened for over two years already, it looks like it still has a way to go, plus all the additional difficulties caused by the fact that that would require at least some Roslyn updates to support that, if not some CLR changes too.

My rationale here was that making the ByReference<T> type public wouldn't allow new scenarios not achievable today, but simply make them more convenient for devs that were interested in using them (with the assumption that those devs would already be aware of all the various implications of using unsafe APIs in that case). Even if a ref T field feature was to be implemented one day, wouldn't this still be a nice and easy to implement stopgap for the time being? Again, given that the same exact feature can still be achieved today anyway, just less efficiently and with more room for mistakes (and the additional verbosity of having to reimplement that fake ByReference<T> type).

jkotas commented 4 years ago

We are here for the long haul and want to do things properly. We try hard to avoid exposing temporary workarounds that we are stuck forever even once the more appropriate solution is available.

Sergio0694 commented 4 years ago

Sure thing, that's fair enough πŸ˜„ Also, just to be clear, it wasn't my intention to suggest this change as a quick and sloppy solution instead of doing things properly, as I said my rationale was more of an "iterative" improvement for a scenario that's technically supported already, but with an even worse implementation. But I can see how it'd be preferable to jump straight to the optimal and built-in solution, absolutely πŸš€

Is there any approximate ETA on that proposal? I see the issue has been open for over two years already and it's not currently assigned to any milestone, not even X.0 or X.X. Is that a feature that's still being evaluated and not yet decided on, or is it something the various teams have already decided to implement, just not with an exact timeframe for it?

Thanks again for your time! 😊

jkotas commented 4 years ago

It did not make it into the list of C# features we are looking at for .NET 5.

DaZombieKiller commented 4 years ago

Wouldn't exposing this type make it possible to create ref fields in languages other than C#? To me that sounds like something worth considering.

jkotas commented 4 years ago

This type as it exists today is not safe construct. It is very easy to create dangling pointers with it, without guardrails in the language. If we were to expose it, we would likely mark it with the special ObsoleteAttribute to block its use in languages that are not aware of it (e.g. similar to how ref structs are blocked).

GrabYourPitchforks commented 4 years ago

On a related note, Jan, you and I previously talked about a syntax that would allow operating with gcrefs as naturally as operating with pointers. Things that would allow natural syntax like the + operator without bouncing through Unsafe.Add. Would something like that ever make sense to add as a language feature, or as a ByRef-like type, or as something else?

Sergio0694 commented 4 years ago

Hey @jkotas - I have a follow up question on this, specifically regarding the general criteria being used to expose APIs or add explicit support for "unsafe" operations. You mentioned how with the ByReference<T> type is't possible to create dangling pointers, which is certainly true. My question though is about the seemingly arbitrary decision not to make the type public, despite the BCL having a ton of potentially dangerous APIs already. Consider this snippet:

public static Span<T> GetFaultySpan<T>()
{
    T value = default;
    return MemoryMarshal.CreateSpan(ref value, 1);
}

This compiles fine, but results in a dangling pointer just like the one you mentioned. It's not perfectly clear to me why is this API public, but ByReference<T> is not. Don't get me wrong, this API is super useful (and I love the fact that it's available!), but I don't get why the same rationale doesn't allow to ByReference<T> as well: "here's a potentially dangerous but useful tool, use at your own risk!".

Or, just like the BCL already includes dedicated code to properly handle devs doing stuff that's technically not valid (eg. Memory<T> wrapping strings as mutable), then why not just use the same rationale here as well, and provide a tool that enables devs to do exactly the same thing they can already do with MemoryMarshal.CreateSpan, but just in a more efficient and less error prone way?

I just can't stress enough how useful the ByReference<T> type would be, and with the native support for ref T fields not making the cut for .NET 5 I'd say it'll be at least another year (if not more) until that eventually makes it to a public release? As you said, you could also mark it as [Obsolete] just like Span<T>, to avoid issues with other .NET languages.

Just though I'd share my thoughts on this, thank you in advance for your time! 😊

P.S. Just so I'm perfectly clear, I'm not criticizing your work in any way, the .NET team is doing an incredible job and I really love where the whole platform is going! Just trying to make my case here as to why I think making ByReference<T> public could be a win-win for everyone. πŸ˜„

tannergooding commented 4 years ago

@jaredpar had some reasons on why this wasn't safe from the language perspective and would be extremely likely to cause bugs without proper language support for tracking the lifetime.

jkotas commented 4 years ago

This compiles fine, but results in a dangling pointer just like the one you mentioned

Methods in System.Runtime.InteropServices are unsafe. They can produce dangling pointers, corrupt memory, etc. You have to know what you are doing when using them and they should be avoided where possible.

extremely likely to cause bugs without proper language support for tracking the lifetime.

Correct. We would like ref fields to have proper language that guarantees its safety.

Sergio0694 commented 4 years ago

Ah, yeah that's a fair point, having ByReference<T> under System.* would definitely be a problem - I hadn't noticed that before and I had just assumed it was under System.Runtime.* somewhere like Unsafe or MemoryMarshal. I guess it could very well be moved there since right now it's internal, but I can see how that would require more work/refactoring than just making it public πŸ‘

And yeah I agree that just having language-level support for ref fields would be even better, absolutely no dispute there. I was just hoping for a better workaround for the time being, seeing as that feature at least for now seems to be pretty far from completion, is all.

Thank you both again for your time, I really appreciate the conversation here!

jaredpar commented 4 years ago

@tannergooding

had some reasons on why this wasn't safe from the language perspective and would be extremely likely to cause bugs without proper language support for tracking the lifetime.

The reasons are pretty straight forward: the lifetime safety rules, as written today, are predicated on such a constructor not existing. It's explicitly called out in the design.

That's not to say we can't do ref fields, just that the current design was written explicitly assuming they didn't exist. The initial designs of ref struct actually had support for them but we removed them because it introduced some heavy complexity into our lifetime rules. Now we think we have a simpler way of doing this and hope to bring ref fields into a future version of C#.

john-h-k commented 4 years ago

+1 on this. While the span workaround, well, works, it isn't ideal. Particularly where you have an interop scenario where you are using ByReference<T> in place of a T* in the native code, and want it to be blittable so it can be passed when pinned

john-h-k commented 4 years ago

On a related note, Jan, you and I previously talked about a syntax that would allow operating with gcrefs as naturally as operating with pointers. Things that would allow natural syntax like the + operator without bouncing through Unsafe.Add. Would something like that ever make sense to add as a language feature, or as a ByRef-like type, or as something else?

Having it actually added to refs would be a huge breaking change wouldn't it. As currently, + refers to the underlying object the ref points to

Sergio0694 commented 4 years ago

Personally I'd already be more than happy just to have the ref field C# feature that was mentioned, especially if not having to also work out a new syntax to allow offsetting over ref-s could make the implementation of these ref fields quicker. Even just with that, we could write something like:

public readonly ref struct Ref<T>
{
    public readonly ref T Value;

    public Ref(ref T value)
    {
        Value = value;
    }

    public ref T this[int i]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => Unsafe.Add(ref Value, i);
    }
}

Which personally I'm very looking forward to, as it'd be a nice and fast abstraction over having to manually use Unsafe.Add every time you need to offset to some location πŸ˜„

john-h-k commented 4 years ago

Personally I'd already be more than happy just to have the ref field C# feature that was mentioned,

well yeah, that is just better, but it is far away, will take a lot of compiler work, new language rules, etc. i just want the dangerous scary thing exposed that if you f#@! up then it doesn't help you

GrabYourPitchforks commented 4 years ago

~~If you reeeeeeeally wanted ByReference<T>, an enterprising individual could write a fully out-of-band implementation. 😁 You'd have to write it in IL instead of C#, but there's nothing stopping anybody from doing it. Said assembly would need to target netcoreapp2.1+ and wouldn't be compatible with netstandard or netfx.

I don't think such an out-of-band type would be blittable or would play well with the interop system. I don't even think the internal in-box type fulfills these characteristics. You'd need to pin the T& to a T* if you wanted it to be blittable.~~

This idea is shot. See below.

jkotas commented 4 years ago

@GrabYourPitchforks I am not sure what you have in mind, but I do not think it would work.

john-h-k commented 4 years ago

@GrabYourPitchforks ByReference<T> is not an S.R.CS.Unsafe kind of type which is written in IL. It is an intrinsic'y one that has a special meaning to the runtime afaik

GrabYourPitchforks commented 4 years ago

@jkotas I thought as of netcoreapp2.1 the runtime itself had support for byref members of structs, even though the C# compiler doesn't? I was thinking you could write a class entirely in IL that looked something like:

// pseudo-code
public ref struct ByReference<T>
{
    private readonly T& _value;
    public ByReference(ref T value) { _value = ref value; }
    public ref T Value => ldfld(_value);
}

The public API surface uses only constructs that the C# compiler can handle. Admittedly I haven't actually tried this. Just spitballing.

jkotas commented 4 years ago

I thought as of netcoreapp2.1 the runtime itself had support for byref members of structs

That is not correct. You should see type load exception if you try to run your example.

john-h-k commented 4 years ago

@GrabYourPitchforks nope, invalid IL. I believe the current way it works is by having a dummy IntPtr in the ByReference<T> type (see here), and specially instructing the runtime that in reality is a tracked ref

Won't even assemble, iirc, @jkotas - FieldSig has no BYREF attribute (unlike LocalVarSig/Param blob) so it can't be encoded.

GrabYourPitchforks commented 4 years ago

Well, there goes that idea. :)

ENikS commented 4 years ago

@jkotas

I been scratching my head for for a very long while trying to create a struct that holds reference to a parent.

I am using it in Unity container to build dependency graphs during resolution. I had to use unsafe code to do so but it places certain limitations on where it could be executed. A solution that doesn’t require unsafe methods would be a dream come through for the project but every time I see potential fix it is either internal, or does not compile.

Perhaps you could recommend something that I missed?

jkotas commented 4 years ago

You have to use unsafe code for that today.

ENikS commented 4 years ago

@jkotas

Any way I could build my own ByReference<>? If I can build entire framework locally, why not just that single type?

NET is full of dangerous code and could be used badly, I am baffled why that particular type became such a guarded 'golden fleece'? For library authors it could be life saver. The closest thing C# could have to C++.

Write "Don't use under no circumstances!" in big letters in documentation and that will scare away 99% of developers, just these who have no other choice will dare to proceed...

DaZombieKiller commented 4 years ago

Any way I could build my own ByReference<>? If I can build entire framework locally, why not just that single type?

There is an example of this in the OP, using a private Span<T> field.

ENikS commented 4 years ago

@DaZombieKiller Thank you, but it is rather expensive to create 3 different types (ByReference, Span, FakeByReference) to hold just one reference.

In the IoC I am working on even difference in one reference field in the context structure brings performance down 15% from 14ns to 16.5ns. Multiplied by couple of million resolutions during startup of application adds up a lot.

jkotas commented 4 years ago

Any way I could build my own ByReference<>?

I would just use unmanaged pointers. Without the language safety guardrails, ByReference<T> and unmanaged pointers are equally unsafe.

Sergio0694 commented 4 years ago

@ENikS I've added a Ref<T> type (and readonly version, and Nullable ones too) to the Microsoft.Toolkit.HighPerformance package, if you're interested. You can find the docs for that here. I also have a PR open (should be merged shortly) that makes it use the actual ByReference<T> type from CoreCLR when running on .NET Core, so you also get exactly the same codegen/performance as using ref T fields in that case (if they actually existed). I find that type does the job quite well working around the current lack of support for this feature in C#.

ENikS commented 4 years ago

@jkotas I can't use pointer because structs contain managed references.

To give you more 'context', when IoC advances from dependency to dependency it creates a local Context on the stack frame. But unfortunately it also require a reference to parent Context living on stack frame as well. The lifetime is guaranteed to be shorter than the parent because parent is always calling method and survives longer.

Implementing this in unsafe code places restrictions on execution in certain environments. I have to use class instance (I have a class and a struct with the same signature) in place of struct to mitigate but doing so degrades performance due to heap allocations. As you can see, I would sell my next unborn child to have fast native support for ByReference

@Sergio0694

Thank you, I will check it out!

jkotas commented 4 years ago

I can't use pointer because structs contain managed references.

You can use System.Runtime.CompilerServices.Unsafe to get around this limitation.

DaZombieKiller commented 4 years ago

That pull request (https://github.com/windows-toolkit/WindowsCommunityToolkit/pull/3367) has me not just impressed but concerned, as it's exposing a type that is (to my knowledge) considered by the .NET team a temporary workaround and may be removed when a more appropriate ref field solution comes along. If several projects start relying on it, then wouldn't that place the removal of ByReference<T> in a bit of a tough spot? Or is this going to be considered along the same lines as reflection (you choose to use it, you choose to accept breakage) for now?

ENikS commented 4 years ago

@jkotas

You can use System.Runtime.CompilerServices.Unsafe to get around this limitation.

I can't! It will not run in some of UWP scenarios. Also, Unity is used by a lot of big financial organizations and some place restrictions on Unsafe code in libraries.

jkotas commented 4 years ago

It will not run in some of UWP scenarios

Then you need a slower implementation for those. Even if the proper support for byref fields gets added to the language and runtime, this support won't be backported to existing runtimes.

Unity is used by a lot of big financial organizations and some place restrictions on Unsafe code in libraries.

There is no type and memory safe way to do byref-fields today. If your requirement is to use type and memory safe code only, you cannot use byref-fields today.

jkotas commented 4 years ago

is this going to be considered along the same lines as reflection (you choose to use it, you choose to accept breakage)

Yes. https://github.com/windows-toolkit/WindowsCommunityToolkit/pull/3367#issuecomment-676557220

ENikS commented 4 years ago

@jkotas You are not helping!

The decision seems to be rather arbitrary as there are workarounds that could provide same behavior but with higher costs. Keeping that type internal does not provide any additional measure of safety, it just prevents proper optimization.

Would it be possible to bring in more decision makers into this discussion to weigh pros and cons? Since net5 release is still some time in the future, we have time to properly discuss it...

I've read a lot of @VSadov comments related to this and other issues, perhaps he can provide any suggestions?

Lets start with a short list of people who could make this decision. @jkotas do you know who are these people, could you provide their names?

jkotas commented 4 years ago

You can find list of area owners at https://github.com/dotnet/runtime/blob/master/docs/area-owners.md

net5 is feature complete. We are switching master branch to net 6. The changes going into net 5 release branch at this point are fit and finish bug fixes only.

ENikS commented 4 years ago

In my view this type being internal is a bug :)

Sergio0694 commented 4 years ago

@ENikS About UWP, if you're tracking references to fields within objects on the heap, you can use the same pattern that is used by both the portable Span<T> type and the Ref<T> type from the HighPerformance package when running on runtimes without runtime support for Span<T> (ie. without the internal ByReference<T> magic type): just have your type contain an object field to the parent, and an IntPtr offset for that field. Then you can get a ref T to that field by just shifting the reference.

You can use the object.DangerousGetObjectDataByteOffset<T>(object) and object.DangerousGetObjectDataReferenceAt<T>(object, IntPtr) extensions (still from that HighPerformance package) to do that. They'll take care of all the weird offsetting for you.

This approach works on even .NET Standard 1.4 (so also on eg. Mono, etc.), it's mostly just slower and it can only work for interior references to objects, so it's not quite as flexible. It does the job though and it's still pretty useful in some cases 😊

ENikS commented 4 years ago

@Sergio0694 Excellent suggestion, thank you

Sergio0694 commented 4 years ago

@ENikS FYI that's actually exactly how Ref<T> works too when running on UWP (or .NET Standard <= 2.0 in general). It offers a Ref<T>(object, ref T) constructor that does exactly that behind the scenes. Just in case you don't want to reimplement that from scratch (or, you can look at the source as a reference).

GrabYourPitchforks commented 4 years ago

The decision seems to be rather arbitrary as there are workarounds that could provide same behavior but with higher costs. Keeping that type internal does not provide any additional measure of safety, it just prevents proper optimization.

I don't think it's arbitrary. https://github.com/dotnet/runtime/issues/32060#issuecomment-596624746 gave some examples of where the compiler's safety rails fall apart were this type to be publicly exposed.

The comments in this thread from the runtime and libraries and language team are all pretty consistent that we'd like to expose this feature the correct way - via proper ref fields - rather than exposing the internal ByRef<T> type. Once that change comes online, the ByRef<T> type will probably be deleted from the runtime. (This also means that the Windows Toolkit PR mentioned earlier will stop working on this new runtime.)

Sergio0694 commented 4 years ago

@GrabYourPitchforks I mentioned future support plans in my comment here, that's one of the reasons why we didn't just expose ByReference<T> directly, but still abstracted it through our Ref<T> types (and related types), to give us more flexibility.

If with eg. .NET 6 (or some other future runtime) the ByReference<T> was removed and proper support for ref T fields was added, we'd just tweak the code accordingly - the HighPerformance package is already multi-targeting a number of different runtimes/profiles specifically to be able to deal with runtime-specific quirks too when needed (eg. we offer fast indexers for arrays on .NET Core that do the unsafe cast through an internal RawArrayData type mapping the array layout on CoreCLR) 😊

VSadov commented 4 years ago

It is a pervasive assumption in GC that byref fields do not exist. That is not going to change. A lot of things are substantially simpler/faster due to this assumption.

ByReference<T> gets around this limitation by colluding with JIT - formally there is no ref field, but JIT reports roots as if there is one.

Basically this type works as expected only with a JIT/AOT compiler that knows about it and only in a few scenarios where JIT has a chance to do its thing. That is enough to implement span types.

Supporting this type more generally will at least require a lot of testing on a range of platforms and commitment to keep that working in the future.

ENikS commented 4 years ago

@GrabYourPitchforks

The protection provided by keeping ByReference internal is easily circumvented as shown in OP. But instead of doing it properly, it is done as a slow hack. In a way it is like an abortion, when outlawed, it is still done, but instead of going to a doctor, it is done by charlatans (no relation to anyone :).

@VSadov The type is being used left and right in framework, I don't know how much more testing should be done.

With a push to use value tapes in .NET it is becoming real impedance when it could not be referenced

jkotas commented 4 years ago

The type is being used left and right in framework,

ByReference<T> is as an implementation detail in 3 low level types in CoreLib (Span, ReadOnlySpan, TypeReference). I would not call it used left and right.

ENikS commented 4 years ago

Considering that Span is being added to each and every high performance API (http, stream, buffer, pipeline, and etc.) I would say it is.

GrabYourPitchforks commented 4 years ago

The protection provided by keeping ByReference internal is easily circumvented as shown in OP.

Keeping things internal isn't meant to be a "protection" per se, so there's nothing to circumvent. It's internal because per the runtime team (of whom many have commented on this thread), we don't feel it's a good type to expose. When we expose a type publicly, at that point we sit down and hash out the full design, including any necessary language support or how the type is exposed by our API surface.

There's lots of other internal functionality that can be accessed via things like private reflection. For the most part we don't go out of our way to block callers from using them - doing that would just lead to an arms race. But we also don't support these scenarios at all, and apps who do this need to be prepared for things for things to break unexpectedly. It would be perfectly valid for the implementation of SomeInternalFrameworkMethod to change from "please multiply two numbers together" on Tuesday to "please reformat my hard drive and delete all my cloud backups" on Wednesday.

The type is being used left and right in framework ... Considering that Span is being added to each and every high performance API (http, stream, buffer, pipeline, and etc.) I would say it is.

I think you've drawn an incorrect conclusion here. You're correct that span is a fundamental building block of high-performance APIs, but I think it's a leap to say that span's internal implementation details are fundamental building blocks. The point of an implementation detail is that we can rip it out and replace it in the future without affecting any callers or consumers. Once ref fields come online and we rewrite spans to use that feature instead of ByRef<T>, then poof! suddenly ByRef<T> has exactly zero direct or indirect consumers. Clearly an implementation detail that can be wholesale replaced without affecting any consumers is not a fundamental building block.