dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.01k stars 4.03k forks source link

Proposal: Ref Returns and Locals #118

Closed stephentoub closed 8 years ago

stephentoub commented 9 years ago

(Note: this proposal was briefly discussed in dotnet/roslyn#98, the C# design notes for Jan 21, 2015. It has not been updated based on the discussion that's already occurred on that thread.)

Background

Since the first release of C#, the language has supported passing parameters by reference using the 'ref' keyword, This is built on top of direct support in the runtime for passing parameters by reference.

Problem

Interestingly, that support in the CLR is actually a more general mechanism for passing around safe references to heap memory and stack locations; that could be used to implement support for ref return values and ref locals, but C# historically has not provided any mechanism for doing this in safe code. Instead, developers that want to pass around structured blocks of memory are often forced to do so with pointers to pinned memory, which is both unsafe and often inefficient.

Solution: ref returns

The language should support the ability to declare ref locals and ref return values. We could, for example, now declare a function like the following, which not only accepts 'ref' parameters but which also has a ref return value:

public static ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}

With a method like that, one can now write code that passes two values by reference, with one of them being returned based on some condition:

Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right).M20 = 1.0;

Based on the function that gets passed in here, a reference to either 'left' or 'right' will be returned, and the M20 field of it will be set. Since we’re trading in references, the value contained in either 'left' or 'right' is updated, rather than a temporary copy being updated, and rather than needing to pass around big structures, necessitating big copies.

If we don't want the returned reference to be writable, we could apply 'readonly' just as we were able to do earlier with ‘ref’ on parameters (extending the proposal mentioned in dotnet/roslyn#115 to also support return refs):

public static readonly ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}
…
Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right) = new Matrix3D(...); // Error: returned reference is read-only

Note that when referencing the 'left' and 'right' ref arguments in the Choose method’s implementation, we used the 'ref' keyword. This would be required by the language, just as it’s required to use the ‘ref’ keyword when passing a value to a 'ref' parameter.

Solution: ref locals

Once you have the ability to receive 'ref' parameters and to return ‘ref’ return values, it’s very handy to be able to define 'ref' locals as well. A 'ref' local can be set to anything that’s safe to return as a 'ref' return, which includes references to variables on the heap, 'ref' parameters, 'ref' values returned from a call to another method where all 'ref' arguments to that method were safe to return, and other 'ref' locals.

public static ref int Max(ref int first, ref int second, ref int third)
{
    ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
…
int a = 1, b = 2, c = 3;
Max(ref a, ref b, ref c) = 4;
Debug.Assert(a == 1); // true
Debug.Assert(b == 2); // true
Debug.Assert(c == 4); // true

We could also use ‘readonly’ with ref on locals (again, see dotnet/roslyn#115), to ensure that the ref variables don’t change. This would work not only with ref parameters, but also with ref locals and ref returns:

public static readonly ref int Max(
    readonly ref int first, readonly ref int second, readonly ref int third)
{
    readonly ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
MouseProducedGames commented 8 years ago

GameDev often involves initializing large, structured data to some default values. Therefore, I suggest the following pattern for consideration:

ref largeData[i] = initData;

This avoids the otherwise needless (in this scenario):

ref data = largeData[i];
data = initData;

Granted, it only saves an extra line. But that extra line really doesn't have a reason to exist, and being able to assign directly to a reference to a location in an array, list or etc would bring it in line with assigning to a value location in an array and so on.

If it's too much work, it can certainly wait, of course; it's far from critical.

svick commented 8 years ago

@MouseProducedGames I don't understand, how is that different from just largeData[i] = initData;?

MouseProducedGames commented 8 years ago

@svick: It's an assignment to a ref, not a copy-and-assign to value.

svick commented 8 years ago

@MouseProducedGames Are you saying that largeData is an array of refs? Because I don't think that's proposed here and that CLR can't support that.

If it's not an array of refs, just a normal array of structs, then you do need to copy the data.

MouseProducedGames commented 8 years ago

@svick You're missing the point, and misunderstanding it.

This:

ref data = largeData[i];
data = initData;

Is getting a reference to a location in an array and assigning data to that reference.

Likewise, this: ref largeData[i] = initData Is also getting a reference to a location in an array and assigning data to that reference. Just without one extra, and unnecessary, line.

The bottom example is simply replicating the code in the top example, only without the need for an explicit ref variable.

HaloFour commented 8 years ago

@MouseProducedGames

That wouldn't be possible unless the array was itself of refs or pointers. You can't avoid the copy, so largeData[i] = initData is as good as it gets.

MouseProducedGames commented 8 years ago

@HaloFour Would it help if I gave you the equivalent C++ syntax?

Here: (&values[i]) = initData;

I guarantee, this works exactly as I've actually stated.

And no, you don't need an array of refs to get a reference to a location in an array.

MouseProducedGames commented 8 years ago

...Oh, I see the problem. You've latched on to this statement: "It's an assignment to a ref, not a copy-and-assign to value."

Look, that was in response to this statement: "@MouseProducedGames I don't understand, how is that different from just largeData[i] = initData;?"

And assigning to an indexer often has an extra copy into the setter of the indexer. Since you can ref return from a "this[index]" indexer, this avoids copying into the setter of the indexer.

You instead deal with the ref directly.

Does that help?

svick commented 8 years ago

@MouseProducedGames So it's not about arrays, only about custom collections? For those, I believe that just having ref-returning getter (and no setter) would achieve what you want, no need for additional syntax.

HaloFour commented 8 years ago

@MouseProducedGames

Ah, so largeData is some type with a custom indexer property, not an array? That's where I think we were getting it confused.

While @svick does mention using a "readonly" property with a ref return as the type I am curious if C# would allow treating that as if it were a settable property. Otherwise you would probably need to assign the ref to a local before you then assigned the value to the address stored in that local. And you're right that this is what the C# compiler would have to emit anyway.

MouseProducedGames commented 8 years ago

@svick There's two threads of conversation here. One is about shortening this:

ref data = largeData[i];
// Do something with data.

to this: ref largeData[i] // Do something with data without the need to copy to an explicit variable, when you don't need an explicit variable;

The other is clarifying that assigning to ref can save you a copy to setter, to explain one reason why you'd want to operate on the ref directly.

Edit: Sorry for the confusion.

MouseProducedGames commented 8 years ago

Another reason for using a ref directly would be something like this:

ref data = largeData[i];
data.oneFloat = 5f;

Which is a bit needlessly long-winded compared to: ref largeData[i].oneFloat = 5f;

svick commented 8 years ago

@HaloFour The original proposal above seems to imply this would be allowed (in the context of methods, but I assume it would work on indexers too).

svick commented 8 years ago

@MouseProducedGames I still think you don't need additional syntax for that. If largeData[i] is a ref-returning indexer with only a getter, then largeData[i] = someValue; and largeData[i].oneFloat = 5f; will do what you want. There is no need for the ref largeData[i] syntax.

MouseProducedGames commented 8 years ago

@svick: Ah, I misunderstand your point. I thought you were talking about returning a copy from a getter, which left me quite confused.

However, if there's both a copy getter and a ref getter (for example, if the implementer wanted to make the usage explicit) then ref values[i] could make it explicit which one you're using. Other than that, though, I think you're right on that one, given that clarification.

MouseProducedGames commented 8 years ago

@svick Final update, I think. Looks like the devs came to the same syntax you did; I think I was tripped up by trying to apply the same sort of syntax as I would in C++.

Namely:

using System;

namespace TossCSharp7b
{
    class Program
    {
        static void Main(string[] args)
        {
            LargeData[] largeData = new LargeData[3];

            largeData[1].floatValue = 5f;

            foreach (var data in largeData)
                Console.WriteLine(data);

            Console.ReadKey(true);
        }
    }

    struct LargeData
    {
        public double doubleValue;
        public int intValue;
        public float floatValue;
        public char charValue;
        public bool boolValue;

        public override string ToString()
        {
            return string.Format("double: {0}, int: {1}, float: {2}, char: {3}, bool: {4}", doubleValue, intValue, floatValue, charValue, boolValue);
        }
    }
}

Output: double: 0, int: 0, float: 0, char: , bool: False double: 0, int: 0, float: 5, char: , bool: False double: 0, int: 0, float: 0, char: , bool: False

Overall, perhaps allowing a "ref" qualifier, even if pointless, might ease a transition from C++; OTOH, the above format is exactly what I tried to do with array variables when I first learned C# with, IIRC, either v1.1 or v2. Probably v1.1 on XP.

orthoxerox commented 8 years ago

A ref-getter is a very welcome feature. It will help things like array slices work exactly like real arrays. Of course, something like IRefList<T> will be required to represent this behaviour.

fsacer commented 8 years ago

Why doesn't the code example work in VS '15' preview. It looks like ternary operator doesn't support it. Im also interested in how to reassign reference variable (ref locals) like in the following code which doesn't compile:

public static ref int Max(ref int first, ref int second, ref int third) {
    ref int max = ref first;
    if (second > max) {
        ref max = ref second;
    }
    if (third > max) {
        ref max = ref third;
    }
    return ref max;
}
IS4Code commented 8 years ago

Seeing as readonly refs are basically controlled-immutability managed pointers, the following construct may be possible:

public static readonly ref Unbox<T>(object inst) where T : struct
{
    return ref (T)inst;
}

Returning an object interior reference. Also reference list and enumerables would be a nice addition.

VSadov commented 8 years ago

@fsacer ref reassignments are not allowed in the current implementation. The main reasons for that are:

ref max = ref second;

or

max = ref second;
vcsjones commented 8 years ago

How does this impact platform invocation, if at all? For example, could I do the following?

[DllImport("foo.dll")]
static extern ref int Foo();

With the expectation that this is a "safe" way of doing static extern unsafe int* Foo()?

IS4Code commented 8 years ago

That's not safer in any way. By choosing "ref int" instead of "int*" you will allow the GC to know of the pointer, but it still wouldn't be allowed to do anything about it, due to it being located outside the managed memory (assuming the pointer was created there in the first place).

On the other hand, how would reflection handle ref returns? Maybe using boxable TypedReferences? ☺

HaloFour commented 8 years ago

@IllidanS4

On the other hand, how would reflection handle ref returns?

Don't have the bits in front of me to test, but I'll assume that it's the same as ref parameters. The Type of the return parameter would be a byref type, meaning Type.IsByRef would return true and Type.GetElementType() would return System.Int32. The CLR has always supported declaring variables and the like as byref types, it's just never been exposed to C#.

IS4Code commented 8 years ago

@HaloFour Of course the return type could be represented, but what if I wanted to invoke the method dynamically? Would it simply return the value in the variable?

HaloFour commented 8 years ago

@IllidanS4

Of course the return type could be represented, but what if I wanted to invoke the method dynamically? Would it simply return the value in the variable?

Looks like the CLR doesn't support it, at least with .NET 4.6.1:

System.NotSupportedException was unhandled:
ByRef return value not supported in reflection invocation.

The only way that I could imagine the CLR supporting this would be to automatically dereference the ref and then box the result.

JohnyL commented 8 years ago

@stephentoub In your original example with Choose function, how would I get to know which Matrix3D object had been changed?

BreyerW commented 8 years ago

As previously said ref properties wont have setter because ref getter already do this (and pure ref property with setter might lead to some subtle bugs), which might be slighty misleading at first glance.

In this case I wondering: readonly (implicitly or explicitly) ref getter with normal (or ref if possible) setter wouldnt be better then (or at least make readonly ref getter as optional which will enable normal/ref setter)?

If i understand correctly readonly should ensure that there will be no subtle bugs because you wont be able to modify ref on get. When compiler see that you are trying to modify property then resolve to setter if present and possible in current context otherwise emit error (since getter will be readonly there shouldnt be question what we want to call).

There was example that byref getter with byval/byref setter might be ambigious, but with readonly getter this problem probably disappear:

class Obj{
       private string description="test";
       public ref string Description{
                        readonly get{ return ref description;}
                        set{ 
OnPropertyChange(this, nameof(description), value);
description=value;
    }
  }
}
var obj=new Obj();
obj.Description = "aaa"; //since getter is readonly ref we cannot use getter we have to resolve to setter

I mention this because sometime there is need to not copy value field (rather get ref to value field) AND be able to fire event on property change reliably.

dotnetchris commented 8 years ago

Please never allow foo.GetByRef("x") = 42; This code reads that you're somehow assigning 42 as if GetByRef was a Func being assigned (s) => 42

bbarry commented 8 years ago

The strange unicode quotes aren't valid syntax, but allowing ref returns to be lvalues is kind of exactly the point here. From the first comment:

int a = 1, b = 2, c = 3;
Max(ref a, ref b, ref c) = 4;
Debug.Assert(a == 1); // true
Debug.Assert(b == 2); // true
Debug.Assert(c == 4); // true

If I were writing say a video game, or some other set of algorithms that required I went out of my way to avoid GC (such as a web server) being able to write code like this can be very useful.

We aren't looking to make this kind of thing pretty, merely for it to be better performing than what you have to currently do (either add another method override to do something and the IL instructions to add the new parameters, or mess around with pointers in C# unsafe code, or write the CIL yourself without the help of the C# language).

Thaina commented 8 years ago

I think some people here have too many enthusiastic on immutability. But it defeat the purpose of this request in the first place, which is performance to control value of struct (performance is reason why we have struct in the first place)

If you want your struct to be immutable you should explicitly use readonly or const

Also I would suggest that in addition to ref return and ref local, It would be useful if we have const return, local and at parameter

int[] numbers = new int[]{0,1,2};
public const int ConstValue(const int index)
{
    return const numbers[index]
}
public ref int RefValue(const int index)
{
    return ref numbers[index]
}

And const parameter will pass by ref but cannot write any code to modified it, Which is useful if we have large immutable struct

svick commented 8 years ago

@Thaina

What's the point of passing int by const reference? Or is this intended to be used for large structs?

Also, how would this be represented in IL? AFAIK, CLR has no support for constant managed references, so this would have to be enforced by the C# compiler, which is not ideal.

Thaina commented 8 years ago

@svick Yes it intend to be used on larger struct (Matrix4x4 for example). just use int as sample

And I think it fine to just enforce on compiler if we can't add anymore feature to CLR. But ideally would be best if it added into CLR

dusho commented 8 years ago

will this allow ugliness like ref (ref (ref xx)) = GetData(); also I'm not sure I like syntax where evidently ref is used but is not explicitly visible in immediate code, like mentioned foo.GetByRef("x") = 42 would it be possible to have some ref there as well. Or is there some way to maybe use something like 'as ref' e.g.

ref int myInteger = 45 as ref;
// =
var myInteger = 45 as ref;

similarly

var data = GetData() as ref;
data[4] = 5;
DemiMarie commented 8 years ago

As far as mutable structs are concerned: They are the ONLY way to get large mutable arrays with good locality, period. When it comes to performance, locality of reference often dominates all else.

Thaina commented 8 years ago

Just curious that. Would it possible for Task and await/async to return ref after resolve?

HaloFour commented 8 years ago

@Thaina The CLR doesn't permit using byref types as generic type arguments. Attempts to use a type like Task<ref int> would produce a BadImageFormatException.

Thaina commented 8 years ago

@HaloFour Could we solve the problem by introduce TaskRef when we implement ref return and let async/await select Task or TaskRef based on the signature of function

HaloFour commented 8 years ago

@Thaina Maybe. Given the limitations enforced by the compiler as to what you can take a ref to I kind of doubt that there would be a safe way to implement an async method that returns a ref, though.

bbarry commented 8 years ago

A ref result on a tasklike would not be fitting the pattern for await. See section 7.7.7.3 of the specification (I suppose it could be changed):

• Either immediately after (if b was true), or upon later invocation of the resumption delegate (if b was false), the expression (a).GetResult() is evaluated. If it returns a value, that value is the result of the await-expression. Otherwise the result is nothing.

I am not sure what a ref tasklike would accomplish. The resumption delegate is evaluated with a different callstack so the ref must already be in the heap somewhere, why not return the instance of whatever object is holding the ref?

edit: fooling around on Try Roslyn it seems apparent that a ref tasklike would make a copy of the value anyway to store it in a backing location (which seems obvious to me but perhaps I'm missing something).

edit 2: I don't think that code should emit CS0649. Does it still do so in a newer build and/or is there an issue entered?

Thaina commented 8 years ago

@bbarry Because what we want to return maybe valuetype The async may do the work of loading and initialize large struct. And after it finish it will just return that struct. So the struct may alive in something on heap but we don't want to let client have access to that class. We want to return that value but as reference for performance reason

struct UserData { /* Very Large detailed information */ }

class UserManager
{
    UserData[] array;
    public static async ref UserData LoadOrGet(int n)
    {
        // First find value in array, return ref array[n] if exist
        // await for load from server, set to array[n + 1] and return after loaded
    }
}
bbarry commented 8 years ago

I see I was thinking about the problem wrong.

bbarry commented 8 years ago

@Thaina, you could return the index to the array or some other handle but it gets old really fast since the implementation doesn't seem to permit ref locals inside async state machines at all. So you are left with:

struct UserData
{ /* Very Large detailed information */ 
}

class UserManager
{
    UserData[] array;
    //idea: can return index and work with that as a handle to the data
    public static async Task<int> LoadOrGet(int n)
    {
        await Task.Delay(1);
        return n;
    }

    public ref UserData Get(int n)
    {
        return ref array[n];
    }

    public static UserManager Instance {get; set; } //mock singleton
}

public class Foo
{
    public async Task Bar()
    {
        int n = await UserManager.LoadOrGet(0);
        //doesn't compile:
        //{
        //  ref UserData d = ref UserManager.Instance.Get(n);
        //  ...
        //}
        //instead:
        SomeWork(n);
        await Task.Delay(1);
        MoreWork(n);
    }

    void SomeWork(int n)
    {
        ref UserData d = ref UserManager.Instance.Get(n);
        //...
    }

    void MoreWork(int n)
    {
        ref UserData d = ref UserManager.Instance.Get(n);
        //...
    }
}

Please do reconsider CS8932 at least when the ref local does not spill across awaits.

svick commented 8 years ago

@Thaina I think combining async with ref returns won't be possible. At first, I was considering that a design that is tied to arrays and indexes could work:

class ArrayRefTask<T>
{
    private T[] array;
    private int index;

    private bool isCompleted;
    private Action continuation;

    public Awaiter GetAwaiter() => new Awaiter(this);

    public void SetResult(T[] array, int index)
    {
        this.array = array;
        this.index = index;
        isCompleted = true;
        continuation?.Invoke();
    }

    public class Awaiter : INotifyCompletion
    {
        private ArrayRefTask<T> myRefTask;

        public Awaiter(ArrayRefTask<T> myRefTask)
        {
            this.myRefTask = myRefTask;
        }

        public bool IsCompleted => myRefTask.isCompleted;

        // this is the important part:
        public ref T GetResult() => ref myRefTask.array[myRefTask.index];

        public void OnCompleted(Action continuation)
        {
            if (IsCompleted)
                continuation();
            else
                myRefTask.continuation = continuation;
        }
    }
}

But it won't work, because:

DemiMarie commented 8 years ago

One approach is to make async a feature of the runtime, such that the CLR knows about the runtime. Alternatively, unsafe pointers could be used.

Thaina commented 8 years ago

@bbary @svick If it not permit then it could if we find a way to implement it

What I think it possible is introduce class TaskRef<T> in additional to class Task<T>. Which TaskRef implement public ref T Results { get; } and constructor public TaskRef(FuncRef<T>)

And implement async/await to select Task<T> or TaskRef<T> based on return type of function we used on async

And I'm fine about ref local cannot cross await. But it should be allow ref local to use between await it get from

Code like this actually should work. Should remove CS8942 for this

async ref Product Produce(int i) { }
async Task Consume()
{
    ref Product X = await Produce(0);
    Console.WriteLine(X); // Still live in the block of await it got value from so no error

    ref Product Y = await Produce(1);
    Console.WriteLine(Y);
}

But like this should not work. Would be use CS8942 in this case

async Task Consume()
{
    ref Product X = await Produce(0);
    ref Product Y = await Produce(1);

    Console.WriteLine(X); // Throw CS8942 here
    Console.WriteLine(Y);
}

What I talking about is not that it work with current await or not. I'm saying that it should also work when we introduce ref return feature. And if it need more implementation then we should figure out how to do it. What I ask is the possibility technically, not the limit of current implementation

Still ref is difference from pointer, technically it should be safe. And must not be allowed across async block. But it also should be allowed in the same async block

bbarry commented 8 years ago

@Thaina I think async ref T Method() should be best left to a separate spec. It is different than both dotnet/roslyn#10902 (in that the return is ref T, not some tasklike) and this issue (in that such a spec would need to entail a state machine and some transport type TaskRef<T> or something like that) but at the same time it is clearly something that will be influenced by both issues.

As nice as production of an async ref method may be, it could be sidestepped for the immediate future of getting ref returns and ref locals out the door. This code:

async Task Consume()
{
    ref Product X = await Produce(0);
    ...
}

could be written without an additional allocation this way:

async Task Consume()
{
    var handle = await ProduceHandle(0);
    ref Product X = ref Resolve(handle);
    ...
}

Or in the slightly more frustrating way the current implementation works:

async Task Consume()
{
    var handle = await ProduceHandle(0);
    DoWork(handle);
}
void DoWork(int handle)
{
    ref Product X = ref Resolve(handle);
    ...
}

And I'm not saying this shouldn't be done. But I will say it will take considerable time to get done and should not be a cause for delaying C#7.

kfdf commented 8 years ago

Is it correct to understand that only references to local variables are not safe to return? And if so, why not just insert a runtime check in the caller that throws an exception if the returned reference points to something that lies in the discarded stack frame? That shouldn't be too much of a performance hit, and it can be optimized away if the compiler can prove that the returned reference is safe.

VSadov commented 8 years ago

@kldf At the CLR level - yes we cannot return references to locals and byval parameters. At the C# level the situation is a bit more complex because of lexical scoping. According to the language semantics a new set of local variables is created when control flow enters {}. As a result you should not be able to access the variable via a reference when outside of {}. From the language semantics the variable does not exist, form implementation prospective the IL slot of the variable could be reused for something else.

So, once we have a reference to a local variable, we would not only have problems with returning it. Using it as a source of byref assignments could also be problematic and cannot rely on runtime checks.

One simple solution to this is - make ref locals single-assignment and require ref assignment to happen at declaration only.

Also it is generally preferable to have compile-time errors over run time failures - you do not want to ship something and only later discover that some scenarios may lead to crashes.

kfdf commented 8 years ago

One simple solution to this is - make ref locals single-assignment and require ref assignment to happen at declaration only.

Doesn't feel like C#, and what about ref parameters? Maybe just forbidding assignments to ref variables from outer scope will be enough? And, most importantly, the compiler is easy to satisfy in this case.

Also it is generally preferable to have compile-time errors over run time failures

No arguments about that. But I don't see how can current limitations to ref locals and ref returns work in any reasonably complex scenario, with lots of structs passed around and assigned to locals, where just one not safe to return struct 'taints' the returned reference and ref locals are can only be used as aliases to the parameters.

kfdf commented 8 years ago

One more thought.. second best thing after compile time error is failing fast, and that's what check after return does, whereas currently the compiler can satisfied by promoting local variable to a field. It's a straightforward thing to do and is not too ugly, so is likely to become an accepted practice. But if there is a logical error and a reference to this field is returned, the programs just moves on.