Proposal: Ref Returns and Locals

stephentoub commented 9 years ago

(Note: this proposal was briefly discussed in dotnet/roslyn#98, the C# design notes for Jan 21, 2015. It has not been updated based on the discussion that's already occurred on that thread.)

Background

Since the first release of C#, the language has supported passing parameters by reference using the 'ref' keyword. This is built on top of direct support in the runtime for passing parameters by reference.

Problem

Interestingly, that support in the CLR is actually a more general mechanism for passing around safe references to heap memory and stack locations; that could be used to implement support for ref return values and ref locals, but C# historically has not provided any mechanism for doing this in safe code. Instead, developers that want to pass around structured blocks of memory are often forced to do so with pointers to pinned memory, which is both unsafe and often inefficient.

Solution: ref returns

The language should support the ability to declare ref locals and ref return values. We could, for example, now declare a function like the following, which not only accepts 'ref' parameters but which also has a ref return value:

public static ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}

With a method like that, one can now write code that passes two values by reference, with one of them being returned based on some condition:

Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right).M20 = 1.0;

Based on the function that gets passed in here, a reference to either 'left' or 'right' will be returned, and the M20 field of it will be set. Since we’re trading in references, the value contained in either 'left' or 'right' is updated, rather than a temporary copy being updated, and rather than needing to pass around big structures, necessitating big copies.

If we don't want the returned reference to be writable, we could apply 'readonly' just as we were able to do earlier with ‘ref’ on parameters (extending the proposal mentioned in dotnet/roslyn#115 to also support return refs):

public static readonly ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}
…
Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right) = new Matrix3D(...); // Error: returned reference is read-only

Note that when referencing the 'left' and 'right' ref arguments in the Choose method’s implementation, we used the 'ref' keyword. This would be required by the language, just as it’s required to use the ‘ref’ keyword when passing a value to a 'ref' parameter.
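For readers who want to try this, here is a minimal sketch of Choose in the syntax that eventually shipped in C# 7.0 and later, where 'ref' follows 'return' instead of appearing inside the conditional. The Matrix3D struct below is a hypothetical one-field stand-in, not the real type.

```csharp
using System;

// Hypothetical stand-in for the Matrix3D used in the example above.
public struct Matrix3D
{
    public double M20;
}

public static class RefReturnSketch
{
    // Returns a reference to one of the caller's variables, not a copy.
    public static ref TValue Choose<TValue>(
        Func<bool> condition, ref TValue left, ref TValue right)
    {
        if (condition())
            return ref left;
        return ref right;
    }

    public static void Main()
    {
        var left = new Matrix3D();
        var right = new Matrix3D();

        // The assignment writes through the returned reference into 'left'.
        Choose(() => true, ref left, ref right).M20 = 1.0;

        Console.WriteLine(left.M20);
        Console.WriteLine(right.M20);
    }
}
```

Because the call yields a variable rather than a value, it can appear on the left of an assignment without copying the struct.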

Solution: ref locals

Once you have the ability to receive 'ref' parameters and to return ‘ref’ return values, it’s very handy to be able to define 'ref' locals as well. A 'ref' local can be set to anything that’s safe to return as a 'ref' return, which includes references to variables on the heap, 'ref' parameters, 'ref' values returned from a call to another method where all 'ref' arguments to that method were safe to return, and other 'ref' locals.

public static ref int Max(ref int first, ref int second, ref int third)
{
    ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
…
int a = 1, b = 2, c = 3;
Max(ref a, ref b, ref c) = 4;
Debug.Assert(a == 1); // true
Debug.Assert(b == 2); // true
Debug.Assert(c == 4); // true
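The same Max can be sketched in the syntax that later shipped, assuming C# 7.2 or newer for the conditional ref expression. The differences from the proposal text are that the initializer and the return each take an outer 'ref', and the conditional is parenthesized.

```csharp
using System;

public static class RefLocalSketch
{
    public static ref int Max(ref int first, ref int second, ref int third)
    {
        // The ref local aliases whichever variable the conditional selects.
        ref int max = ref (first > second ? ref first : ref second);
        return ref (max > third ? ref max : ref third);
    }

    public static void Main()
    {
        int a = 1, b = 2, c = 3;

        // Assigns to the variable that Max returned a reference to.
        Max(ref a, ref b, ref c) = 4;

        Console.WriteLine($"{a} {b} {c}");
    }
}
```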

We could also use ‘readonly’ with ref on locals (again, see dotnet/roslyn#115), to ensure that the ref variables don’t change. This would work not only with ref parameters, but also with ref locals and ref returns:

public static readonly ref int Max(
    readonly ref int first, readonly ref int second, readonly ref int third)
{
    readonly ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
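For comparison, read-only references later shipped in C# 7.2 with the keywords in the other order ('ref readonly'), and with 'in' for read-only by-reference parameters. The following is a sketch under that assumption, not the syntax of this proposal.

```csharp
using System;

public static class ReadOnlyRefSketch
{
    // 'in' parameters are passed by reference but cannot be written to;
    // 'ref readonly' makes the returned reference read-only as well.
    public static ref readonly int Max(in int first, in int second, in int third)
    {
        if (first >= second && first >= third) return ref first;
        if (second >= third) return ref second;
        return ref third;
    }

    public static void Main()
    {
        int a = 1, b = 2, c = 3;

        // A read-only ref local: it aliases a variable but cannot assign to it.
        ref readonly int max = ref Max(in a, in b, in c);
        Console.WriteLine(max);

        // Max(in a, in b, in c) = 5;   // rejected: the returned reference is read-only
    }
}
```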

MgSam commented 9 years ago

If I recall, Eric Lippert blogged about this some years back and the response in the comments was largely negative.

I do not like this feature for C#. The resulting code is like an uglier version of C++, and code written with it takes longer to reason about and understand. The use-cases are not particularly compelling, and I have never run into a situation where I wished I had ref locals or return values.

axel-habermaier commented 9 years ago

Yes, I know very well that mutable structs should be avoided. Still, one interesting use case would be lists of mutable structs. Consider:

struct MutableStruct { public int X { get; set; } }
MutableStruct[] a = ...
List<MutableStruct> l = ..
a[3].X = 5; // changes the value of X of the struct in the array
l[3].X = 5; // compile time error

If the indexer of the List<T> class returned the value stored in the list by reference, the code above would compile, making the use of mutable structs less surprising. It would probably be even more efficient, as the (potentially large) struct would no longer have to be copied out of the list.

Unfortunately, I doubt that the return type of List<T>'s indexer can be changed for backwards compatibility reasons.
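A ref-returning indexer on a new collection type would not have that compatibility problem. Below is a minimal sketch in the syntax that later shipped (C# 7.0 and later); RefList<T> is a hypothetical type made up for illustration, not a BCL class.

```csharp
using System;

public struct MutableStruct { public int X { get; set; } }

// Hypothetical minimal collection for illustration; this is not List<T>.
public sealed class RefList<T>
{
    private T[] _items = new T[4];
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_count++] = item;
    }

    // Hands out a reference to the array slot itself instead of a copy.
    public ref T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return ref _items[index];
        }
    }
}

public static class RefIndexerSketch
{
    public static void Main()
    {
        var l = new RefList<MutableStruct>();
        for (int i = 0; i < 4; i++) l.Add(new MutableStruct());

        l[3].X = 5;   // mutates the stored element in place
        Console.WriteLine(l[3].X);
    }
}
```

One caveat of this design: a reference obtained before an Add that resizes the backing array keeps pointing at the old array, so callers should not hold such references across mutations of the collection.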

xen2 commented 9 years ago

Disclaimer: I work on a game engine, so I am probably not the typical user.

One use case where this could really help us is this one:

MyHugeStruct[] data; // we use a struct to improve data locality and reduce GC pressure
// Ideally, we would like to be able to use List<T>, but we can't take ref then
for (int i = 0; i < data.Length; ++i)
{
   // Option 1: make a local copy (slow)
   var item = data[i];

   // Option 2: to avoid making a stack copy of MyHugeStruct,
   // we have to defer to an inner loop function
   MyLoopBody(ref data[i]);

   // Option 3: with the new proposal, this would be much better:
   ref MyHugeStruct itemRef = ref data[i];
}

We end up making a separate function for the loop body, and in the case of a tight loop this can end up being quite bad:

We have to forward all parameters
Sometimes we found out with VTune that the inner loop's stack "initlocals" was taking up most (80%+) of the time if the inner loop body happened to have several locals (even if only 0 or 1 was used due to branching). This would not happen if the locals were contained and memzeroed once in the function containing the "for" loop.
The function is not inlined in simple cases

Nice to have:

ref this[] operator(?) so that List<> and other collections can be used (vs being forced to use arrays)
a ++ operator on ref to be able to loop by incrementing a pointer instead of index multiplication (but probably unsafe).

Extra (probably impossible without changing BCL):

A lot of struct copying could also be avoided in EqualityComparer (Dictionary) if ref could be used when large structs are used as keys.

paulomorgado commented 9 years ago

What happens with this?

var data = GetData();
...
ref SomeStruct GetData()
{
    var ss1 = new SomeStruct();
    var ss2 = new SomeStruct();

    return ref Choose(ref ss1, ref ss2);
}
ref SomeStruct Choose(ref SomeStruct ss1, ref SomeStruct ss2)
{
    return whatever ? ref ss1 : ref ss2;
}

GetData might not be aware that Choose is returning one of its variables and returns to the caller a reference to it.

Does the value still exist after exiting GetData?

gafter commented 9 years ago

@paulomorgado You would not be allowed to return a ref to a local variable or parameter.

paulomorgado commented 9 years ago

@gafter, the only difference between my Choose method and @stephentoub's one is that mine does not have the selector passed as a delegate. Did I miss something here?

stephentoub commented 9 years ago

@paulomorgado, the compiler would only let you return a ref to something that it knew was either on the heap or that came from the caller. In my example, the ref inputs to the Choose method were all from ref parameters (or ref locals to ref parameters), so the compiler would conclude that the result of the Choose method met the criteria and would allow its returned ref to be returned. But in your example, the refs passed to Choose were not from the caller nor from the heap, such that the compiler couldn't be sure that the result of Choose was allowed to be returned, and it would error out.

paulomorgado commented 9 years ago

@stephentoub, forget my Choose method. Yours is the best that can be done, and you just published it to NuGet and I added it to my project. How can the compiler know where the return value of Choose is coming from? My GetData is just complying with the contract of Choose to get its result and pass it along, as all the code written so far and to be written in the future does.

What you're saying is that publicly exposed methods can't return refs, which reduces the usage to only private methods.

stephentoub commented 9 years ago

@paulomorgado, I understand the confusion, but that's not what I'm saying.

There would be some rules about what it would be safe to return, e.g.

refs to variables on the heap are safe to return
ref and out parameters are safe to return
a ref returned from another method is safe to return if all refs passed to that method were safe to return (by this same set of rules)

Forget the implementation of Choose here. Assuming Choose abides by these rules (which the compilation of Choose would enforce), in my example all of the inputs to Choose were valid to be returned, therefore the result of Choose could be returned. In your example, at least one of the inputs to Choose wasn't valid to be returned, therefore the result of Choose could not be returned. The compiler can validate that.
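The three rules can be sketched as follows, using the syntax that later shipped in C# 7.0 and later. All names here (SafeToReturnSketch, Pick, Forward, and so on) are made up for illustration, and the rejected case is left commented out because it does not compile.

```csharp
using System;

public static class SafeToReturnSketch
{
    private static readonly int[] Heap = new int[2];

    // Rule 1: a ref to heap storage (here, an array element) may be returned.
    public static ref int FromHeap(int index) => ref Heap[index];

    // Rule 2: a ref parameter came from the caller, so it may be returned.
    public static ref int FromCaller(ref int value) => ref value;

    // Rule 3: every ref argument passed to Pick is safe to return,
    // so the ref that Pick returns is safe to return as well.
    public static ref int Forward(ref int a, ref int b) => ref Pick(ref a, ref b);

    private static ref int Pick(ref int a, ref int b)
    {
        if (a >= b) return ref a;
        return ref b;
    }

    // Rejected by the compiler: 'local' is not safe to return, so neither
    // is the result of a call that received a ref to it.
    // public static ref int Bad(ref int a)
    // {
    //     int local = 0;
    //     return ref Pick(ref a, ref local);
    // }

    public static void Main()
    {
        int x = 1, y = 2;
        Forward(ref x, ref y) = 10;   // writes to whichever variable Pick selected
        FromHeap(1) = 3;
        Console.WriteLine($"{x} {y} {FromHeap(1)}");
    }
}
```

Note that the check on Bad looks only at the arguments of the call, not at the body of Pick, which is what makes the rule enforceable across assembly boundaries.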

paulomorgado commented 9 years ago

@stephentoub, what I'm having trouble with is understanding how those rules can be effectively enforced.

And a proposal should have an example that works under the proposal.

stephentoub commented 9 years ago

@paulomorgado, how does my example not work under the proposal? And why do you believe the rules can't be enforced?

paulomorgado commented 9 years ago

@stephentoub, either that or I totally missed everything.

My understanding is that there's no way the caller can take the result of your Choose method as safe to return as reference. Is there? If so, how?

stephentoub commented 9 years ago

@paulomorgado, in this example:

public static ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}

left and right are both safe to return because they came from the caller.

In this example:

public static ref int Max(ref int first, ref int second, ref int third)
{
    ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}

first, second, and third are all safe to return because they all came from the caller. max is safe to return because the only refs it's possibly assigned to are those which are safe to return.

If I as a caller wanted to use Choose, e.g.

public static ref TValue ChooseByTime<TValue>(
    ref TValue left, ref TValue right)
{
    return Choose(() => DateTime.UtcNow.Second % 2 == 0, ref left, ref right);
}

Both left and right are safe to return because they came from the caller. Therefore all of the ref inputs to Choose are safe to return. Therefore the resulting ref from Choose is also safe to return. I don't need to worry about the implementation of Choose, because the compiler is enforcing all of these same rules on the implementation of Choose.

paulomorgado commented 9 years ago

Both left and right are safe to return because they came from the caller. Therefore all of the ref inputs to Choose are safe to return. Therefore the resulting ref from Choose is also safe to return. I don't need to worry about the implementation of Choose, because the compiler is enforcing all of these same rules on the implementation of Choose.

But ChooseByTime is returning neither left nor right. It's returning the return value of Choose. Nothing but the implementation details of Choose is saying its return value is the same as one of its parameters. What if Choose is an implementation of an interface?

You're restricting the use of Choose to cases where it works without any safeguards or proof that it's safe.

My example shows the opposite.

stephentoub commented 9 years ago

@paulomorgado, your example wouldn't compile... the compiler would error out exactly because it doesn't abide by the rules: your call to Choose is passed ref values that are not safe to return, therefore the result of your call to Choose is not safe to return. I'm sorry if I'm not explaining this well; not sure how to convey it differently.

stephentoub commented 9 years ago

Nothing but the implementation details of Choose is saying its return value is the same as one of its parameters.

Ah, maybe this is the point of confusion. The implementation doesn't matter because the compiler assumes the worst: regardless of how a parameter is actually used, if any argument isn't safe to return, then the result of the call isn't safe to return. The compiler is conservative in that regard.

paulomorgado commented 9 years ago

A conservative compiler that assumes the worst cannot assume the return value of Choose is safe to return.

Is this what you're proposing?

public static ref TValue ChooseByTime<TValue>(
    ref TValue left, ref TValue right)
{
    TValue result = Choose(() => DateTime.UtcNow.Second % 2 == 0, ref left, ref right);
    if (result == left) return ref left;
    else if (result == right) return ref right;
    else throw new Exception("Invalid value.");
}

stephentoub commented 9 years ago

Why do you say that? What specifically about this example do you believe is problematic?

stephentoub commented 9 years ago

Let's try something else: can you construct an implementation of Choose that will compile based on the aforementioned rules/explanations but where the caller of the method could not assume its return value was safe to return?

paulomorgado commented 9 years ago

No I can't. Because I haven't been able to understand how this would work.

I can understand how, in your implementation of Choose, it is safe to return that reference.

What I can't understand is why its callers can safely return the same reference without intimately knowing its internals.

stephentoub commented 9 years ago

Because it wouldn't be allowed to return anything that's not safe in the case where the caller assumes it is safe. If the only thing the caller passes in are refs that are safe to return, then what could this method return?

one of those refs: that's safe.
a ref to an object it allocates on the heap: that's safe
a ref to some other local or parameter: that's not safe, but it's also not allowed, so it can't actually do this
a ref it got back from another call, but only if it passed in safe-to-return refs; if it passed in any refs that were not safe to return, then the returned ref would also not be safe to return, and the compiler wouldn't allow it. Effectively the rules apply recursively here.

Etc.

paulomorgado commented 9 years ago

So, this wouldn't be safe, right?

public static ref TValue ChooseByTime<TValue>(
    ref TValue left)
{
    ref TValue right = default(TValue);
    return Choose(() => DateTime.UtcNow.Second % 2 == 0, ref left, ref right);
}

stephentoub commented 9 years ago

Correct, that would not compile.

copernicus365 commented 9 years ago

Beautiful solution, I've wondered why this couldn't be done before.

@MgSam [The resulting code is like an uglier version of C++] Because of sentiments like this (i.e. 'anything I don't personally use should never be part of the language for anybody else either, even though the CLR itself has this capability'), it means our language is needlessly crippled in places where a very easy and beautiful solution like this gives us such a capability. As the gamer showed in the comment above, this can be a big performance win in some cases.

whoisj commented 9 years ago

:+1:

Anytime I can pass a pointer instead of performing a value copy, I'm all for it. Are there good reasons to pass memory by value-copy? Yes. Should it always be the case? Absolutely not.

The resulting code is like an uglier version of C++

I agree, it is not pretty but it is very descriptive. It would be nice if the ref keyword could be replaced with syntax we're all used to. Perhaps we could use * in place of ref because int* foo; is "cleaner" and "easier" to read than ref int foo;. I put "cleaner" and "easier" in quotes because it is incredibly subjective.

Yes, I know that * is generally reserved for unsafe but there's no reason the symbol cannot be reused, so long as one is reserved for a "safe" context and the other for an "unsafe" context.

HaloFour commented 9 years ago

Given the limitations listed above imposed to maintain a safe context I'm having a hard time envisioning the use cases for this feature. The real gains would seem to be in how structs can be used throughout the BCL with arrays, lists or other collection types.

whoisj commented 9 years ago

Given the limitations listed above imposed to maintain a safe context I'm having a hard time envisioning the use cases for this feature. The real gains would seem to be in how structs can be used throughout the BCL with arrays, lists or other collection types.

Agreed. This is, in my opinion, a small step in the right direction though.

whoisj commented 9 years ago

Would this implementation allow for ref int[] intRefs = new ref int[512];?

If it doesn't, then I am less excited than I originally was. If it does, reading ref struct[] is difficult. Is it a reference to an array of structures or an array of structure references?

Better to use struct*[] in my opinion.

HaloFour commented 9 years ago

I don't disagree that ref something is unattractive, however your use of * is already legal C# syntax and implies an unsafe context. I'm sure that you know that, but I thought it warranted mention.

I imagine that the array scenario would likely depend on the proposal for fixed-size buffer enhancements, dotnet/roslyn#126. Once the size is determined and allocated I believe that would behave the same as a field or as a local.

whoisj commented 9 years ago

I don't disagree that ref something is unattractive, however your use of * is already legal C# syntax and implies an unsafe context. I'm sure that you know that, but I thought it warranted mention.

I do. I also know that * is only legal within an unsafe block. Thus, the compiler could assume that * needed to be "safe" unless in an unsafe block. Therefore operations like int* p = ...; p++; would not be legal; instead, int* p would have to point to safely referenced memory.

Yes, there would be complexities if devs started an unsafe block, but rules could be established for how this would work, etc.

VSadov commented 9 years ago

FYI: the PR for the initial commit of a prototype dotnet/roslyn#4042

Thaina commented 9 years ago

I support ref return

And can we have ref parameter in lambda?

HaloFour commented 9 years ago

@Thaina You can use ref parameters in lambdas today as long as the signature of the target delegate defines those parameters as ref:

public delegate void RefAction<T>(ref T arg);

RefAction<string> action = (ref string value) => { value = "Hello World!"; };

string x = "";
action(ref x);
Console.WriteLine(x);

axel-habermaier commented 9 years ago

@HaloFour: What @Thaina probably means is that you can't capture a ref-parameter in a lambda.

Thaina commented 9 years ago

@HaloFour Sorry, I didn't know that. In which version can we use ref lambdas?

I have used Unity for a long time, so I haven't kept up with new C# features.

HaloFour commented 9 years ago

@axel-habermaier Maybe. That wouldn't be my first guess given the proposal they posted under, but it is terribly unspecific. IIRC ref parameter capture would be wading too close to unsafe territory since you'd basically have to stuff the address to a variable in the state machine class and the compiler could no longer control its lifetime.

@Thaina C# has always supported ref and out parameters for anonymous delegates and lambdas.

Thaina commented 9 years ago

Oh... I never knew that we just can't write (ref i) => {}. I just need to write (ref int i) => {}

Thanks for pointing that out

BreyerW commented 8 years ago

Sorry for the necropost, but I have a question. I found that ref properties will be supported, but only for getters. Why couldn't this be resolved for setters too? I mean, if we have

class Foo{

public ref int Number{
get;
set;
}

}

it could be resolved to public ref int get_Number(){...} and public void set_Number(ref int){...}

And if there is a reason for abandoning setters, why not do it like this:

class Foo{

public string Description{
ref get;
set;
}

}

so that we would still be able to have a setter and a getter in one property (or is this already the case?)

VSadov commented 8 years ago

@BreyerW the main reason for disallowing setters in byref properties and indexers is that they would not be very useful. While you can make a ref for a field or an array element and return that from the getter, you cannot go the other way in the setter. If some use pattern is discovered, the restriction on byref setters can be relaxed later, so it was decided to start with not allowing them.

BreyerW commented 8 years ago

Thanks for the reply. I wonder: isn't avoiding copies of value types when passing them to a setter method a good thing? And next: is allowing a non-ref setter alongside a ref getter impossible, as I show in the second example?

EDIT: Ah, and if there isn't any dangerous situation with a ref setter, I don't see why we have to be so strict about this. If someone finds a pattern for ref setters, you won't have to cook up a special C# version in the future; it will already be enabled. And ref could be defined per accessor, not per property. Obviously you are the designers, not I, so possibly there is something subtle I don't know ;).

VSadov commented 8 years ago

@BreyerW An important part here is that byref properties and indexers have assignable getters. If a type has a byref indexer, you already can read and write elements without redundant copying. What would be the "obvious purpose" of a setter if the property is already assignable via its getter?

In particular, a byval setter next to a byref getter would actually make assignments ambiguous.

obj.Description = "aaa";

Is this an assignment through the getter or an invocation of the setter?

There are short and long term costs of adding language features and it is next to impossible to remove them. That motivates the design team to resist features with unclear utility or confusing behavior.
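A minimal sketch of what "assignable getter" means, in the syntax that shipped in C# 7.0 and later. The Foo class mirrors the one in the question above; the backing field name is an assumption made for illustration.

```csharp
using System;

public class Foo
{
    private string _description = "";

    // Getter-only ref property: returns a reference to the backing field.
    public ref string Description => ref _description;
}

public static class RefPropertySketch
{
    public static void Main()
    {
        var obj = new Foo();

        // No setter is involved: the assignment goes through the
        // reference returned by the getter.
        obj.Description = "aaa";

        Console.WriteLine(obj.Description);
    }
}
```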

BreyerW commented 8 years ago

Oh, I think I understand now why you abandoned the setter: one of the reasons is that a ref getter can work like a setter because it returns a ref, so there is no point in having a setter? If that's true, then I completely see why you abandoned it. Thanks for the clarification; now I feel a bit dumb, I obviously overlooked that.

The only thing that might be missing is that with a ref getter there might be a problem with firing events like OnBeforeValueChange, but this is a feature of ref itself, not a flaw in the C# design.

BTW, don't forget to update PropertyInfo somehow so that CanWrite returns true if there is only a ref getter, or add a new property to signal that there is a ref getter. I mention this because I use this class.

alrz commented 8 years ago

Nothing mentioned about foreach. It would be nice to be able to write foreach(ref Struct item in arr).

Thaina commented 8 years ago

@alrz Your suggestion would be impossible given how foreach is implemented. foreach uses the IEnumerable interface to return Current, which does not return a ref.

It needs to work the other way around. We should have an IEnumerableByRef that overrides Current, and foreach should check whether the collection is IEnumerableByRef and, if so, return each item as a ref automatically.

Or maybe the ref keyword should be enabled in generics, so we could use IEnumerable<ref Struct>.

It would be best if MS made everything that implements IEnumerable (everything in System.Collections) also implement IEnumerableByRef once the feature is finished.

alrz commented 8 years ago

@Thaina How about this?

When iterating over an array (known at compile-time) the compiler can use a loop counter and compare with the length of the array instead of using an IEnumerator

Thaina commented 8 years ago

@alrz That kind of foreach is possible only with arrays, and I think arrays should not have a different workflow. Instead, arrays should implement IEnumerableByRef, if C# gets one.

alrz commented 8 years ago

It can simply be allowed only for arrays, and then translated to for( .. ) { ref T item = arr[i]; ... }. I don't think that something like <ref Struct> would be possible, because it would ultimately let the ref outlive the local object, which is not supported by the CLR, AFAIK.
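The translation described above can be written by hand with a ref local. This is a minimal sketch in the syntax that shipped in C# 7.0 and later (note the 'ref' on the right-hand side); the Item struct and DoubleAll method are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical element type for illustration.
public struct Item
{
    public int Value;
}

public static class RefLoopSketch
{
    public static void DoubleAll(Item[] arr)
    {
        for (int i = 0; i < arr.Length; i++)
        {
            // Alias of the array element: no struct copy in, no copy back out.
            ref Item item = ref arr[i];
            item.Value *= 2;
        }
    }

    public static void Main()
    {
        var arr = new[]
        {
            new Item { Value = 1 },
            new Item { Value = 2 },
            new Item { Value = 3 },
        };

        DoubleAll(arr);

        foreach (var item in arr)
            Console.WriteLine(item.Value);
    }
}
```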

Thaina commented 8 years ago

@alrz I apologize, but I am very much against the idea of making arrays a special thing again. Actually, I am against making anything a special case. We have had this special problem from the start, that only arrays have an indexer that returns by ref, and now that we are trying to fix it, everything should have an indexer that returns by ref, as arrays do.

Yeah, I think <ref Struct> is overkill too. Just IEnumerableByRef is enough.

alrz commented 8 years ago

@Thaina I think nothing's wrong with special cases. foreach already has an (unobservable) special case for arrays to make it faster, and ref locals also help make things faster (by avoiding copying), so combining the two in a use case like this would be nice.

asvishnyakov commented 8 years ago

:+1:
