dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.13k stars 4.04k forks source link

while variable closure #15529

Closed Andrew-Hanlon closed 7 years ago

Andrew-Hanlon commented 8 years ago

In C# 5, foreach was explicitly changed - a breaking change no less - to ensure that loop variables were not closed over. At the time Eric Lippert wrote:

this is the single most common incorrect bug report we get... That’s a terrible situation for everyone; we very much wish to design a language which does not have “gotcha” features like this. [1]

With C# 7's current rules for while loop scoping, we again have the same problem where condition and loop variables will be closed over by default,

Wouldn't it make sense to consider this ahead of time and ensure that out and pattern match vars do not fall into this trap?

while(GetNext(out T value)){...} should follow in the footsteps of foreach and produce something like:

{
    T tmp;
    while(GetNext(out tmp))
    {
        T value = tmp;
        ...
    }
}

In general I do not agree with the new scoping rules for if and while, but I see the later being not just confusing or an inconvenience, but a flaw that will be a "gotcha" for many.

@ericlippert 's final quote in the link above is also quite poignant:

Design is, of course, the art of compromise in the face of many competing principles. “Eliminate gotchas” in this case directly opposes other principles like “no breaking changes”, and “be consistent with other language features”.

The C# 7 language team has a real opportunity to meet all of these principles before they become an issue.

Please reconsider the scoping rules for out and pattern match variables.

DavidArno commented 8 years ago

In my view, you are mixing three things into one here:

  1. out var's in for, foreach, while and do while loops should follow the same non-closure rules as loop vars currently. This seems a no-brainer.
  2. Giving out var's in while loops the same leakage rules as if's, as opposed to other loops, seems more illogical than the rest of the scope leakage rules. I wonder if there still remains a glimmer of hope that the language team might back down on this one?
  3. "Please reconsider the scoping rules for out and pattern match variables." The team have made it very clear that they won't do this, no matter how many developers ask them to.
Andrew-Hanlon commented 8 years ago

@DavidArno Good points.

I hadn't actually considered the foreach case, but it is indeed very inconsistent to have different scoping and closure rules within the same construct. At least with foreach it is fairly hard to create an enumerable function with ref-type out vars that change during enumeration, but it is certainly possible to run into this using a custom type or global state.

In the while case, however, it will be extremely easy to run into this "gotcha" since any condition with a variable would be closed-over and the syntax appears nearly identical to the standard foreach.

The for loop has always had this issue so at least it is self-consistent.

And I am of the mind that variables should not be able to leave the condition at all in do-while since they can't be used inside the loop scope anyway.


If we made a pros and cons table for this scoping, I think there would be a heavy weighting to one side!

DavidArno commented 8 years ago

@Andrew-Hanlon,

Good point re for. I was getting a bit carried away with my desire to unify everything 😁

gafter commented 8 years ago

/cc @dotnet/ldm

CyrusNajmabadi commented 8 years ago

Definitely something we could consider. I wouldn't have a problem with all the 'loop' constructs having a similar concept of a scope where their declared-variables would go.

@gafter @MadsTorgersen I think it would be fine to change 'while' to behave in this manner. It would keep the loops consistent, while not impacting any of the scoping scenarios we cared about for the if/switch patterns.

Note: i do think this code pattern would be rare. With foreach's and lambdas you were mixing two massively common constructs. To run into the problem with 'while', you'd need to have a 'while' with an 'out-param', and you'd need to be using a lambda which captured that out-param.

That case seems so niche that i'm also ok not doing anything special here. Certainly, if we did nothing now, i would likely feel this would never warrant any work in the future. However, since we are touching scopes now, perhaps it's reasonable to just do this work. With that, i think our scope rules would be:

  1. Expression variables are scoped by blocks, for, foreach, while, do-while and using statements, as well as all embedded statements:

--

This seems reasonable to me. The core cases of if statements and expression-statements are unaffected, while the other cases seem to have a somewhat reasonably consistent set of semantics (depending on how your brain categorizes C# constructs). As it would be very intuitive to group 'while/do-while' with 'for/foreach', i think this approach would be fine.

To me, the important things with 'if' and 'expression-statements' was that we could see the valuable use cases in both narrow and wide scoping. I'm having trouble thinking of any valuable use cases for 'wide' 'while-loop' scoping. I'll do an examination of Roslyn itself to see if we've ever used this pattern.

Note for Mads/Neal, if you didn't see https://github.com/dotnet/roslyn/issues/14697#issuecomment-261845480

Roughly 60% of out-var usage in Roslyn uses 'wide semantics'. i.e. we take advantage of spilling the variable into the outer scope. My gut tells me the number will be close to 0% for whiles. As such, if there are really no great use cases for splling while-variables, we should likely unify with the other loops.

Thoughts?

CyrusNajmabadi commented 8 years ago

@Andrew-Hanlon Thanks for the feedback!

DavidArno commented 8 years ago

@CyrusNajmabadi,

Definitely second your thanks to @Andrew-Hanlon. After the storm of recent days over this cope change, I think Andrew may have turned up with a simple suggestion during the calm that could be a genuine compromise. If if is a special case, all by itself, of having wider scope, my position changes from "I think this irresponsible" to "I don't like that, but I can see why others do", at which point it becomes an observation, rather than a passion and all would become well again.

(though I reserve the right to remain miffed about is var scope 😉)

Andrew-Hanlon commented 8 years ago

Thanks for looking this over @CyrusNajmabadi and @gafter and all.

While I agree that this scenario will be less common than the C# 5's foreach case (if only due to less usage of while loops in general), I think it is hard to judge how common a pattern will be until it is available in the language.

I can easily see this being common enough and problematic:

static void Main()
{
    var actions = new List<Func<string>>();

    var animals = new Queue<Animal>(
          new Animal[]{ new Cat(Name: "1"), new Cat("2"), new Cat("3"), new Dog() });

    var current = animals.Dequeue(); 

    while(animals.Any() && current is Cat cat)
    {
        actions.Add(() => cat.Name);                
        current = animals.Dequeue();
    }

    actions.ForEach(a => Console.Write(a() + " ")); // Prints "3 3 3 "
    Console.ReadKey();
}

P.S. In a related pain point about this variable closure, if I had switched the order of the animals.Any() condition, the loop would still terminate as intended, but now the closure would lead to a very subtle null reference exception...

CyrusNajmabadi commented 8 years ago

Data time!

As mentioned already, Roslyn has a 60/40 split between usages of statements that spill out-vars widely versus those that spill them narrowly. i.e. we use the:

if (!TryWhatever(..., out var v))
{
   ...
   v = 
   ...
}

// use v

pattern a lot.

So now to while loops. In roslyn only 2% of all while loops even use an out or ref variable. That's higher that i expected!

However, examinign all the usages shows that they're nearly all of the form:

            Task curTask;
            while (tasks.TryPop(out curTask))
            {
                curTask.GetAwaiter().GetResult();
            }

I could only find one piece of code (and it was complex at that) where out was being used to write into an outer scope. And in that case it was writing into another variable produced by another out-var. A very uncommon code pattern indeed.

From my sample size of one code base, i'm of the opinion that:

  1. Using out-var in a while loop will be pretty rare.
  2. However, if it were to be used, it would very much prefer narrow-scoped semantics.

--

With the numbers collected, i think we're doing the right thing with if/switch/expression-statements. However, i think we have a better choice when it comes to loops.

As we've discovered, and Andrew rightly reminds us, loop scoping is particularly problematic when it comes to capturing. Best to avoid the whole problem as there seems to be little (no?) data indicating that loops should spill.

CyrusNajmabadi commented 8 years ago

Note: i find andrew's pattern examples more compelling. And in those we'd also likely want to have narrow scopes. So i think we should make these scopes narrow since we're doing the work in this area right now anyways.

alrz commented 8 years ago

Can you please run the same analysis for the negated is operator? I think that is also a rare case and the problem would be non-existent when we have let statements. Since is produces an unassigned variable in the else part I think it wouldn't be wise to spread the scope to the whole block. Thanks!

CyrusNajmabadi commented 8 years ago

I don't have the numbers handy anymore (should have really saved the excel spreadsheet :( ).

But my findings were that it was common enough. While you are correct that some codepath would be unassigned, we have code that then wants to assign in the errant branch. i.e. it's common enough for us to do:

if (x is string s) {

}
else {
    s = something_else
}

// use s

I think it really comes down to a few different camps/styles. I have a functional background. So i'm more used to immutable values and narrow scopes. But C# devs come in all stripes, and as we've seen from our own code, there are many how use a more mutable/imperative style.

The former group is better served by narrow scoping. The latter group by wider scoping.

However, to repeat myself (from teh other thread), our choice was between:

  1. Optimize for one, and leave the other out entirely.
  2. Optimize for the other, and make things less pleasant for the other.

At the end of the day we chose '2'. Wide scoping means you can use the feature if you're in either camp. It does make things slightly less pleasant though if you want to use the same name for different types in different scopes. We felt like we could live with that restriction though. It was better than having people who wanted wide-semantics for their current constructs not be able to use them.

HaloFour commented 8 years ago

Wow. I was all about narrow scoping for while loops just for the sake of consistency between other looping constructs. The issue with closure capture is a much more compelling reason.

I still think that if is a shame, but if match is somewhere on the horizon I can live with it. Begrudgingly. :grin:

CyrusNajmabadi commented 8 years ago

+1 on having 'match'. I recently converted expression based code over to using the new pattern matching switch, and i realllllllly want match :)

Andrew-Hanlon commented 8 years ago

@CyrusNajmabadi I like your style, greping real code is always important! But we also need to think about pattern matching vars - which I believe will become very prevalent.

Another thing to consider is that the previous example could as easily apply to Tasks, where the closure would (at best) produce unpredictable results:

static async Task Animals(Queue<Animal> animals)
{
    var tasks = new List<Task>();
    var current = animals.Dequeue();

    while (animals.Any() && current is Cat cat)
    {
        tasks.Add(Task.Run(() => Console.Write(cat.Name + " ")));
        current = animals.Dequeue();
    }

    await Task.WhenAll(tasks); //Could print "1 2 3 ", "2 3 3 ", or most likely "3 3 3 "
}
CyrusNajmabadi commented 8 years ago

To be clear, i'm 100% agreeing with you. I was collecting the data to make sure there weren't any sort of common patterns we would be potentially hurting.

That turned out to be the case with 'if' and expression-statements, which is why we changed the scoping semantics. But, fortunately, (IMO) this looks like a very reasonable change we can make, with practically zero downsides.

alrz commented 8 years ago

@CyrusNajmabadi That's totally reasonable. But I don't think that would be applicable to the ternary operator.

var x = obj is T t ? (use t) : whatever;
// why would I want t here?
GeirGrusom commented 8 years ago

Meh. The ternary operator is a single small expression. It wouldn't be astonishing if it didn't declare a scope unlike the if statement, which already declares a scope, just not where you would expect.

CyrusNajmabadi commented 8 years ago

So, in the interest of not starting another OMGHUEG thread, i ask that we take the discussion of ternary operators back to https://github.com/dotnet/roslyn/issues/14697 . If there is a concrete proposal, then let's open a new issue on that.

I'm not trying to squash your ideas AT ALL. I just think we have a great proposal here that is very nice self contained. I would prefer to not have the discussion around it balloon into more than just while loops. It will help our process around this a LOT.

Thanks much!

DavidArno commented 8 years ago

@CyrusNajmabadi,

Sounds sensible, though #14697 is a bit of a mess and isn't a good place either. @alrz, would you mind creating a new issue re scoping and ternary operators?

alrz commented 8 years ago

Thanks to @HaloFour, we have it here now: #15538.

Back to the subject, what about do while,

do M(out var x);
while(GetNext(out var tmp) && tmp.IsFoo);
// x definitely not in scope,
// would be tmp in scope?
HaloFour commented 8 years ago

The do/while loop is an oddball here. Any variables introduced couldn't be read anywhere else, neither within the loop block (use before declaration) nor afterwards (not definitely assigned). I'd concur that if out declarations or pattern variables are permitted here that their scope not leak beyond the condition expression. Or pull a C++ and not allow variables to be introduced there at all.

svick commented 8 years ago

@HaloFour

The do/while loop is an oddball here. Any variables introduced couldn't be read anywhere else, neither within the loop block (use before declaration) nor afterwards (not definitely assigned).

I'm not advocating one way or the other about what the scope should be, but variables assigned (and in C# 7.0, introduced) in the condition of a do-while loop can be definitely assigned after the loop. Consider this C# 6.0 code:

int i;
do
{
} while((i = 42) == 0);

Console.WriteLine(i);

This is perfectly valid code and i is definitely assigned after the loop, so it's okay to print its value. So if the scope of a variable declared in the condition of a do-while loop was wide, it could be usable after the loop.

Also note that using a break; inside the do-while loop makes the variable not definitely assigned.

HaloFour commented 8 years ago

@svick

Ah, yeah, you're right. Still would be odd to try to use the variable within the block both due to definite assignment and the fact that the declaration follows the block.

bbarry commented 8 years ago

I think I like the idea of tight scoping on while loops. A while loop today:

while (condition)
{
    ...
}

compiles the same as if it was written (and occasionally it is useful to refactor to this style):

while (true)
{
    if (!(condition)) break;
    ...
}

so scoping this tightly reinforces the idea that this refactoring is precisely true. This:

while (animals.Any() && current is Cat cat)
{
    actions.Add(() => cat.Name);                
    current = animals.Dequeue();
}

should be the same as:

while (true)
{
    if (!animals.Any()) break;
    if (!(current is Cat cat)) break;
    actions.Add(() => cat.Name);                
    current = animals.Dequeue();
}

(btw, I like the wide scoped if decision and will not say anything more about it; here it makes the refactoring I already know and make use of just plain work)

However do-while loops are odd. If said narrow scoping rule were introduced for them, the introduced variables would not be usable outside of the loop condition. I haven't entirely made up my mind on whether the narrow scoping is better or not for those. Also:

string input;
do {
    WriteLine("Number: ");
    input = ReadLine();
} while(!int.TryParse(input, out var number));

Any variable declared in the block of a do-while loop is not usable in the condition. In order for input to be used here, it must be declared outside the block, because the condition is after the block. Because of this, I think one of these:

CyrusNajmabadi commented 8 years ago

Mads will have notes at some point. But one of the conclusions of today's LDM was that we would be revising the scoping for while loops. Specifically:

  1. variables defined in the condition of the while loop would only be scoped to the body of the loop. So if you had a pattern or out-var in a while-condition, you would only be able access it with the span of the while-loop itself.
  2. Variables defined in the condition of the while loop would be 'short lived'. Or, more understandably, if they are captured in a lambda in the loop, each iteration will produce a new value, and each lambda will only see the value they captured. (i.e. 'foreach+lambda' semantics).

There are many more things we intend to discuss. But we had to cut things a little short because our offices are being painted, so they needed us out of the building :)

Thanks for your continued help and support, and we hope you like this news!

iam3yal commented 8 years ago

@CyrusNajmabadi so basically variables scope is widened for all cases except while loops? what about expressions that lives outside to control blocks? as mentioned here #15538? got anything to share on that?

HaloFour commented 8 years ago

@eyalsk

Actually this should bring while back in line with for and foreach variable scope, and inline with foreach for variable isolation (can enclose in lambda without the value changing on you). That leaves if as the outlier, which I think I can live with.

I'm curious about what remains. I seriously hope the LDM pays a little lip service as to how out declarations and pattern variables may work within LINQ queries. Trying to address that after release would require syntax changes.

DavidArno commented 8 years ago

I'm curious about what remains

Surely what remains is everything covered by #15538 (which you created! 😉), namely the scope for all out/is expressions that aren't in an if, switch, using or loop.

gafter commented 8 years ago

The LDM met yesterday and decided to approve this proposed change to the scoping of expression variables in a while loop. Specifically, a while loop written

while (<cond>) <body>

will have expression-variables declared in expr scoped as if it were rewritten

continueLabel:;
{
    if (!<cond>) goto breakLabel;
    {
        <body>
    }
    goto continueLabel;
}
breakLabel:;

We will also make a corresponding change to the for loop. A for loop written

for (<decl>; <cond>; <incr>) <body>

will have expression-variables declared in the three parts scoped as if it were rewritten

{
    <decl>
    while(<cond>)
    {
        <body>
    continueLabel:;
        { <incr> }
    }
}

Note that <incr> has its own little scope.

bbarry commented 8 years ago

I vote for discussion remaining here and #15630 stays a purely technical issue for managing the actual change.

Andrew-Hanlon commented 8 years ago

Thank you kindly @CyrusNajmabadi and @gafter for the update and information regarding the LDM. I am really elated with the response, pace, and direction on this issue - and I believe it will go a long way to making the powerful new features in C# 7 feel congruent and integrated within the language. Thanks again.

alrz commented 8 years ago

@bbarry There is nothing left to discuss. yesterday never comes.

qrli commented 8 years ago

So what's the LDM decision for do ... while()?

MattGertz commented 7 years ago

Keeping this alive in case there's further conversation, but given the merges, I'm getting this off the "panic" radar.

MadsTorgersen commented 7 years ago

Design notes for this are in #16694.

Thanks again for the suggestion!

gafter commented 7 years ago

while was changed as suggested.