MaceWindu opened 2 years ago
@RikkiGibson Can you take a look at the compiler warning here? Looks incorrect to me.
Agreed. This looks like a flow problem. Either we go into the true-branch (in which case we should have learned that test is not-null), or we go into the false-branch (in which case the end point is not reachable).
It looks like the dereference of 'test' on the LHS of = occurs unconditionally here, so it makes sense to me to have a warning in the final fixed code.
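For concreteness, a minimal sketch of the before/after shape under discussion (reconstructed from the repro later in this thread, not the exact original code; test is a Test? parameter as in that repro):

// before the fix: the LHS write to test.Field only happens when the guard held
if (test != null && test.Field == null)
    test.Field = string.Empty;
else
    throw new InvalidOperationException();

// after the IDE0045 fix: nullable analysis warns on the 'test' receiver of the LHS
test.Field = test != null && test.Field == null
    ? string.Empty
    : throw new InvalidOperationException();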
@RikkiGibson How is it unconditional? Isn't it unreachable if we throw?
I see, it's the stfld after evaluation of the RHS that causes the NRE. It's interesting because a nested scenario where we wouldn't reach the end of the RHS will produce an error due to evaluation of the LHS.
It's not clear to me whether the compiler is behaving in accordance with the spec here.
The run-time processing of a simple assignment of the form x = y consists of the following steps:
- If x is classified as a variable:
  - x is evaluated to produce the variable.
  - y is evaluated and, if required, converted to the type of x through an implicit conversion.
  - ...
My read of this is that we are really supposed to get the destination field address, then evaluate the RHS, and then store-indirect to the field.
Aside from the spec concerns, it feels to me like the fixer should not produce code like this. Readers will anticipate left-to-right evaluation of the operands, alarm bells will go off in their head and they will have to spend time figuring out what's really going on--why is this code safe? is there a bug in my code? in the tooling? etc.
@RikkiGibson I have no idea how the IDE fixer would even know there's an issue here. We'd effectively have to reimplement all the NRT flow analysis rules on our side to realize this would cause a warning.
I don't think the fix would involve knowing anything new about nullable state or flow analysis. I would simply do a dataflow analysis and only suggest the fix if either:
- the thing being assigned to is always written in the analyzed region, or
- the thing being assigned to is sufficiently trivial not to be confusing or problematic, such as a local variable or parameter read.
I don't know a lot about how the analyzer is implemented internally though.
the thing being assigned to is always written in the analyzed region, or
So I think we do that :-) It's just that in one region it's always written to. In the other, it isn't touched (and the endpoint isn't reachable). It feels like the new code respects that as well.
Wrt lang semantics, is this field write classified as a variable?
the thing being assigned to is sufficiently trivial not to be confusing or problematic, such as a local variable or parameter read.
We tend to avoid that sort of rule as it ends up being overly restrictive. Imagine this code where the value being tested isn't the same value we're assigning to. We'd certainly still want that to work :-)
My read of this is that we are really supposed to get the destination field address, then evaluate the RHS, and then store-indirect to the field.
That's not the behavior I'm seeing. Both the ternary and the if-statement forms generate semantically equivalent code, per my understanding.
In short, I think nullable analysis should work per how the compiler actually evaluates the expression (regardless of whether that evaluation agrees with the spec or not).
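A small runnable sketch of that observed behavior (the Test class here is assumed to match the one defined later in the thread): with test null, the ternary form throws the RHS exception rather than an NRE, because the store to the field never executes.

#nullable enable
using System;

Test? test = null;
try
{
    // CS8602 warning aside, no NullReferenceException occurs at runtime:
    // the receiver is captured but not dereferenced, and the RHS throws first
    test.Field = test != null ? string.Empty : throw new InvalidOperationException();
}
catch (InvalidOperationException)
{
    Console.WriteLine("RHS threw before any dereference of 'test'");
}

class Test
{
    public string? Field;
}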
We're going to put this on the IDE design backlog to discuss what we think the right fix should be here. I agree with @RikkiGibson that it seems like this isn't a refactoring the user actually wants, but I'm also agreeing with @CyrusNajmabadi that I don't know what the right check is. My gut is wondering if it even makes sense to ever invoke this when there's a throw there, but that's probably overly restrictive for some folks as well.
We'd effectively have to reimplement all the NRT flow analysis rules on our side to realize this would cause a warning.
We might be able to do something where we walk the nullable flow states of the before/after and look for changes happening, but that might have a lot of false positives as well.
I agree with @RikkiGibson that it seems like this isn't a refactoring the user actually wants
Personally, i disagree with that :) The refactoring genuinely makes sense to me and seems clear/idiomatic. I can see why some would not want it. But it also seems totally reasonable for a class of users (including myself).
In design review, we decided the IDE behavior here is acceptable and that it's fine to keep suggesting this change. We didn't want to add fancier logic for detecting changes to nullable analysis, because that might make the feature appear flaky or hard to predict. We could restrict this to only simpler cases of assignment on the left-hand side, but our only motivation for doing so would really be to work around this.
Fundamentally, the fact that nullable analysis flags this when the dereference in question won't happen due to evaluation order is either a compiler bug or a spec bug from our perspective, so we're moving this to the compiler.
Design Meeting Notes (2022-08-29):
Resolution:
I think it is prohibitively difficult for nullable analysis to do something more precise here. The reason is that nullable analysis is fundamentally not designed to distinguish state changes which result from null tests from state changes which result from assignments.
The lowering for this scenario roughly goes as follows:
// original
test.Field = test != null ? test.Field : throw new Exception();
// lowered
var tmp1 = test;                                               // receiver captured; not yet dereferenced
var tmp2 = test != null ? test.Field : throw new Exception(); // RHS may throw before any dereference
tmp1.Field = tmp2;                                             // the dereference of tmp1 happens only at this store
This works fine when we are simply branching on a null test, where the original variable contains the same value as the temp after the test. But other times it doesn't just work: SharpLab
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
var test = (Test?)null;
test.Field = TryGetValue(out test) ? test.Field : throw new Exception();
bool TryGetValue([NotNullWhen(true)] out Test? t) { t = new Test(); return true; }
class Test
{
public string? Field { get; set; }
}
where we lower in roughly the following way:
// original
test.Field = TryGetValue(out test) ? test.Field : throw new Exception();
// lowered
var tmp1 = test;                                                        // tmp1 captures the original value of test: null
var tmp2 = TryGetValue(out test) ? test.Field : throw new Exception(); // test is reassigned here; test.Field reads the new, non-null value
tmp1.Field = tmp2;                                                      // NRE at runtime: tmp1 still holds the original null
When we look at it this way, it starts to seem more like an aliasing problem.
It is certainly possible for us to make this case work, but I feel it would not provide a worthwhile return for the investment. I think it's OK if in some corner scenarios, applying an IDE fix results in a new warning appearing in the code.
@dotnet/roslyn-compiler in case anyone else on the compiler team wants to weigh in.
I would agree with Rikki's assessment here, in particular his read of the spec. I think, by the specification, you should indeed see a null-reference for this code, though I don't think changing the behavior at this point is a realistic expectation. If the IDE has determined that they will show a refactoring for this case, I think this bug should be closed.
@333fred Should the spec be updated then?
I'd ask @MadsTorgersen for his read on it. I would say this is likely an implementation bug, but one we are going to preserve for backcompat reasons.
I'd ask @MadsTorgersen for his read on it. I would say this is likely an implementation bug, but one we are going to preserve for backcompat reasons.
That seems uber suspect to me :)
For one, I'm not particularly sure why assignment isn't just a trivial next-step in the flow analysis chain. We already do flow analysis across the binary operation pieces, so why not just treat assignment the same, albeit with different associativity.
#nullable enable
class C {
public bool B;
static void F(C? c)
{
if (c != null && (c.B = true))
{
}
}
}
Here we can see through flow-analysis that c.B = true is totally safe since the flow state of the RHS is known. It's unclear to me why writing in the above form means we can now do the analysis, but we can't flow = on the lhs as the final operation if the endpoint of the expression is reachable.
Here we can see through flow-analysis that c.B = true is totally safe since the flow state of the RHS is known. It's unclear to me why writing in the above form means we can now do the analysis, but we can't flow = on the lhs as the final operation if the endpoint of the expression is reachable.
Here, the spec is unambiguous that c.B will never be executed if c != null is false. For the original example, this is not true: my reading of the spec actually says that test.Field = ... should dereference test before evaluating the RHS. The C# compiler does not do this today, and changing that behavior is likely to be a massive breaking change, but that's how I read it. Certainly, adding another layer to this code will cause an NRE:
public static void TestMethod(Test? test)
{
// NRE at test.Nested: evaluating the assignment's receiver dereferences test before the RHS runs
test.Nested.Field = test != null && test.Field == null ? string.Empty : throw new InvalidOperationException();
}
public class Test
{
public Test? Nested;
public string? Field;
}
The C# compiler does not do this today, and changing that behavior is likely to be a massive breaking change, but that's how I read it.
Ok. Let's run this as a spec change since the impl seems to be operating sanely and likely how everyone wants it to work :-)
Certainly, adding another layer to this code will cause an NRE:
Shouldn't there be a check that Nested is not null too?
Shouldn't there be a check that Nested is not null too?
The NRE is not on Nested, it's on test.
You could probably spec this by treating an assignment to a field similarly to an assignment to a property:
- If x is classified as a property or indexer access:
  - The instance expression (if x is not static) and the argument list (if x is an indexer access) associated with x are evaluated, and the results are used in the subsequent set accessor invocation.
  - y is evaluated and, if required, converted to the type of x through an implicit conversion (§10.2).
  - The set accessor of x is invoked with the value computed for y as its value argument.
However, since that still means you do the assignment by saving the field-access receiver to a temp, you would still have to introduce some form of alias analysis to associate the state of the temp with the state of the original variable when it is "valid" to do so. I think this would not be cheap to design or to implement.
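Concretely, the alias analysis would have to distinguish shapes like these two (a sketch reusing the thread's own examples):

// aliasing is valid: 'test' is not reassigned between the receiver capture and
// the store, so the temp and 'test' still refer to the same object
test.Field = test != null ? test.Field : throw new Exception();

// aliasing is invalid: 'test' is reassigned while the temp still holds the
// original (possibly null) reference
test.Field = TryGetValue(out test) ? test.Field : throw new Exception();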
@MaceWindu Could we clarify the description of the problem? Right now it is very confusing.
@AlekseyTs sure. I just wanted to report incorrect behavior, didn't expect it to escalate to this level :smile:
@AlekseyTs Here is a SharpLab repro.
@333fred Should the spec be updated then?
I'd ask @MadsTorgersen for his read on it. I would say this is likely an implementation bug, but one we are going to preserve for backcompat reasons.
We've looked at this and related issues in the C# Standard work. In general we are (and should be!) open to changing the spec to codify actual behavior, but as the Nested example above shows, it isn't clear that the compiler's behavior is consistent - it would be difficult and probably quite ugly to spec it accurately.
There's certainly a design argument to be made that we should hold off on evaluating the LHS until after the RHS - after all we might learn more about it (as in it ain't null), or have side effects that "save" it from being null (as in throwing or assigning). But the exact same argument could be made the other way around. Something needs to be evaluated first. The spec (and around 99% of the implementation 😉) consistently follows the principle of left-to-right evaluation. There doesn't seem to be a convincing language design argument for doing assignment differently.
https://ericlippert.com/2008/05/23/precedence-vs-associativity-vs-order/
So all in all, my take is that the current implementation is a bug, but not one we can realistically fix or accurately spec. It's not a great state of affairs, but we don't have an available action to make it clearly better. We should note this, live with it and move on.
@MadsTorgersen
We should note this, live with it and move on.
This sounds good to me.
it isn't clear that the compiler's behavior is consistent
I am not sure I agree with that. I think the behavior is consistent. If the target of an assignment is a field reference, its receiver is evaluated before the right-hand side is evaluated, but is not dereferenced until we execute the store command. It is the store command that actually dereferences the receiver; we are not doing that explicitly.
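A small sketch that makes the ordering he describes observable (the helper names here are made up for illustration):

#nullable enable
using System;

// prints "lhs rhs" and only then throws NullReferenceException at the field store
GetReceiver()!.Field = GetValue();

static Test? GetReceiver() { Console.Write("lhs "); return null; }
static string GetValue() { Console.Write("rhs "); return "value"; }

class Test
{
    public string? Field;
}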
The current implementation is a bug
In my opinion, this is a small deviation from the spec, rather than a bug. It is really not obvious what it means to evaluate a field reference when it is the target of an assignment. I also think that it should be relatively easy to adjust the spec to clarify the behavior - receivers of the target are dereferenced in the process of the store operation. This affects properties and probably array elements too.
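And for the array-element case he mentions, the analogous sketch (again with made-up helpers, assuming the same store-time null check):

#nullable enable
using System;

// prints "array index value" and only then throws NullReferenceException at the element store
GetArray()![GetIndex()] = GetValue();

static int[]? GetArray() { Console.Write("array "); return null; }
static int GetIndex() { Console.Write("index "); return 0; }
static int GetValue() { Console.Write("value "); return 42; }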
@MaceWindu
Could we clarify the description of the problem? Right now it is very confusing.
sure. I just wanted to report incorrect behavior, didn't expect it to escalate to this level
I still do not see any clarifying changes made to the description of the issue at the top. An important part of reporting incorrect behavior is describing it so that anyone can clearly understand what you are trying to report and can easily observe the incorrect behavior. If several steps are required to observe the problem, all the steps should be clearly described.
@AlekseyTs
@MadsTorgersen
it isn't clear that the compiler's behavior is consistent
I am not sure I agree with that. I think the behavior is consistent. If the target of an assignment is a field reference, its receiver is evaluated before the right-hand side is evaluated, but is not dereferenced until we execute the store command. It is the store command that actually dereferences the receiver; we are not doing that explicitly.
I'd be happy to be wrong. It's certainly better if the behavior is consistent.
The current implementation is a bug
In my opinion, this is a small deviation from the spec, rather than a bug. It is really not obvious what it means to evaluate a field reference when it is the target of an assignment. I also think that it should be relatively easy to adjust the spec to clarify the behavior - receivers of the target are dereferenced in the process of the store operation. This affects properties and probably array elements too.
The spec is very clear about what it means to evaluate a field reference. Here is an excerpt from the Standard describing the evaluation of E.I, where T is the type of E:
- If T is a class_type and I identifies an instance field of that class_type:
  - If the value of E is null, then a System.NullReferenceException is thrown.
  - Otherwise, if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E.
  - Otherwise, the result is a variable, namely the field I in the object referenced by E.
The null reference exception is clearly spelled out here as part of the evaluation. It is independent of whether the field is subsequently the target of an assignment, and therein lies the problem with trying to spec the compiler behavior: The behavior of a field reference evaluation would become dependent on the context wherein it occurs. I am very loath to try to capture that faithfully in the spec.
We should note this, live with it and move on.
This sounds good to me.
I'm glad we agree on the course of action - then any disagreement we have is inconsequential! 😄
Given the above discussion I will close this issue as "not planned". If IDE wishes to change the behavior of editor features in this scenario, feel free to reopen and re-triage.
Thanks @MaceWindu for filing the issue.
Reactivating, but moving to IDE.
Version Used: 4.3.0-3.22401.3+41ae77386c335929113f61d6f51f2663d2780443
Steps to Reproduce: see the SharpLab repro linked above.
Expected Behavior:
No IDE0045 suggestion
Actual Behavior:
IDE0045 is suggested, and applying the fix produces code that raises a nullable warning.