dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Warning for `is { }` on a value type and code fix `is var` when declaring a variable #40982

Open jnm2 opened 4 years ago

jnm2 commented 4 years ago

x is object is being advanced as the canonical null check for C# 8, with x is not null planned for the future. The reason is that the compiler warns when is object is a no-op (when used on a non-nullable value type). However, one of the main reasons to use x is { } is something that is object can't do: introduce a new variable. I might want to introduce a new variable when the x expression is more complex and I don't want to repeat it. Also, nullableFoo is { } foo allows me to use foo instead of nullableFoo.Value.
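For illustration, a minimal sketch of the unwrapping benefit described above; `NullableUnwrapDemo` and `DoubleOrZero` are hypothetical names, not from this thread:

```csharp
// Minimal sketch of the unwrapping that 'is { }' enables on a nullable
// value type. Names here are hypothetical, for illustration only.
static class NullableUnwrapDemo
{
    // 'nullableFoo is { } foo' both null-checks and binds 'foo' (an int),
    // so the body can use 'foo' instead of repeating 'nullableFoo.Value'.
    public static int DoubleOrZero(int? nullableFoo)
        => nullableFoo is { } foo ? foo * 2 : 0;
}
```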

I started using is object instead of is { } while slightly skeptical that I'd ever notice the difference, but now I've run into real-world x is { } y code that shows that the compiler warning would have helped.

Currently, using x is { } y means giving up that useful warning (SharpLab):

class C
{
    void M(int value)
    {
        // ⚠ CS0183: The given expression is always of the provided ('object') type
        _ = value is object;

        // No diagnostic 😢
        _ = value is { };

        // No diagnostic 😢
        _ = value is { } foo;
    }
}

Parity with is object would be amazing:

class C
{
    void M(int value)
    {
        // ⚠ CS0472: The result of the expression is always 'true' since a value of type 'int'
        // is never equal to 'null' of type 'int?'
        //  ↓↓↓↓↓↓↓↓↓↓↓↓
        _ = value is { };

        // ⚠ CS0472: The result of the expression is always 'true' since a value of type 'int'
        // is never equal to 'null' of type 'int?'
        // 🛠 Use 'is var'
        //  ↓↓↓↓↓↓↓↓↓↓↓↓
        _ = value is { } foo;
    }
}
YairHalberstadt commented 4 years ago

Definitely would be useful. I use is {} as the canonical not-null check for brevity, for clarity vs is object, and for consistency between patterns that declare a subvariable and those that don't.

It would be great to have this warning.

sharwell commented 4 years ago

Keep in mind that compiler warnings will appear in both regular and generated code. I would be especially concerned about the value is { } x case, since it's purely a code style change.

I'm also concerned about encouraging the use of value is var x unless we also have the ability to detect cases where the static type of value is a reference type, which means either nullable reference types is enabled or we have some other analyzer in action.

jnm2 commented 4 years ago

@sharwell Can you elaborate on your concern about value is { } x being a code style change? It is syntactically as much of a null check as value is object would be, but the same is not true of value is var x.

I'm also concerned about encouraging the use of value is var x unless we also have the ability to detect cases where the static type of value is a reference type, which means either nullable reference types is enabled or we have some other analyzer in action.

Are you talking about not encouraging is var in general? The proposed warning would only come into effect if the static type of value is a non-nullable value type.

jnm2 commented 4 years ago

@YairHalberstadt

and consistency between patterns that declare a subvariable and those that don't.

This is a good point. Losing that consistency is the main thing I dislike about is object, and I've seen others raise the same point.

gafter commented 4 years ago

x is {} is a canonical null check. So is x is object. On the false branch, the compiler knows that x may be null (if null is in its domain). That is true even if the compiler already believed that x could not be null to start with (e.g. a parameter declared string x).

x is {} y is not a canonical null check. Unless the compiler already believed x could be null, the compiler will not infer that x can be null in the false branch.
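At runtime the two forms match exactly the same inputs; the distinction gafter describes lives entirely in the compiler's nullable flow analysis on the false branch. A hedged sketch (method names are hypothetical):

```csharp
// Both forms return true for exactly the non-null inputs at runtime.
// Only the compiler's flow analysis on the false branch differs, as
// described above. Method names are hypothetical.
static class NullCheckForms
{
    public static bool ViaEmptyPropertyPattern(string? x) => x is { };

    // Binds 'y' as a non-null string when the pattern matches.
    public static bool ViaDeclarationPattern(string? x) => x is { } y && y.Length >= 0;
}
```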

jnm2 commented 4 years ago

I forgot to say, I'd be happy with a CA* or IDE* warning too. The important part to me is that a warning is available by default in new projects for x is { } and x is { } y.

jnm2 commented 4 years ago

To strengthen the case for x is { } y:

@jnm2:

@333fred is that because expr is { } i will be pushed as the canonical null check when you want to introduce a non-null variable (either reference or value type)?

@333fred:

Yes.

https://github.com/dotnet/csharplang/issues/3369#issuecomment-619988890

If expr is { } y will be considered the de facto null check when a non-nullable variable is needed, it should come with the same safety measures as expr is object which have previously caused folks to recommend expr is object over expr is { }. (Namely, the warnings proposed at the top.)

jnm2 commented 4 years ago

Also, I've implemented these warnings and fixes at my day job and would be happy to contribute them.

alrz commented 4 years ago

The Ultimate C# Null-Checking Idiom would be e is not null and var x for C# 9.0

jnm2 commented 4 years ago

@alrz No, that is way too long.

alrz commented 4 years ago

Oh, sounds like I can't be using patterns and being sarcastic at the same time.

RikkiGibson commented 3 years ago

#54653 suggests adding the diagnostic in a compiler warning wave.

@jnm2 depending on how the decisions shake out here it might be really great if you could contribute the analyzer/fixer for this scenario to roslyn-analyzers. I'll try and get a decision on it and follow up.

jnm2 commented 3 years ago

I'd like to contribute.

CyrusNajmabadi commented 3 years ago

@jnm2 I'm happy to review/help with the codefix. Any warning would have to go through the compiler team though. Lmk what you need!

jnm2 commented 3 years ago

Sounded like I have to wait for the decision @333fred just mentioned?

CyrusNajmabadi commented 3 years ago

So we could always do it purely as a suggestion/fix on the ide side. I can preemptively approve that. If you wanted a compiler warning though that would def need to go through a longer discussion/approval process.

CyrusNajmabadi commented 3 years ago

(no rush btw. Do this at whatever place you feel is best!)

jnm2 commented 3 years ago

I'd really like the warning. It doesn't matter to me if it's CSxxxx or IDExxxx. I assume it would have to be tied to analysis level 6.

RikkiGibson commented 3 years ago

Main thing I would be concerned about is if the operation APIs don't expose enough about dag nodes to make this as straightforward as doing it in the compiler. I do not know whether that is the case.

CyrusNajmabadi commented 3 years ago

@RikkiGibson I don't even know if we'd need IOp. Maybe I'm underthinking things, but it seems super simple to just check syntax and then a couple of type-checks. So I think it would be pretty trivial to implement this.

@jnm2 i'm ok with an IDEXXXX ID for this.

jnm2 commented 3 years ago

@RikkiGibson I already have the code (with a bunch of tests) sitting around. This is the entire analyzer impl. Looks like I could freshen it up a bit with C# 9 :D

private static void AnalyzePropertyPatternClause(SyntaxNodeAnalysisContext context)
{
    var propertyPatternClause = (PropertyPatternClauseSyntax)context.Node;

    if (propertyPatternClause is
        {
            Subpatterns: { Count: 0 },
            Parent: RecursivePatternSyntax recursivePattern
        }
        && context.SemanticModel.GetTypeInfo(recursivePattern, context.CancellationToken).Type is { IsValueType: true } type
        && !(type.OriginalDefinition is { SpecialType: SpecialType.System_Nullable_T }))
    {
        var span = recursivePattern switch
        {
            { Parent: IsPatternExpressionSyntax isPattern, Designation: null }
                => TextSpan.FromBounds(isPattern.IsKeyword.SpanStart, isPattern.Span.End),

            { Parent: CasePatternSwitchLabelSyntax casePattern, Designation: null }
                => TextSpan.FromBounds(casePattern.SpanStart, recursivePattern.Span.End),

            _ => propertyPatternClause.Span,
        };

        context.ReportDiagnostic(Diagnostic.Create(
            DiagnosticDescriptors.ApparentNullCheckIsNoOp,
            Location.Create(context.Node.SyntaxTree, span)));
    }
}
RikkiGibson commented 3 years ago

I don't even know if we'd need IOp. Maybe I'm underthinking things, but it seems super simple to just check syntax and then a couple of type-checks.

Basically I just see it as generalizing. You can either look for the patterns in syntax which always match when applied to value types, or you can grab the DAG and say that if it only has one reachable final state then we give a warning unless it is syntactically a 'var' pattern, for example.

CyrusNajmabadi commented 3 years ago

Gotcha! That helps clear things up. I'm ok with either approach.

jnm2 commented 3 years ago

I'm pretty sure I don't want to generalize. That would be noisy in code similar to this, which IIRC Mads specifically showed as preferable (I might be getting the details slightly wrong, but I remember it could have been a discard):

return num switch
{
    < 10 => "low",
    >= 10 and < 20 => "medium",
    >= 20 => "high",
};

Similarly matching (true, true), (true, false), (false, true), (false, false) looks more symmetrical and easy to take in than introducing discards.

The main value in checking { } is to bring it on par with not null in terms of safety which was previously a reason to eschew it in favor of object. And now that not null exists, it's really only { } unwrappedValue that is motivating this.

RikkiGibson commented 3 years ago

The presence of "redundant" sub patterns doesn't affect whether there's only one final state. For example, for the sample you just gave:

return num switch
{
    < 10 => "low",
    >= 10 and < 20 => "medium",
    >= 20 => "high",
};

Here is a visualization of the DAG that gets produced for it:

[0]: t0 < 10 ? [1] : [2]
[1]: leaf <arm> `< 10 => "low"`
[2]: t0 < 20 ? [3] : [4]
[3]: leaf <arm> `>= 10 and < 20 => "medium"`
[4]: leaf <arm> `>= 20 => "high"`

Each 'leaf' is a final state. We can see there are multiple reachable final states, so we wouldn't be introducing a new warning for this.

Let's also look at the dag for the pair-of-bools scenario:

return (b1, b2) switch
{
    (true, true) => 1,
    (true, false) => 2,
    (false, true) => 3,
    (false, false) => 4
};

[0]: t1 = t0.b1; [1]
[1]: t1 == True ? [2] : [6]
[2]: t2 = t0.b2; [3]
[3]: t2 == True ? [4] : [5]
[4]: leaf <arm> `(true, true) => 1`
[5]: leaf <arm> `(true, false) => 2`
[6]: t2 = t0.b2; [7]
[7]: t2 == True ? [8] : [9]
[8]: leaf <arm> `(false, true) => 3`
[9]: leaf <arm> `(false, false) => 4`

Basically, the dags we want to warn about are those like the following:

int x = ...;
if (x is { } y) { }
[0]: when <true> ? [1] : <unreachable>
[1]: leaf <isPatternSuccess> `{ } y`

(BTW, these visualizations are grabbed directly by adding tests to the compiler, then calling BoundDecisionDag.Dump() from the debugger.)

Also keep in mind that these dags fairly precisely represent the semantics of the code we'll generate. So if a dag contains a null test, for example, that null test is bound to be emitted. Any tests which have been found to be redundant or can be "optimized" out by the dag machinery have already been optimized out by this point in binding.

Now, I could be wrong about the solution in that there are actually "vacuous" patterns besides 'var' where we do not want to warn. But it does seem like it could be a good thing to discover what those patterns are by running a check like this over Roslyn's test suite and implementation.

The big hangup is that I think this kind of check can only really be done in the compiler. I just glanced over OperationInterfaces.xml and it looks like you can only get the patterns via IOperation, not the DAG. I think that means LDM would need to buy in to the new warning. LDM is essentially adjourned till late August, though.

Having spilled all that ink, I still think it would be fine to include this in the SDK analyzers or in the IDE analyzers instead, more or less the way you've implemented it.

To be honest, I was also just excited to show off here the new DAG visualizations that I spent a couple of days working on earlier this summer.

333fred commented 3 years ago

I think that means LDM would need to buy in to the new warning.

To be frank, LDM needs to buy in anyway. We specifically decided that { } name is not a pure null test, and any change to that needs to involve a language change.

RikkiGibson commented 3 years ago

It wasn't clear to me that introducing a warning here also means changing the "not a pure null test" status of { } name. If LDM lands on the position that the compiler shouldn't warn on it then it feels like it can still be addressed in other places for the users who care about it. For example, if we don't want this to be a compiler or SDK analyzers warning, it might make sense for it to be a configurable code style warning instead.

I would be disappointed if it didn't end up somewhere in the compiler or official analyzers, as I've fallen into this pit before: thinking I'm null-testing the LHS of an is and that it's all good, when in fact my input was a non-nullable value type and I actually needed to check some HasValue or IsDefault or whatever property.

(FWIW, we have been kicking around the idea of a mechanism to allow value types to express that the value is missing something required for "normal" usage, for types like ImmutableArray<T> or TypeWithAnnotations.)
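That pitfall reproduces with any struct that has a "missing" default state. The Handle struct below is a hypothetical stand-in in the spirit of ImmutableArray&lt;T&gt;.IsDefault, not a real type:

```csharp
// Hypothetical stand-in for a struct with a 'missing' default state,
// in the spirit of ImmutableArray<T>.IsDefault.
readonly struct Handle
{
    private readonly string? _name;
    public Handle(string name) => _name = name;
    public bool IsDefault => _name is null;
}

static class HandleChecks
{
    // Always true for a non-nullable struct: the apparent "null check"
    // is a no-op, which is exactly the trap described above.
    public static bool LooksValid(Handle h) => h is { };

    // The check that was actually needed.
    public static bool IsValid(Handle h) => !h.IsDefault;
}
```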

333fred commented 3 years ago

It wasn't clear to me that introducing a warning here also means changing the "not a pure null test" status of { } name.

The warning (or lack thereof) results from is object being considered a pure null test, while { } name is not considered a pure null test. We intentionally did this because we believe a pure null test exists for one purpose and one purpose only: test for null. { } name has multiple purposes, namely giving a name to a location. For example, making { } name a pure null test would mean this code now has a problem:

#nullable enable

void M(MyClass c)
{
    switch (c)
    {
        case { MyProperty: 1 }: ...; break;
        case { MyProperty: > 1 }: ...; break;
        case { } name: ...; break;
    }
    _ = c.ToString(); // Currently: no warning. After making { } name a pure null test, warning
}

We looked at the above code sample and similar ones, and decided that we didn't think { } name implied null was possible here, and that therefore { } name should not be a pure null test.

jnm2 commented 3 years ago

I don't understand the example. The warning is there on the line that says "Currently: no warning." (SharpLab)

Not yet grasping what could be wrong with the concept of a pure null test that gives a name to a location. Sounds like exactly what I've always wanted.

333fred commented 3 years ago

Sorry, was on my phone. I forgot the name for the last case, it's been updated. https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEA3GUCWAZgJ4A+AxAHYCuANrQIbC0wAEMlTLAsAFB8ABFKwCyACgEBGAAyswASj4BvPqzWsAzgHd8GMAAtWYhavUre6y3IYa2S1gBkOAcwz6QrSawC+H4LAYAawBuUytrW1Z7J0pXd1YAPi9fVn8YINCLcIi7H1ZOAFsYPwCQsJ9ygH1WAF45ADoAFQgAZQwCWLF5YNYAel7WAGFqKFhKDFpiD0oIVi0GKEp8WPrWAEFCDDxWAqDl5yi8wrYGVgAHEbYaelYtjQw0OYWl2L5vIA=.

jnm2 commented 3 years ago

Okay, I see the difference now. Why was it desirable for there to be no warning with { } name? I don't understand, especially because there is a warning with { }. var name is what you use if you are not doing a null check.

333fred commented 3 years ago

var name is what you use if you are not doing a null check

LDM did not agree with this statement 🙂

jnm2 commented 3 years ago

Can you share why? After all, that statement is the entire premise of the analyzer I'm asking to add.

333fred commented 3 years ago

Because var name implies that null could be possible input, while { } name does not imply this. However, the pattern also does not exist for the sole purpose of checking for null: it also gives a name to a location. It would be unfortunate if, in giving a clearer indication of the nullability of a location to a reader, you also informed the compiler that null was possible on other paths. Therefore, our principle in the LDM that the only things that introduce null as a possible value for a location are things that only check for null, and serve no other purpose.
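The observable consequence of that principle is that var name matches null (so null is possible input), while { } name never does. A minimal sketch, with Classify as a hypothetical example name:

```csharp
static class PatternMatchDemo
{
    // 'var other' matches any input, including null; '{ } s' matches
    // only non-null inputs, so null falls through to the 'var' arm.
    // 'Classify' is a hypothetical name, not from the thread.
    public static string Classify(string? input) => input switch
    {
        { } s => "non-null: " + s,
        var other => "matched by 'var', so null is possible input",
    };
}
```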