dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.92k stars 4.02k forks source link

Compiler should elide null checks when it's provable they're unnecessary #65435

Open stephentoub opened 1 year ago

stephentoub commented 1 year ago

Version Used: 7a9b666

Steps to Reproduce:

public class C
{
    public void M0()
    {
        Dog dog = new Dog();
        dog?.Bark();
    }

    public void M1(Dog dog)
    {
        dog.Bark();
        dog?.Bark();
    }

    public Dog M2(Dog dog)
    {
        dog.GetHashCode();
        dog ??= new Dog();
        return dog;
    }
}

public class Dog
{
    public void Bark() { }
}

Expected Behavior: The ?. and ??= involve provably unnecessary null checks and should be elided in code gen.

Actual Behavior: All three contain a null check:

// M0:
        IL_0000: newobj instance void Dog::.ctor()
        IL_0005: dup
        IL_0006: brtrue.s IL_000a

        IL_0008: pop
        IL_0009: ret

        IL_000a: call instance void Dog::Bark()
        IL_000f: ret

// M1:
        IL_0000: ldarg.1
        IL_0001: callvirt instance void Dog::Bark()
        IL_0006: ldarg.1
        IL_0007: brfalse.s IL_000f

        IL_0009: ldarg.1
        IL_000a: call instance void Dog::Bark()

        IL_000f: ret

// M2:
        IL_0001: callvirt instance int32 [System.Runtime]System.Object::GetHashCode()
        IL_0006: pop
        IL_0007: ldarg.1
        IL_0008: brtrue.s IL_0011

        IL_000a: newobj instance void Dog::.ctor()
        IL_000f: starg.s dog

        IL_0011: ldarg.1
        IL_0012: ret

In many such situations, the coreclr JIT can detect the unnecessary check and elide it from the asm, but a) not all backend code generators / interpreters / etc. are able to, b) it takes additional effort at run-time to do so, and c) not eliding the checks from IL results in larger IL.

In all these cases, it would also be helpful if the C# compiler could emit a hidden diagnostic alerting the developer to the fact that the ? was unnecessary and can be removed to simplify the code.

(Separately, it might also be helpful to have a related diagnostic that could be enabled in cases where nullable reference type annotations and flow state suggest the ? is superfluous even if it's technically possible a null could be dereferenced, and the developer can use that separate diagnostic as a guide to investigate more deeply.)

cc: @jcouv, @jaredpar, per our conversation yesterday

sharwell commented 1 year ago

Note that eliding the unnecessary checks would only be possible when optimizations are enabled in the compiler. In other cases, it's possible that interactive debugging could reintroduce cases where the check is required again.

It would also be helpful to have a related diagnostic that could be enabled in cases where nullable reference type annotations and flow state suggest the ? is superfluous even if it's technically possible a null could be dereferenced, and the developer can use that separate diagnostic as a guide to investigate more deeply.

This is similar to #34714, but more difficult because the code fix is introducing a (possible) behavior change. Currently it's not possible for analyzers to distinguish between null ambiguous values and explicitly non-null values. We'd probably want a better answer to that before implementing the analyzer.

stephentoub commented 1 year ago

Currently it's not possible for analyzers

I'm suggesting the compiler itself would do the analysis and issue the diagnostic.

RikkiGibson commented 1 year ago

I think this would be a big change, especially the part to actually change codegen. To do this we would probably have to introduce an "absolutely not null under any conditions" state, in addition to the usual "not null" flow state used in nullable analysis :)

RikkiGibson commented 1 year ago

It could perhaps be implemented as its own analysis pass (which also rewrites the bound tree?), but that would also mean redoing a lot of logic for how nullability flows through. Not obvious to me if it would be better that way.

geeknoid commented 1 year ago

For this issue, I prefer to just have diagnostics to report when the compiler determines ? and ?? are not useful, given nullable annotations. And then let the developer fix the code.

We do this right now using analyzers, but the compiler would be in a better place to do this reliably.

stephentoub commented 1 year ago

given nullable annotations

i don't think nullable annotations should be trusted here for this purpose. I agree the compiler should be the one to issue diagnostics (hidden or otherwise), but they should be based on actual proof established by the compiler whether something is not null rather than a pinky swear something isn't. For example, if I have a method that is annotated to return non-null and i purposefully validate that, or if I have an argument that's annotated non-null and I purposefully validate that, the compiler shouldn't warn me off of that.

geeknoid commented 1 year ago

@stephentoub We want different things.

In an application, if you systematically use nullable annotations and avoid ! like the plague, you can generally trust in your code that when you have a reference that is not nullable, that it is in fact not null. As code evolves, a reference might have been nullable at some point, and now its now. Yet, in my code I will have leftover null checks, leftover ? and leftover ??. The nullable annotations tell me these paths will never be hit, yet I'm paying the cost for these leftover annotations.

So I'm looking for a consistent story at a semantic level. If I declared a reference as being not-nullable and yet I'm treating it as being nullable, then I'd like a tool to tell me about this inconsistency: "Hey buddy, maybe you should declare this thing as being nullable? Or maybe you should remove the ? and ?? and null checks? Make up your mind!".

CyrusNajmabadi commented 1 year ago

The nullable annotations tell me these paths will never be hit, yet I'm paying the cost for these leftover annotations.

How do you distinguish taht scenario between the "the annotations may be inaccurate, or hte compiler heuristic may have a hole in it (both are possible), and we need to be sure at runtime there will be no problem"?

Or maybe you should remove the ? and ?? and null checks? Make up your mind!".

It's a core part of ? that it is the developer telling the compiler: hey... i def need to be safe at runtime, even if your heuristic thinks it may be safe statically.

stephentoub commented 1 year ago

In an application, if you systematically use nullable annotations and avoid ! like the plague, you can generally trust in your code that when you have a reference that is not nullable, that it is in fact not null.

This is not the case.

var values = new string[42];
string s = values[0];

No warnings, no suppressions, but the non-nullable s is still null.

struct MyStruct
{
    public string Value;
}
...
MyStruct m = default;
string s = m.Value;

No warnings, no suppressions, but the non-nullable s is still null.

Nullable annotations were tacked onto the language two decades after its introduction. They are fallible.

If we want to have two separate diagnostics, effectively two different confidence levels, I'm ok with that. One is essentially "this is known to be unnecessary" and the other is "this might be unnecessary but I can't be sure". The former is what I think is most valuable. The latter could be a significant foot gun if developers blindly follow any recommendations it makes.

RikkiGibson commented 1 year ago

This issue is about eliminating null checks included in user code from the generated IL.

If you're interested in a feature which identifies and suggests removing unnecessary null checks in source code, please search separately for it and/or open a new linked issue.

stephentoub commented 1 year ago

This issue is about eliminating null checks included in user code from the generated IL.

It's about both a) improving the code gen to not emit unnecessary checks and b) raising diagnostics (hidden or otherwise) to indicate where null checks were unnecessarily requested. I covered both in the opening post :)

RikkiGibson commented 1 year ago

Thanks for the correction.

I'd prefer to track the suggestion to add "conservative, we are 100% sure eliminating this null check will not cause a problem" analysis, separately from the suggestion to add "nullable annotations indicate this null check is unnecessary" analysis, in any case.

sharwell commented 1 year ago

@stephentoub Can you confirm that the diagnostics described in this issue is specifically covering cases where the compiler can statically prove that the ?. was unnecessary? In other words, these diagnostics are completely unrelated to the Nullable Reference Types feature and instead focus on cases where a non-null value was directly produced and then consumed by code without possibility of interference.

stephentoub commented 1 year ago

Can you confirm that the diagnostics described in this issue is specifically covering cases where the compiler can statically prove that the ?. was unnecessary? In other words, these diagnostics are completely unrelated to the Nullable Reference Types feature and instead focus on cases where a non-null value was directly produced and then consumed by code without possibility of interference.

Multiple things have been discussed:

  1. Diagnostics (hidden or otherwise) when the compiler can statically prove that a null check is unnecessary due to it unequivocally seeing local proof that the variable could not possibly be null, e.g. you just invoked an instance method off of a local, a dominating branch already checked it for null and proved it wasn't, etc. These would never fire, for example, for fields, as another thread might be writing to that field and the compiler couldn't prove it wasn't actually null. These would never fire for locals that were address exposed. Etc.
  2. Optimize the IL to remove unnecessary null checks and dead code based off of the same conditions as (1), e.g. if you have if (a != null) a?.B(); emit IL for that as if it had been written if (a != null) a.B();.
  3. Separate diagnostics (hidden or otherwise) when nullable reference type annotations suggest something is non-null and it's being null checked but the compiler can't definitively prove that's actually the case.

I care about (1) and (2) and it's what I intended this issue to cover.

(3) might also be interesting, but it's potentially quite noisy and also dangerous if misused / misunderstood, and we'd need to tread lightly.