dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.12k stars 4.04k forks source link

Suggestion: warn on use of "not A or B" pattern matching syntax #75506

Open GrabYourPitchforks opened 1 month ago

GrabYourPitchforks commented 1 month ago

Problem summary

I have observed significant ecosystem use of the is not <a> or <b> pattern matching syntax recently. However, it appears that much of the usage is incorrect due to developers misunderstanding the order of operations during evaluation.

In colloquial usage, the phrase "if x is not y or z" is understood by most participants to mean "if x is neither y nor z." (That is: ¬(y ∨ z), or (¬y) ∧ (¬z).) Instead, in C#, the expression is understood by the compiler to mean (¬y) ∨ z.

In fairness, this mirrors the more traditional Boolean syntax if (!y || z), where the ! operator has a higher precedence than the || operator. However, there's evidence that developers largely do not expect this same precedence when using the new pattern matching syntax, perhaps because it throws their brains into colloquial English mode rather than software engineering mode.

Evidence

I searched for the not <x> or <y> pattern across all first-party code bases using CodeQL and found 96 files which use the pattern, then I manually inspected each file to see whether the author intended the colloquial "neither / nor" meaning or the engineering "not / or" meaning. Of the 96 results, 85 of them indicate incorrect usage. That's an 89% error rate for developers using this syntax.

I can't share exact internal data, but you can see this using normal GitHub scans as well.

The top offenders seem to be these patterns:

There are variations, but they're a long tail compared to the above.

Proposal

I propose that the C# compiler produce a warning for any occurrence of not a or b unless the first expression is parenthesized like (not a) or b. This should make it clear to developers what the interpreted order of operations will be.

I don't propose warning on !a || b unless there is evidence this pattern trips people up. I have never seen such evidence myself, nor am I actively searching for such evidence.

Further exploration

At the request of the C# team in a group chat, I also searched for occurrences of the is not <x> and <y> pattern. That is - we wondered if developers also frequently misinterpreted "not / and" as "neither / nor" - same as they were doing with "not / or". An internal scan showed only one or two questionable hits. The team may want to consider making this a warning as well for symmetry, but I am unable to provide evidence that "not / and" is causing developers to run into the same kind of issue.

As a general consideration, if the developer writes x or y (or x || y), and if the compiler can prove that y ⇒ x, then the compiler may want to consider emitting a warning, because it means the y component of those expressions will never evaluate to true.

For example, given is not null or "", an empty string implicitly satisfies not null. This basically means that you'll always end up with true || true or false || false. If we charitably assume that developers aren't frequently in the business of writing throwaway code, perhaps this is an indication that they expected something else to happen? (Same with is not 3 or 4: a value being equal to 4 implicitly means that it's not 3.)

I don't know if solving the generalized case (where it's even possible to do so!) is a good use of resources, but at least it helped me internally frame desirable / undesirable behavior when I was analyzing the 96 internal hits.

AlgorithmsAreCool commented 1 month ago

Anecdata, I just got bitten by this in my code

333fred commented 2 weeks ago

Discussed in LDM: https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-11-04.md#not-and-or-patterns

hez2010 commented 2 weeks ago

What about not A and B? I think same issue applies to not and and as well.

333fred commented 2 weeks ago

@hez2010 as mentioned, we would love full subsumption enforcement, but we have no evidence of that combination being a general problem.