dotnet / roslyn-analyzers

MIT License
1.58k stars 464 forks source link

Improvements to diagnostics around string comparisons/comparers in several APIs #7385

Open drewnoakes opened 3 weeks ago

drewnoakes commented 3 weeks ago

Analyzer

Diagnostic ID: CA1310

Describe the improvement

In the repos I work in, it's very common to have bugs due to omitting explicit string comparisons in code. This is a regular item that comes up in PR review, and it seems automatable. There's a semi-regular stream of bugs related to implicit string comparisons.

Describe suggestions on how to achieve the rule

We already have CA1310 which tracks use of StringComparison in some cases. It doesn't handle the majority of cases where you'd want an explicit comparer.

https://github.com/dotnet/roslyn-analyzers/blob/main/src/NetAnalyzers/UnitTests/Microsoft.NetCore.Analyzers/Runtime/SpecifyStringComparisonTests.cs

In the following code, only 3 of the 28 expressions that could accept a StringComparison/IComparer<string>/IEqualityComparer<string> via an overload trigger a diagnostic:

using System.Collections.Immutable;

string s1 = "Hello";
string s2 = "Foo";

// These expressions produce CA1310

_ = string.Compare(s1, s2) == 0; // CA1310
_ = s1.StartsWith(s2); // CA1310
_ = s1.EndsWith(s2); // CA1310

// None of the rest of these expressions produce CA1310

_ = s1 == s2;
_ = s1.Equals(s2);
_ = string.Equals(s1, s2);
_ = s1.Contains('a');
_ = s1.IndexOf('a');
_ = s1.IndexOf('a');

List<string> names = ["Alice", "Bob"];

_ = names.ToHashSet();
_ = names.ToImmutableHashSet();
_ = names.Distinct();
_ = names.ToImmutableSortedSet();
_ = names.Contains(s1);
_ = names.IndexOf(s1);
_ = names.LastIndexOf(s1);
_ = names.Except([s1]);
_ = names.GroupJoin([s1], s => s, s => s, (s, ss) => s);

List<Person> people = [new Person("Alice", 30), new Person("Bob", 25)];

_ = people.ToDictionary(p => p.Name);
_ = people.ToDictionary(p => p.Name, p => p.Age);
_ = people.DistinctBy(p => p.Name);
_ = people.GroupBy(p => p.Name);
_ = people.CountBy(p => p.Name);
_ = people.ToLookup(p => p.Name);
_ = people.ToImmutableDictionary(p => p.Name);
_ = people.ToImmutableSortedDictionary(p => p.Name, p => p.Age);
_ = people.AggregateBy(p => p.Name, n => n, (acc, n) => acc + n);

record Person(string Name, int Age);

Additional context

I'm not sure that these are just globalization issues. In my experience, it's reliability/correctness. For example, when working with file paths on Windows, you must make sure strings are compared in a case-insensitive fashion. However, it's very easy to have inconsistent treatment of path strings via any of the above APIs.

Rather than baking in knowledge of all these APIs, perhaps it's enough to look for cases where an overload exists that accepts an additional argument of type StringComparison, IComparer<string> or IEqualityComparer<string> (similar to how "forward cancellation token" analyzers work).

This issue causes our team enough trouble that I'm motivated to fix this, but would like to understand whether it makes sense to extend the current diagnostic, or to add another. Personally, I would vote for extending but want to read the room before starting on this.

drewnoakes commented 2 weeks ago

Another example I saw today, which is potentially harder to identify in a general-purpose analyzer.

List<string> names = ["a", "b"];
names.Remove("A"); // implicitly using default comparer