dotnet / roslyn-analyzers

MIT License
1.58k stars 463 forks source link

String comparison analyzer #1639

Open Therzok opened 6 years ago

Therzok commented 6 years ago

Analyzer package

Could go into Microsoft.CodeQuality.Analyzers

Analyzer

Example: String comparison can be optimized in some API usage

var str = "some string";
var otherStr = "other string";

if (otherStr.Substring (5) == str.Substring (4))
    Console.WriteLine ("success");

This code contains allocations when it could avoid that. The analyzer would highlight substring in this case, and offer to replace with:

var str = "some string";
var otherStr = "other string";

// remove the indices where no substring was used
if (str.Length - 4 == otherStr.Length - 5 && string.Compare (otherStr, 5, str, 4, str.Length - 4) == 0)
    Console.WriteLine ("success");

The reason this is more performance than substring equality is:

Possible implications:

Therzok commented 6 years ago

Now that ReadOnlySpan is out, maybe spanification could be better in this case

rbeurskens commented 3 years ago

Note that this might be considered 'micro-optimization' unless it is executed many times, considering .NET the compiler and the runtime/GC are designed to take that job out off your hands so you can focus on functionality. (if you want to you can still write unsafe blocks, invoke native C++ APIs etc..) But sure, its good to be aware to know what happens 'under the hood', but maybe not a job for the analyzer here as it could depend on the context of your code. And of course, there is no string.Equals overload where a range can be specified. If there was, I'd use it, but using String.Compare() == 0 to determine equality is not considered good practice. (as well as implicit StringComparison/CultureInfo for string equality). https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings IF you just want to use ordinal case-sensitive equality, it can, as already mentioned be further (memory) optimized, using ReadOnlySpan<T> from the System.Memory NuGet package.

if (str.Length - 4 == otherStr.Length - 5 && str.AsSpan(4).SequenceEqual(otherStr.AsSpan(5)))
    Console.WriteLine ("success");

If you can't / don't want to use the NuGet package, you could just use a for loop to compare the ranges of characters, which does not involve allocating new strings on the heap, as it only extracts the characters onto the stack one by one, which also gives the opportunity of optionally using char.ToUpper(CultureInfo) on both of them before comparing them using char.Equals() / char.operator ==

// example
const string str = "some string";
const string otherStr = "other string";
const int offset1 = 4, offset2 = 5;
bool success = str.Length - offset1 == otherStr.Length - offset2;

for (int i = 0, count = str.Length - offset1; success && i < count; i++)
{
    success = str[i + offset1] == otherStr[i + offset2]; // ordinal
    // success = char.ToUpper(str[i + offset1]) == char.ToUpper(otherStr[i + offset2]); // current culture, ignore case
    // success = char.ToUpperInvariant(str[i + offset1]) == char.ToUpperInvariant(otherStr[i + offset2]);  // invariant culture, ignore case
    if (!success)
        break;
}
if (success)
    Console.WriteLine("success");
Youssef1313 commented 1 year ago

@stephentoub Should this move to dotnet/runtime?