dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Proposal: String.Contains(char) #24068

Closed danmoseley closed 4 years ago

danmoseley commented 6 years ago

Rationale and Usage

Finding a character in a string is a fairly common primitive operation. Unfortunately we lead users to the pit of failure because mystring.Contains(char) will bind to the Linq version, which can easily be 20x slower than mystring.IndexOf(char) != -1. By adding these to string, the next recompile will give a performance improvement.

Proposed API

    public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable
    {
        public bool Contains(char value) { throw null; }
        public bool Contains(char value, StringComparison comparisonType) { throw null; }
        public bool Contains(string value) { throw null; }  // already exists 
        public bool Contains(string value, StringComparison comparisonType) { throw null; } // already exists
        public int IndexOf(char value, StringComparison comparisonType) { throw null; } // to implement above
    }

the implementations will simply be

public bool Contains(char value)
{
    return IndexOf(value) != -1;
}

public bool Contains(char value, StringComparison comparisonType)
{
    return IndexOf(value, comparisonType) != -1;
}

Microbenchmark

Searching in 10 and 1000 character strings:

    Method |       Mean |     Error |    StdDev |
---------- |-----------:|----------:|----------:|
 LinqShort | 142.750 ns | 0.4997 ns | 0.4430 ns |
  LinqLong | 180.121 ns | 0.4741 ns | 0.3428 ns |
 FastShort |   7.575 ns | 0.0196 ns | 0.0174 ns |
  FastLong |   8.508 ns | 0.0403 ns | 0.0377 ns |

Variations

The StringComparison overload is in order to search case insensitively if desired. The IndexOf overload is needed to implement it. These could be discarded as the 90% case I would expect to not take a comparison.

stephentoub commented 6 years ago

public bool Contains(char value, StringComparison comparisonType)

Why is this needed? Is it purely for Ordinal and OrdinalIgnoreCase?

return IndexOf(value, comparisonType) != -1;

Does such an IndexOf overload exist that takes a char and a StringComparison?

danmoseley commented 6 years ago

Is it purely for Ordinal and OrdinalIgnoreCase?

Yes

Does such an IndexOf overload exist that takes a char and a StringComparison?

Oops, no. I added it to the proposal. But I don't this comparison is too important, it could be set aside.

jnm2 commented 6 years ago

Just yesterday I had a use case for String.Contains(char) and I thought for sure we already had this but it was String.Contains(string, StringComparison) I was thinking of. +1.

tannergooding commented 6 years ago

If the LINQ version is so much worse, is there room for a targeted optimization to improve/special case It?

danmoseley commented 6 years ago

https://github.com/dotnet/corefx/pull/25097

tannergooding commented 6 years ago

even beyond that, why not

if (comparer == null)
{
    if (typeof(TSource) == typeof(string))
    {
        return string.IndexOf() != -1;
    }
    else
    {
        foreach (element in source) { EqualityComparer.Default.Equals() }
    }
}

IIRC, typeof(T) == typeof(...) is treated as a constant by the JIT, so doing something like the above might be worthwhile for string itself.

stephentoub commented 6 years ago

TSource would be char, not string. It could do "if (typeof(TSource) == typeof(char))" and then within that would need to do a cast/type check on the source instance to see if it was a string. That will add some overhead when TSource==char and the source isn't a string, so the question would be whether such overhead is measurable and worthwhile to optimize for the string case.

danmoseley commented 6 years ago

I believe this is ready for review.

karelz commented 6 years ago

FYI: The API review discussion was recorded - see https://youtu.be/o5JFZtgaJLs?t=1169 (1 min duration)