dotnet / runtime

Add `IAlternateEqualityComparer<ReadOnlySpan<byte>, string>` support to `StringComparer` #106147

Open eiriktsarpalis opened 2 months ago

eiriktsarpalis commented 2 months ago

Motivation

This benchmark seems to suggest that building a string-based IAlternateEqualityComparer for ReadOnlySpan<byte> keys can have substantial performance and usability benefits over manually converting the key to an intermediate ReadOnlySpan<char>.
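For context, the manual conversion mentioned above might look roughly like the following sketch. It assumes .NET 9 or later; the 64-character stack buffer threshold and the sample key are illustrative choices, not taken from the benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

Dictionary<string, int> dictionary = new(StringComparer.Ordinal) { ["content-length"] = 1 };

// .NET 9 lets a string-keyed dictionary be queried with ReadOnlySpan<char> keys.
var lookup = dictionary.GetAlternateLookup<ReadOnlySpan<char>>();

ReadOnlySpan<byte> utf8Key = "content-length"u8;

// UTF-8 never decodes to more UTF-16 code units than it has bytes,
// so a buffer of utf8Key.Length chars is always large enough.
// The 64-char stack threshold is an arbitrary, illustrative choice.
Span<char> buffer = utf8Key.Length <= 64 ? stackalloc char[64] : new char[utf8Key.Length];
int written = Encoding.UTF8.GetChars(utf8Key, buffer);

// The lookup itself runs against the transcoded chars, not the original bytes.
bool found = lookup.TryGetValue(buffer[..written], out int value);
Console.WriteLine($"found={found}, value={value}");
```

Every lookup pays for the transcoding step and the buffer management, which is the overhead a byte-aware comparer would avoid.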

We should enhance StringComparer with an IAlternateEqualityComparer<ReadOnlySpan<byte>, string> implementation. I see a couple of potential approaches we could follow:

Approach 1: Factory method in System.Text.Encoding or UTF8Encoding

public partial class Encoding
{
    public virtual EncodedStringComparer GetEncodedStringComparer(StringComparer stringComparer);
}

// Essentially just an intersection type for the two interfaces
public abstract class EncodedStringComparer : IEqualityComparer<string>,
    IAlternateEqualityComparer<ReadOnlySpan<byte>, string>
{
    // Interface implementations
}

Which can then be used as follows:

EncodedStringComparer comparer = Encoding.UTF8.GetEncodedStringComparer(StringComparer.OrdinalIgnoreCase);
Dictionary<string, int> dictionary = new(comparer);
dictionary.GetAlternateLookup<ReadOnlySpan<byte>>(); // Success
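To make the idea concrete, here is a minimal sketch of what such a comparer could do internally, restricted to UTF-8 and ordinal comparison. `Utf8OrdinalComparer` is a hypothetical name, not a proposed or existing API, and the FNV-1a hash is an arbitrary stand-in. The key constraint it illustrates is that the byte-span hash must agree with the string hash, which it does here by hashing the decoded UTF-16 code units in both cases.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

var comparer = new Utf8OrdinalComparer();
Dictionary<string, int> dictionary = new(comparer) { ["héllo"] = 42 };
var lookup = dictionary.GetAlternateLookup<ReadOnlySpan<byte>>();

bool found = lookup.TryGetValue("héllo"u8, out int value);
Console.WriteLine($"found={found}, value={value}");

// Hypothetical type for illustration only: UTF-8, ordinal semantics.
sealed class Utf8OrdinalComparer : IEqualityComparer<string>,
    IAlternateEqualityComparer<ReadOnlySpan<byte>, string>
{
    private const uint FnvOffset = 2166136261;
    private const uint FnvPrime = 16777619;

    private static uint Mix(uint hash, char c) => unchecked((hash ^ c) * FnvPrime);

    public bool Equals(string? x, string? y) => string.Equals(x, y, StringComparison.Ordinal);

    // Hash over UTF-16 code units so it matches the byte-span overload below.
    public int GetHashCode(string obj)
    {
        uint hash = FnvOffset;
        foreach (char c in obj) hash = Mix(hash, c);
        return unchecked((int)hash);
    }

    // Decode one scalar value at a time and hash its UTF-16 form, without allocating.
    public int GetHashCode(ReadOnlySpan<byte> alternate)
    {
        uint hash = FnvOffset;
        Span<char> units = stackalloc char[2];
        while (!alternate.IsEmpty)
        {
            // Invalid sequences decode to U+FFFD, mirroring Encoding.UTF8's replacement behavior.
            Rune.DecodeFromUtf8(alternate, out Rune rune, out int consumed);
            int n = rune.EncodeToUtf16(units);
            for (int i = 0; i < n; i++) hash = Mix(hash, units[i]);
            alternate = alternate[consumed..];
        }
        return unchecked((int)hash);
    }

    // Compare the decoded UTF-16 code units against the string, position by position.
    public bool Equals(ReadOnlySpan<byte> alternate, string other)
    {
        int pos = 0;
        Span<char> units = stackalloc char[2];
        while (!alternate.IsEmpty)
        {
            Rune.DecodeFromUtf8(alternate, out Rune rune, out int consumed);
            int n = rune.EncodeToUtf16(units);
            if (pos + n > other.Length) return false;
            for (int i = 0; i < n; i++)
                if (other[pos + i] != units[i]) return false;
            pos += n;
            alternate = alternate[consumed..];
        }
        return pos == other.Length;
    }

    // Only needed when the dictionary has to materialize a new key from the bytes.
    public string Create(ReadOnlySpan<byte> alternate) => Encoding.UTF8.GetString(alternate);
}
```

A real implementation would presumably reuse the wrapped StringComparer's hashing and support culture-sensitive and case-insensitive modes; this sketch only shows the hash/equality contract between the two key representations.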

Approach 2: hardcoding UTF-8 equality comparison into StringComparer

As the title says, we could simply assume that UTF-8 is the encoding most people will end up using for ReadOnlySpan<byte> and bake it into StringComparer directly:

Dictionary<string, int> dictionary = new(StringComparer.Ordinal);
dictionary.GetAlternateLookup<ReadOnlySpan<byte>>(); // UTF-8 semantics whether you like it or not

cc @stephentoub @davidfowl

dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @dotnet/area-system-collections

See info in area-owners.md if you want to be subscribed.

julealgon commented 2 months ago

@eiriktsarpalis this is the first issue I've come across that mentions IAlternateEqualityComparer. I searched for it and found nothing relevant. Would you mind linking to something that provides a bit more background on the interface and its use cases?

bgrainger commented 2 months ago

@julealgon https://github.com/dotnet/core/blob/main/release-notes/9.0/preview/preview6/libraries.md#collection-lookups-with-spans

bencyoung-Fignum commented 2 months ago

I think it has to be approach 1, since byte arrays aren't assumed to be UTF-8 anywhere else except in the explicitly UTF-8 methods.

eiriktsarpalis commented 2 months ago

@bencyoung-Fignum I think so too, but I figured it would be worth mentioning that alternative given the dominance of UTF-8.