dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.91k stars 4.63k forks source link

[API Proposal]: SequenceCompareTo accepting a custom comparer #84838

Open YohDeadfall opened 1 year ago

YohDeadfall commented 1 year ago

Background and motivation

The MemoryExtensions type provides SequenceCompareTo methods which has just two overloads, but none of them is accepting a custom element comparer making it impossible to compare two sequences without allocating new buffers or without implementing the required overload outside of the standard library.

At the same time the type has overloads accepting a custom equality comparer for equality comparison.

If it's approved, I would like to provide an implementation for it.

API Proposal

namespace System;

public static class MemoryExtensions
{
    public static int SequenceCompareTo(this Span<T> span, ReadOnlySpan<T> other, IComparer<T>? comparer = default);

    public static int SequenceCompareTo(this ReadOnlySpan<T> span, ReadOnlySpan<T> other, IComparer<T>? comparer = default);
}

API Usage

Span<string> s1;
Span<string> s2;

int result = s1.SequenceCompareTo(StringComparer.CurrentCultureIgnoreCase);

Alternative Designs

When two input sequences are arrays, then one of them can be casted to IStructuralComparable and do comparison, but that solution isn't obvious because the interface is implemented explicitly. In addition to that CompareTo of the interface accepts a non-generic IComparer which looses type safety and potentially hurts performance (if there's no under the hood cast of the passed comparer to IComparer<T> to be used then).

If both sequences were received as spans, then to use the mentioned alternative solution ToArray must be called on the spans which leads to unwanted memory allocations.

Risks

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation The `MemoryExtensions` type provides `SequenceCompareTo` methods which has just two overloads, but none of them is accepting a custom element comparer making it impossible to compare two sequences without allocating new buffers or without implementing the required overload outside of the standard library. At the same time the type has overloads accepting a custom equality comparer for equality comparison. If it's approved, I would like to provide an implementation for it. ### API Proposal ```csharp namespace System.Collections.Generic; public static class MemoryExtensions { public static int SequenceCompareTo(this Span span, Span other, IComparer? comparer = default); public static int SequenceCompareTo(this ReadOnlySpan span, ReadOnlySpan other, IComparer? comparer = default); } ``` ### API Usage ```csharp Span s1; Span s2; int result = s1.SequenceCompareTo(StringComparer.CurrentCultureIgnoreCase); ``` ### Alternative Designs When two input sequences are arrays, then one of them can be casted to `IStructuralComparable` and do comparison, but that solution isn't obvious because the interface is implemented explicitly. In addition to that `CompareTo` of the interface accepts a non-generic `IComparer` which looses type safety and potentially hurts performance (if there's no under the hood cast of the passed comparer to `IComparer` to be used then). If both sequences were received as spans, then to use the mentioned alternative solution `ToArray` must be called on the spans which leads to unwanted memory allocations. ### Risks _No response_
Author: YohDeadfall
Assignees: -
Labels: `api-suggestion`, `area-System.Memory`
Milestone: -
cucumber-sp commented 1 year ago

I don't think it's possible to add IComparer<T> overload to existing API without losing performance. Right now it uses Avx2 and per-byte comparison for values. Using any comparer, even default, will make it much slower.

YohDeadfall commented 1 year ago

Right now it uses Avx2 and per-byte comparison for values. Using any comparer, even default, will make it much slower.

Don't see it:

https://github.com/dotnet/runtime/blob/a0e23f41e8f40db72b6203855d9858aadb7193ee/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs#L1288-L1305

If there's no comparer or it is a default one, then fallback to an overload without a comparer as it happens for equality.

https://github.com/dotnet/runtime/blob/932efb5824f10622664feebceb486b032b615ccd/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs#L2201-L2203

Neme12 commented 1 year ago

Is there anything preventing this from being ready to review?

Neme12 commented 1 year ago

This is the proposal next to the existing methods, for reference:

 namespace System;

 public static partial class MemoryExtensions
 {
     public static bool SequenceEqual<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> other) where T : IEquatable<T>?;
     public static bool SequenceEqual<T>(this Span<T> span, ReadOnlySpan<T> other) where T : IEquatable<T>?;
     public static bool SequenceEqual<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> other, IEqualityComparer<T>? comparer = null);
     public static bool SequenceEqual<T>(this Span<T> span, ReadOnlySpan<T> other, IEqualityComparer<T>? comparer = null);
     public static int SequenceCompareTo<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> other) where T : IComparable<T>?;
     public static int SequenceCompareTo<T>(this Span<T> span, ReadOnlySpan<T> other) where T : IComparable<T>?;
+    public static int SequenceCompareTo(this ReadOnlySpan<T> span, ReadOnlySpan<T> other, IComparer<T>? comparer = null);
+    public static int SequenceCompareTo(this Span<T> span, ReadOnlySpan<T> other, IComparer<T>? comparer = null);
 }
Neme12 commented 1 year ago

@YohDeadfall You should change the Span<T> overload to take ReadOnlySpan<T> as the second argument, not Span<T>, just like the existing overload always take a ReadOnlySpan<T> as the second argument (and like I showed in the comment above). There's no need to require it to be mutable. The only reason there are overloads for the first argument being both readonly and non-readonly is that when it only takes ReadOnlySpan<T>, the extension method wouldn't show up on Span<T>. But this is not a problem for the other arguments.

Neme12 commented 1 year ago

Also, MemoryExtensions is in System, not System.Collections.Generic.

YohDeadfall commented 1 year ago

@Neme12, thank you! Fixed the proposal.