Proposal: Add Sort(...) extension methods for Span<T>

nietras commented 7 years ago

Currently, there is no way to sort native or fixed memory (e.g. coming from a pointer) in .NET, this proposal intends to fix that by adding sorting extension methods to Span<T>, but also proposes some different overloads than seen on Array to allow for inlined comparisons via the possibility to use value type comparers.

Proposed API

Add a set of Sort extension methods to Span<T> in SpanExtensions:

public static class SpanExtensions
{
     public static void Sort<T>(this Span<T> span);

     public static void Sort<T, TComparer>(this Span<T> span, TComparer comparer) 
        where TComparer : IComparer<T>;

     public static void Sort<T>(this Span<T> span, System.Comparison<T> comparison);

     public static void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items);

     public static void Sort<TKey, TValue, TComparer>(this Span<TKey> keys, 
        Span<TValue> items, TComparer comparer) 
        where TComparer : IComparer<TKey>;

     public static void Sort<TKey, TValue>(this Span<TKey> keys, 
        Span<TValue> items, System.Comparison<TKey> comparison);
}

Rationale and Usage

Provide a safe yet fast way of sorting of any type of contiguous memory; managed or unmanaged.

Sorting Native Memory

var span = new Span<int>(ptr, length);
span.Sort(); // Sort elements in native memory

Sorting with Inlineable Comparer

struct ReverseComparer : IComparer<int>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(int a, int b)
    {
        if (a == b) { return 0; }
        if (a > b) { return -1; }
        return 1;
    }
}

Span<int> span = GetSomeSpan();
// Sort elements, in reverse order with inlined Compare,
// without heap allocation for comparer
span.Sort(new ReverseComparer());

Sorting based on Lambda

Span<int> span = GetSomeSpan();
// Sort elements, in reverse order with lambda/delegate
span.Sort((a, b) => a == b ? 0 : (a > b ? -1 : 1));

Sorting Compound Type

struct Compound
{
    public float FeatureValue;
    public int FeatureIndex;
    public object Payload;
}

struct InlineableFeatureValueComparer : IComparer<Compound>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(Compound a, Compound b)
    {
        if (a.FeatureValue == b.FeatureValue) { return 0; }
        if (a.FeatureValue < b.FeatureValue) { return -1; }
        return 1;
    }
}

var span = new Span<Compound>();

span.Sort();

// Inlineable struct comparer
var comparer = new InlineableFeatureValueComparer();
span.Sort(comparer);

// Lambda comparer
span.Sort((a, b) => a.FeatureValue == b.FeatureValue ? 0 : 
    (a.FeatureValue < b.FeatureValue ? -1 : 1));

The argumentation for adding this is:

To increase the efficiency of code doing sorting and prevent people from reinventing the wheel.
Allow performance optimizations depending on memory type and contents.
Allow sorting on contiguous memory of any kind.

Use Cases

In almost any domain where a high volume of data is processed with sorting being one operation and memory management (e.g. memory recycling, buffer pooling, native memory) is a must to achieve high performance with minimal latency, these sorts would be useful. Example domains are database engines (think indexing), computer vision, artificial intelligence etc.

A concrete example could be in the training of Random Forests some methods employ feature sorting (with indeces) to find decision boundaries on. This involves a lot of data and data that can originate from unmanaged memory.

Open Questions

An important question regarding this proposal is whether the pattern with generic parameter TComparer (e.g. constrained to where TComparer : IComparer<T>) is a pattern that can be approved. This pattern allows for inlineable comparers at the cost of increased code size, if no value type comparers are used, there should be no difference. This pattern is also used in the proposal for BinarySearch in https://github.com/dotnet/corefx/issues/15818

The API relies on being able to depend upon System.Collections.Generic, could this be an issue?

@karelz @jkotas @jamesqo

Updates

UPDATE 1: Change API to be defined as extension methods. UPDATE 2: Add compounded type usage. UPDATE 3: Add link to BinarySearch and point on the pattern used.

Existing Sort APIs

A non-exhaustive list of existing sorting APIs is given below for comparison.

`Array.Sort` Static Methods

Found in ref/System.Runtime.cs

public static void Sort(System.Array array) { }
public static void Sort(System.Array keys, System.Array items) { }
public static void Sort(System.Array keys, System.Array items, System.Collections.IComparer comparer) { }
public static void Sort(System.Array keys, System.Array items, int index, int length) { }
public static void Sort(System.Array keys, System.Array items, int index, int length, System.Collections.IComparer comparer) { }
public static void Sort(System.Array array, System.Collections.IComparer comparer) { }
public static void Sort(System.Array array, int index, int length) { }
public static void Sort(System.Array array, int index, int length, System.Collections.IComparer comparer) { }
public static void Sort<T>(T[] array) { }
public static void Sort<T>(T[] array, System.Collections.Generic.IComparer<T> comparer) { }
public static void Sort<T>(T[] array, System.Comparison<T> comparison) { }
public static void Sort<T>(T[] array, int index, int length) { }
public static void Sort<T>(T[] array, int index, int length, System.Collections.Generic.IComparer<T> comparer) { }
public static void Sort<TKey, TValue>(TKey[] keys, TValue[] items) { }
public static void Sort<TKey, TValue>(TKey[] keys, TValue[] items, System.Collections.Generic.IComparer<TKey> comparer) { }
public static void Sort<TKey, TValue>(TKey[] keys, TValue[] items, int index, int length) { }
public static void Sort<TKey, TValue>(TKey[] keys, TValue[] items, int index, int length, System.Collections.Generic.IComparer<TKey> comparer) { }

`List<T>.Sort` Member Methods

Found in ref/System.Collections.cs

public partial class List<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IList<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.Generic.IReadOnlyList<T>, System.Collections.ICollection, System.Collections.IEnumerable, System.Collections.IList
{
    public void Sort() { }
    public void Sort(System.Collections.Generic.IComparer<T> comparer) { }
    public void Sort(System.Comparison<T> comparison) { }
    public void Sort(int index, int count, System.Collections.Generic.IComparer<T> comparer) { }
}

LINQ `OrderBy` Extension Methods

Found in ref/System.Linq.cs

public static System.Linq.IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, TKey> keySelector) { throw null; }
public static System.Linq.IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, TKey> keySelector, System.Collections.Generic.IComparer<TKey> comparer) { throw null; }
public static System.Linq.IOrderedEnumerable<TSource> OrderByDescending<TSource, TKey>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, TKey> keySelector) { throw null; }
public static System.Linq.IOrderedEnumerable<TSource> OrderByDescending<TSource, TKey>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, TKey> keySelector, System.Collections.Generic.IComparer<TKey> comparer) { throw null; }

jkotas commented 7 years ago

Or perhaps as extension methods.

These should be extension methods. They are in same category as the other similar methods being added: https://github.com/dotnet/corefx/pull/15080 https://github.com/dotnet/corefx/pull/15123

cc @KrzysztofCwalina

nietras commented 7 years ago

These should be extension methods.

Ok, would this not mean any optimizations specific to native/managed memory would not be possible as directly as when they are members? Not sure it has such big a relevance since sorting can be done on refs, but not everything is possible with refs as with pointers. Anyway, if you think this should not be an issue I will update the proposal.

nietras commented 7 years ago

Another concern, is generic type inference which in C# is not always working so well with many generic parameters... not that I can think of a specific case for the proposed API though, but in the face of implicit conversions or similar it is often lacking.

jamesqo commented 7 years ago

@nietras IMHO, I don't know if this is really a needed API. What scenarios are there in which someone would want to sort a Span? It's mostly going to be used to hold byte/char buffers, where sorting won't make much sense. Also, things like equality comparers are high-level and don't really fit in with low-level constructs like Span.

nietras commented 7 years ago

What scenarios are there in which someone would want to sort a Span?

@jamesqo plenty! 😃 Ranging from database engines, computer vision to artificial intelligence. In the latter cases, the types will also range fully from byte to double. And in these fields memory recycling, buffer pooling is paramount to archieving reliable performance. And performance in general important so inlining the compare is important too.

things like equality comparers are high-level

Not sure I get that, but is the issue with the "default" comparers needed for Sort() without parameters? I could live with removing the overloads that do not take a TComparer if that is the concern?

svick commented 7 years ago

public void Sort<TKey, TValue, TComparer>(this Span<TKey> keys, Span<TValue> items, TComparer comparer) where TComparer : IComparer<T>;
public void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items, System.Comparison<T> comparison); // Convenience overload

Nitpick: the Ts in the two lines above should be TKey:

public void Sort<TKey, TValue, TComparer>(this Span<TKey> keys, Span<TValue> items, TComparer comparer) where TComparer : IComparer<TKey>;
public void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items, System.Comparison<TKey> comparison); // Convenience overload

nietras commented 7 years ago

@svick thanks I also forgot static

jamesqo commented 7 years ago

@nietras

Ranging from database engines, computer vision to artificial intelligence. In the latter cases, the types will also range fully from byte to double. And in these fields memory recycling, buffer pooling is paramount to archieving reliable performance. And performance in general important so inlining the compare is important too.

Do you have a concrete example where a sort function would be needed, though? I think the 95% use case for this API would be passing around byte buffers/char buffers. In those scenarios, sorting isn't a function that would be needed, and would only be more API bloat.

Not sure I get that, but is the issue with the "default" comparers needed for Sort() without parameters? I could live with removing the overloads that do not take a TComparer if that is the concern?

The issue, actually, is with the overloads with the comparers. Making them generic on the type of the comparer won't improve perf because everyone passes around comparers as IEqualityComparer<T>. So you will still have virtual calls.

nietras commented 7 years ago

concrete example where a sort function would be needed, though?

@jamesqo I tried giving a more concrete in the proposal regarding machine learning.

95% use case for this API would be passing around byte buffers/char buffers

If .NET Core and Span<T> only target audience is web developers, where shuffling bytes to/from text/chars, then perhaps you are right. But IMHO it is a very limited target audience and misses the fact that in a cloud/mobile first world that is going to be infused with artificial intelligence everywhere, handling massive amounts of data/measurements is a must. This includes all data types, byte to double. Pure sales talk, uff 🙄

because everyone passes around comparers as IEqualityComparer

I don't and never have. 😉 Passing comparers around is not something we do and it does not have any impact on the design of this API. It still works as a "non-generic" API and if it is only used with reference type comparers there will be no code bloat or anything. So I do not see the downside here, the overloads are there for convenience and flexibility just as with the existing APIs incl. Array.Sort.

The implementation can forward to the same single implementation, for example. However, by having the possibility for using a value type comparer we allow developers that need the perf and understand that this will cause code to be generated specifically for that case to do what they need. No virtual calls. In the domain of numerical processing that is what we need.

KrzysztofCwalina commented 7 years ago

We could do Sort(TComparer comparer) where TComparer : IComparer. This would be devirtualized.

jamesqo commented 7 years ago

@KrzysztofCwalina It will not unless you pass in a custom struct comparer. For example, Sort(Comparer<T>.Default) will have TComparer == Comparer<T> and you will be making virtual calls to Comparer<T>.Compare.

jamesqo commented 7 years ago

There is another thing-- I don't think the overloads accepting TComparer will even compile. See here. I ran into this issue a few months ago when I was trying to use generics in a similar fashion.

nietras commented 7 years ago

I don't think the overloads accepting TComparer will even compile

Yes, it does 😄 Check out Sort Extensions examples with the following code:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

namespace System
{
    public struct Span<T> { }

    public static class SpanExtensions
    {
        public static void Sort<T>(this Span<T> span)
        { }

        public static void Sort<T, TComparer>(this Span<T> span, TComparer comparer)
           where TComparer : IComparer<T>
        { }

        public static void Sort<T>(this Span<T> span, System.Comparison<T> comparison)
        { }

        public static void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items)
        { }

        public static void Sort<TKey, TValue, TComparer>(this Span<TKey> keys,
           Span<TValue> items, TComparer comparer)
           where TComparer : IComparer<TKey>
        { }

        public static void Sort<TKey, TValue>(this Span<TKey> keys,
           Span<TValue> items, System.Comparison<TKey> comparison)
        { }
    }

    public static class Usage
    {
        struct ReverseComparer : IComparer<int>
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public int Compare(int a, int b)
            {
                if (a == b) { return 0; }
                if (a > b) { return -1; }
                return 1;
            }
        }

        public static void SingleSpan()
        {
            var span = new Span<int>();

            span.Sort();

            // Inlineable struct comparer
            var reverseComparer = new ReverseComparer();
            span.Sort(reverseComparer);

            // Lambda comparer
            span.Sort((a, b) => a == b ? 0 : (a > b ? -1 : 1));
        }

        public static void TwoSpans()
        {
            var keys = new Span<int>();
            var items = new Span<double>();
            keys.Sort(items);

            // Inlineable struct comparer
            var reverseComparer = new ReverseComparer();
            keys.Sort(items, reverseComparer);

            keys.Sort(items, (a, b) => a == b ? 0 : (a > b ? -1 : 1));
        }
    }
}

This compiles fine, ReverseComparer will be inlined. If people use Comparer<T>.Default the behaviour is the same as for Array.Sort and I see no issue with that, i.e. it will use it as Comparer<T> and there will be a virtual call. If necessary we can do some of the same checks that I think happen in Array.Sort to check if it is default comparer e.g. for int then use optimized int sort. Just as the non-generic Array.Sortdoes.

UPDATE: @jamesqo in your example you have:

public void M<T, TEnumerable>(TEnumerable e) where TEnumerable : IEnumerable<T> { }

which means you expected it to infer T from TEnumerable alone, which C# doesn't do... well at least not as far as I know. Another issue is generic type inference in the face of implicit conversion, this also often chokes the compiler. That will might show up when I get to BinarySearch... on ReadOnlySpan<T>.

joperezr commented 7 years ago

@KrzysztofCwalina is this ready for review?

nietras commented 7 years ago

@joperezr since @KrzysztofCwalina hasn't replied, I think it is ready for review.

KrzysztofCwalina commented 7 years ago

which one is the final proposal? The APIs at the top of this issue, or there are updates described inline?

nietras commented 7 years ago

Yes, the extension methods.

On Feb 1, 2017 3:29 PM, "Krzysztof Cwalina" notifications@github.com wrote:

which one is the final proposal? The APIs at the top of this issue?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/15329#issuecomment-276670744, or mute the thread https://github.com/notifications/unsubscribe-auth/AKTG73siLLfb5BrZS48wAc146jtJU9oCks5rYJbIgaJpZM4LoiKj .

jkotas commented 7 years ago

terrajobst commented 7 years ago

The scenario and proposed overloads all make sense. Minor issue: the one that takes two spans (keys and items) should call the items values as that's what array did. So it should be:

public static class SpanExtensions
{
     public static void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> values);
     public static void Sort<TKey, TValue, TComparer>(this Span<TKey> keys,  Span<TValue> values, TComparer comparer)  where TComparer : IComparer<TKey>;        
     public static void Sort<TKey, TValue>(this Span<TKey> keys,  Span<TValue> values, System.Comparison<TKey> comparison);
}

We should discuss the extension method pattern for span holistically -- I don't think that this a good idea for our customers. It introduces a somewhat weird type (SpanHelpers) we no additional benefit. What is the reason these should be extension methods? (filed dotnet/corefx#15914 for it)

karelz commented 7 years ago

Approved and up-for-grabs, anyone wants to take it? @nietras?

nietras commented 7 years ago

should call the items values

@terrajobst if you look at the ref sources I included at the bottom of the proposal, these are in fact called items, this is where I took it from. E.g.:

public static void Sort<TKey, TValue>(TKey[] keys, TValue[] items) { }

Are there conflicting names, or are the ref sources not correct?

We should discuss the extension method pattern for span holistically -- I don't think that this a good idea for our customers.

The original proposal was for member methods, but I guess there are some implementation issues related to fast/slow span. @jkotas should answer that. I would have preferred member methods too.

terrajobst commented 7 years ago

@nietras

You're correct. I looked at apisof.net but apparently I suck at reading because it's called items there as well. Never mind then; items it is :-)

nietras commented 7 years ago

Approved and up-for-grabs, anyone wants to take it? @nietras?

@karelz I would love to to do this. I even had this idea to base this on "Multi-Pivot Quicksort: Theory and Experiments" (or newer e.g. "comparison-optimal dual-pivot quicksort algorithms" see http://itu.dk/people/maau/additional/2016_vortrag_arco_QS.pdf) using a multi-pivot quick-sort algorithm (point being it is more important to have fewer cache misses than comparisons).

But given the scope of this, testing alone, perf etc., I do not think I will have time to do this within a reasonable time frame. No doubt, it would also be best to simply port the existing sort code and tests to SpanExtensions. If I get time and no one has started this, I will let you know.

nietras commented 7 years ago

Approved and up-for-grabs, anyone wants to take it? @nietras?

@karelz I would love to to do this. I even had this idea to base this on "Multi-Pivot Quicksort: Theory and Experiments" (or newer e.g. "comparison-optimal dual-pivot quicksort algorithms" see http://itu.dk/people/maau/additional/2016_vortrag_arco_QS.pdf) using a multi-pivot quick-sort algorithm (point being it is more important to have fewer cache misses than comparisons).

But given the scope of this, testing alone, perf etc., I do not think I will have time to do this within a reasonable time frame. No doubt, it would also be best to simply port the existing sort code and tests to SpanExtensions. If I get time and no one has started this, I will let you know.

karelz commented 7 years ago

Sounds good, let's see if anyone else wants to help with the work in the meantime ...

nietras commented 6 years ago

@karelz you can assign this to me if you like. I will start working on this tentatively, it will probably take some time, due to writing tests mostly I think. First goal is simply porting the existing Array.Sort code with some minor modifications.

LYP951018 commented 6 years ago

And then we could just use Span<T>.Sort as implementation for Array and List<T>?

GrabYourPitchforks commented 6 years ago

@nietras How are things going with this? We recently forked for 2.1-preview1, but there may still be time to get this in for the next preview. If you find yourself pressed for time, no worries, let us know and we can reassign. Thanks! :)

nietras commented 6 years ago

How are things going with this?

@GrabYourPitchforks things are progressing fine. I have ported all the code, basic tests pass, but I'm still trying to iron out some perf issues after I consolidated code to have both with and without items in one code base. Before that perf was on par or even better than coreclr even for the C++ optimized types.

The code can actually be found at https://github.com/DotNetCross/Sorting/blob/span-sort/src/DotNetCross.Sorting/SpanSortHelpers.cs I am working on it in that repo. Any feedback is welcomed.

I'll try to make a preminary PR for this soon, I need some feedback anyway. Especially regarding testing etc.

It would be great if this could be part of next preview.

And then we could just use Span.Sort as implementation for Array and List?

@LYP951018 that question is a bit tricky due to how the coreclr and corefx code is structured. Array.Sort is part of coreclr for example. Someone else might be better at answering whether in the future it might be possible to use this version for all sorts.

jkotas commented 6 years ago

consolidated code to have both with and without items in one code base

I do not think you need to do that. It is fine to have two copies - one without and one with items.

nietras commented 6 years ago

I do not think you need to do that. It is fine to have two copies - one without and one with items.

@jkotas I thought it would be less maintenance and with some typeof == typeof tricks without perf issues. But yes I may have to revert those changes then. It just means a lot of duplicated code. It's almost 1000 lines, and it also impacts type specializations I have done.

thorgeirk11 commented 6 years ago

Are there any workarounds to apply sorting on a Span without having to copy it to an array?

This is so ugly:

   var ohNoAnArray = awesomeSpan.ToArray();
   Array.Sort(ohNoAnArray, (p1, p2) => p1.CompareTo(p2)));
   ohNoAnArray.CopyTo(awesomeSpan);

nietras commented 6 years ago

@thorgeirk11 this proposal of course addresses this, unfortunately my PRs (https://github.com/dotnet/corefx/pull/26859 and https://github.com/dotnet/coreclr/pull/16986) for adding these got rejected/closed. Does not seem to have been added by anyone else after.

However, the implementation works and all tests pass of this, the main issue was around security I think. In https://github.com/dotnetcross/sorting the same implementation can be found. I have not had time to finalize this and make it available in a nuget package though, as a complement to the -- hopefully at some point -- implementation of this proposal in .NET Core proper.

shartte commented 5 years ago

Regarding use cases: Maybe I am misunderstanding this, but I was trying to stackalloc a Span of simple structs, then sort it, but this seems impossible to do without this issue being solved. Is that correct?

Example (which doesn't compile):

struct BoneAttachment {
  int Bone;
  float Weight;

  private sealed class WeightRelationalComparer : IComparer<BoneAttachment>
  {
    public int Compare(BoneAttachment x, BoneAttachment y)
    {
      return y.Weight.CompareTo(x.Weight);
    }
  }

  public static IComparer<BoneAttachment> WeightComparer { get; } = new WeightRelationalComparer();
}

Span<BoneAttachment> attachments = stackalloc BoneAttachment[16];
// ... fill with data
attachments.Sort(BoneAttachment.WeightComparer); // <--- This doesn't exist

nietras commented 5 years ago

@shartte yes only when this proposal has been implemented will you be able to do that.

I did a PR for this in https://github.com/dotnet/corefx/pull/26859 but this was rejected.

In https://github.com/DotNetCross/Sorting I have this in separate source form but haven't had time to clean this up and package as a nuget package for general use.

EamonNerbonne commented 5 years ago

If people are still waiting for things like this; I wrote a nuget package that's similar in spirit: Anoprsst. However, I didn't try to keep the API similar to Array.Sort since I was aiming to avoid perf-pitfals (i.e. no virtual calls, ever). In particular that means you can do someSpan.SortIComparableUsing().Sort() but that's constrained to sorting value-types only, otherwise you need to use someSpan.SortUsing(new MyOrdering()).Sort() with a struct ordering. The serial implementation appears to outperform Array.Sort in all cases, sometimes by a large margin; and for large arrays it uses parallel quick sort for further speedup.

stephentoub commented 5 years ago

Completed in https://github.com/dotnet/corefx/pull/42443

Sergio0694 commented 4 years ago

I have a question regarding the MemoryExtensions.Sort<T, TComparer>(...) APIs, I've looked at both the source and this issue (and the related ones) but I'm confused about the current implementation. I might just be missing something here so in that case please pardon my ignorance 😅

I've seen both in this issue and the other one on the same subject, that the idea was to add overloads with a generic comparer so that when users passed that with a value type implementing IComparer<T>, the reified generics would come into play allow the JIT to just inline the whole comparer, which is great. I'm not actually seeing this in the current implementation though, is this just a placeholder and the actual API is still work in progress on this front? I mean:

https://github.com/dotnet/runtime/blob/6b5850fa7cf476b66be9d01a3ed05f4484b01d3a/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs#L1810-L1817

Here the whole comparer is just boxed and passed as an IComparer<T>, so isn't the whole point of having a generic comparer lost along the way? And if this is indeed just a placeholder that right now falls back to the normal "slow" implementation, then why was this added before the actual API was available - is it just to allow users to write code around this API, and just enabling this in the future with a runtime update giving a performance boost to those that have been using it already?

nietras commented 4 years ago

isn't the whole point of having a generic comparer lost along the way?

@Sergio0694 yes. That feature is not part of the current implementation, and was as you mention one of the main things about my proposal to allow inlineable value type comparers. I don't know the exact reasons not to include this but the downside is code size.

Shameless plug, I have this feature implemented in DotNetCross.Sorting. Just haven't finished it yet for nuget publication. One point with the nuget library was also to have the new sorting API available for "older" runtimes.

stephentoub commented 4 years ago

From my perspective, we should either remove the generic parameter or avoid the boxing. My preference is the former; I don't think having it be generic over the comparer type adds enough benefit to counteract the cost and complexity (even the array-sorting overload that takes an interface allocates an object).

@jkotas, do you have an opinion?

jkotas commented 4 years ago

@stephentoub I agree with your assessment.

Sergio0694 commented 4 years ago

Oh I'm sorry to hear that, I thought the direction with a heavier usage of generics to squeeze out more performance was actually pretty great. Just as an idea, if the concern is the increased code size with the two implementations (the generic one, and the one using Comparison<T>), wouldn't it be possible to introduce a new (internal) type like this one, to wrap IComparer<T> values:

public readonly struct ValueComparer<T> : IComparer<T>
{
    private readonly Comparison<T> comparer;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public ValueComparer(IComparer<T> comparer)
    {
        this.comparer = comparer.Compare;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(T x, T y)
    {
        return comparer(x, y);
    }
}

And then make the ArraySortHelper<T> type always be generic in both the item type and the comparison type? This could use either the user-specific comparer if one is provided, or in the other paths for all the current APIs it'd just wrap the supplied Comparison<T> delegate into an instance of this ValueComparer<T> struct and just pass that around? This would allow to have a single implementation that would work in both cases. Shouldn't the performance improvement in the case that a generic comparer is supplied by the user be worth the time to make this change?

As another example, in the case where T : IComparable<T>, users could just pass this:

public readonly struct ImplicitValueComparer<T> : IComparer<T>
    where T : IComparable<T>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(T x, T y)
    {
        return x.CompareTo(y);
    }
}

And it would basically be free performance, as the sort algorithm could then always inline this and skip one indirect call that it's not doing for each single comparison, to that Comparison<T> instance.

As an example, we're now using this same approach in ImageSharp as well (here) (I actually got the original idea for this from bepuphysics2), and I've added similar APIs to the Microsoft.Toolkit.HighPerformance package as well (here). In my tests, there was a very decent performance improvement when the supplied "value delegate" was small enough to be inlined, which I guess would be desireable especially for users trying to sort a Span<T> in the first place?

Just a thought, I understand there might be other factors here I'm not properly considering 😊

nietras commented 4 years ago

@Sergio0694 I tried to consolidate the sorting code based on the patterns you describe (going down the value type comparers rabbit hole futilely), but sadly the performance of this simply isn't there due to limitations in the runtime currently (inlining, CG, etc.). The canonical representation of generic types for reference type generic parameters is an issue too (the main issue, while the pattern works wonders for value types it doesn't for reference types, I have spent an exorbitant amount of time looking into this 🤦‍♂️ ). Haven't completely given up consolidating some here with the advent of function pointers in C# 9, but the reality is I had to create 4 versions of sorting code to get decent perf. Or in some cases 1.66x better perf than what is in .NET 5 now. At least last I benchmarked.

I don't think having it be generic over the comparer type adds enough benefit to counteract the cost and complexity

@stephentoub @jkotas I, of course 😃 , disagree with this. I specifically designed the API around being able to define value type comparers for inlineable custom comparisons, which can have significant performance improvements for custom scenarios. Not all of users are sorting primitives only. If you remove it from the API, this could tie the API down for the future.

On the other hand, if I finish DotNetCross.Sorting this could be available outside of BCL and hence fulfil this requirement. But it is a heavy burden to bear 😅 Fast and flexible sorting is in my view a fundamental part of a great platform.

stephentoub commented 4 years ago

If you remove it from the API, this could tie the API down for the future.

If we have:

public static void Sort<T>(Span<T> items, IComparer<T>? comparer)

now, what prevents us from adding:

public static void Sort<T, TComparer>(Span<T> items, TComparer comparer) where TComparer : IComparer<T>?

as an overload later?

Right now the overload is a pit of failure: someone tries to use a struct TComparer thinking it'll avoid allocations and enable devirtualization and inlining, and not only does it not enable devirtualization and inlining, it actually results in two allocations.

nietras commented 4 years ago

what prevents us from adding

Nothing, except of course poor overload resolution 😅 Especially, if implicit conversions are considered. But that may not be a big concern. I hope you will keep it though and that we could find a solution to the implementation issue. Either using source generator or something. Would it be a problem to have multiple versions of sorting code if they were generated? Only one version to maintain? Is the main objection to the API the runtime code generation cost e.g. instruction as bytes?

nietras commented 4 years ago

someone tries to use a struct TComparer thinking it'll avoid allocations and enable devirtualization and inlining, and not only does it not enable devirtualization and inlining, it actually results in two allocations

@stephentoub it's pretty bad now yes 😅. I am not a fan of the forced allocations going on for sure. Including for a reference type comparer since this gets converted to a delegate. This can be solved simply by adding appropriate code for it, though. It's not hard to add, it just another "variant".

nietras commented 4 years ago

@stephentoub I would also encourage you to look at float sorting performance in .NET 5 compared to the native in .NET Core 3.1. As far as I can tell it is not on par, due to problems with inlining CompareTo. That's why my version gets a decent perf boost compared to .NET 5. Example given below, not exactly 1.66x in this but definitely significantly faster. Note this was preview.2 though. 😃


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.720 (1909/November2018Update/19H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.2.20176.6
  [Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.16006, CoreFX 5.0.20.16006), X64 RyuJIT

Platform=X64  Runtime=.NET Core 5.0  Toolchain=InProcessNoEmitToolchain  
IterationCount=9  LaunchCount=1  RunStrategy=Monitoring  
UnrollFactor=1  WarmupCount=3

Method	Filler	Length	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Gen 2	Allocated
CLR_	Random	1000000	288.72 ms	0.558 ms	0.332 ms	1.00	0.00	-	-	-	-
DNX_	Random	1000000	185.84 ms	4.286 ms	2.550 ms	0.64	0.01	-	-	-	-

nietras commented 4 years ago

Of course this perf boost is nothing compared to @damageboy VxSort. Nothing compares to that 🚀

stephentoub commented 4 years ago

Nothing, except of course poor overload resolution

In what way?

I hope you will keep it though and that we could a solution to the implementation issue.

We're close to being done with .NET 5. What solution are you proposing to implement? The current API is the worst of all relevant worlds: it has a more complicated and intimidating signature, it doesn't provide the intended benefits, and it leads one to use a coding style (using a struct and not caching an instance) that leads to worse performance than if a simple lambda were just used. You can also get the same advanced benefits by sorting structs with custom IComparable<T> implementations (though I understand that represents only a portion of the scenarios trying to be optimized.)

It's not hard to add, it just another "variant"

Yes, a lot of duplicated code, when we're already facing more duplication than we'd like.

If you really believe this is critical, show us what the complete solution looks like.

I would also encourage you to look at float sorting performance in .NET 5 compared to the native in .NET Core 3.1

That is unrelated to this API; you didn't show your code, but the perf should be comparable between 3.1 and 5 without a supplied comparer and just using the type's IComparable<T> implementation. So if it's not, please open a separate issue.

nietras commented 4 years ago

show us what the complete solution looks like.

The code is here https://github.com/DotNetCross/Sorting/tree/master/src/DotNetCross.Sorting/Implementations and is a mayhem of 2 x 4 variant == 8 copies. I have pointed to this before. Note this code was explicitly split up to show this in detail and to show the parts of the sorting. Probably not to every bodies liking. No reason this couldn't be generated, although I see perf issues for special cases. Note also these implementations are based on a "pure" refs and Unsafe code since this has to work well on old runtimes.

I still think there is a "solution" to the canonical representation issue, but the JIT has too many issues around inlining (doesn't inline methods with delegates for example 🤷‍♂️) which I have commented on in other places. So I have given up on that honestly.

I'm fine with keeping this in a separate repo/nuget package. If no value can be seen in having this in the BCL.

dotnet / runtime