dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Sorting sequences #14729

Open eatdrinksleepcode opened 9 years ago

eatdrinksleepcode commented 9 years ago

Summary

There are two scenarios where the existing Enumerable OrderBy methods are not ideal:

using NodaTime;
using NodaTime.TimeZones;
...
IEnumerable<DateTimeZone> tzs = ...;
var tzEqComparer = new ZoneEqualityComparer();
tzs = tzs.OrderBy(x => x, Comparer<DateTimeZone>.Create((tz1, tz2) => tzEqComparer.Equals(tz1, tz2) ? 0 : tz1.Id.CompareTo(tz2.Id));

I propose adding various overloads of Sort to address these situations. The name Sort is chosen specifically to distinguish these methods, which do not take a key selector, from the OrderBy methods, which do. If this is not considered an important distinction, the names could be changed to OrderBy without colliding with the existing methods.

API

public static class Enumerable
{
    public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source) where T : IComparable<T>
    public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source, IComparer<T> comparer)
    public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source, Comparison<T> comparison)
    public static IOrderedEnumerable<T> SortDescending<T>(this IEnumerable<T> source) where T : IComparable<T>
    public static IOrderedEnumerable<T> SortDescending<T>(this IEnumerable<T> source, IComparer<T> comparer)
    public static IOrderedEnumerable<T> SortDescending<T>(this IEnumerable<T> source, Comparison<T> comparison)
    public static IOrderedEnumerable<T> ThenSort<T>(this IOrderedEnumerable<T> source) where T : IComparable<T>
    public static IOrderedEnumerable<T> ThenSort<T>(this IOrderedEnumerable<T> source, IComparer<T> comparer)
    public static IOrderedEnumerable<T> ThenSort<T>(this IOrderedEnumerable<T> source, Comparison<T> comparison)
    public static IOrderedEnumerable<T> ThenSortDescending<T>(this IOrderedEnumerable<T> source) where T : IComparable<T>
    public static IOrderedEnumerable<T> ThenSortDescending<T>(this IOrderedEnumerable<T> source, IComparer<T> comparer)
    public static IOrderedEnumerable<T> ThenSortDescending<T>(this IOrderedEnumerable<T> source, Comparison<T> comparison)
}

Usage

ints.Sort()
using NodaTime;
using NodaTime.TimeZones;
...
IEnumerable<DateTimeZone> tzs = ...;
var tzEqComparer = new ZoneEqualityComparer();
tzs = tzs.Sort((tz1, tz2) => tzEqComparer.Equals(tz1, tz2) ? 0 : tz1.Id.CompareTo(tz2.Id));

Questions

  1. The generic restriction to IComparable is not necessary if the implementation will delegate to OrderBy with an identity function (although there are other potential implementations). However, even if not strictly necessary, it clearly expresses the intent of the method. Is there a downside to keeping it?
  2. Overloads that take an IComparer are included for parity with OrderBy. Between the two, I prefer the overload that takes the Comparison, since converting an IComparer into a Comparison is more straightforward than going the other way. However, I am not aware of the rationale behind using IComparer instead of Comparison in the OrderBy methods. We should consider whether the same rationale applies to Sort, or whether it might make sense to remove the IComparer overloads to reduce the API surface.
  3. Should the OrderBy methods have overloads added that take Comparison for parity? Or is there something about the usage pattern of selecting a key that makes it less likely that a comparison function will be provided inline? (This is the inverse of question 2).
  4. What about Queryable?
svick commented 9 years ago
stephentoub commented 9 years ago

I'm not sure using a custom Comparison happens often enough to be worth making the API more complicated.

Not arguing one way or the other, just pointing out that for such situations where you have a Comparison<T> (or lambda that converts to it) and need an IComparer<T>, you can use Comparer<T>.Create, e.g.

var result = strings.OrderBy(s => s, Comparer<string>.Create((s1, s2) => ...));
eatdrinksleepcode commented 9 years ago

@stephentoub Indeed, the proposal shows an example of using Comparer.Create to convert from a Comparison to an IComparer. Although useful when necessary, I find this method unnecessarily verbose in otherwise terse collection manipulation pipelines, especially because the generic argument has to be specified.

@svick That's a good point about collisions with the other Sort methods. I chose Sort because OrderBy implies that the thing you are ordering by is part of calling the method, which is not the case for these overloads. i wonder if we can find another appropriate name?

Regarding Comparison, I don't have any particular reason to believe that it would be used less often than IComparer. And as I noted in the questions, it is shorter and simpler to convert from an IComparer to a Comparison (just use the method group, e.g. myComparer.Equals) than to convert from a Comparison to an IComparer (Comparer<string>.Create(myComparison)).

VSadov commented 7 years ago

It looks like the major purpose of this proposal is to

  1. not require an identity key projection when item is its own key
  2. provide comparer lambda instead (or in addition to) of IComparer

The main question is - why do we need so many new API entries? Can we still have most of the benefits without bloating the API surface too much? What would be the minimum necessary API addition?

It seems the following API additions would be sufficient:

public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source) where T : IComparable<T>
public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source, Func<T, T, int> comparison)

I do not find "Descending" or "ThenBy" methods very useful when the same key is used and comparer lambda can be trivially reversed.

For the Comparer/IComparer use in Linq in general.

Linq generally avoids pre-Func specialized delegate types and tries to standardize on Func/Action. Also note that lambdas are used mostly to extract/project/wrap the elements of the sequence, while the algorithms are parameterized via interfaces (IComparer, IEqualityComparer, . . ).

public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source) where T : IComparable<T>
public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source, IComparer<T> comparer)

Based on the above points the new API should really use IComparer, but I see how Func<T,T,int> instead could be more useful. However, i think we should have only one or another. Having both variants would seem to be unnecessary duplication.

VSadov commented 7 years ago

The idea seems reasonable, but need to settle on appropriate shape of the additional API

svick commented 7 years ago

@VSadov Did you mean Func<T, T, int> (delegate version of IComparer<T>), instead of Func<T, T, bool> (delegate version of IEqualityComparer<T>)?

VSadov commented 7 years ago

@svick - right, Func<T, T, int> of course

JonHanna commented 7 years ago

public static IOrderedEnumerable<T> Sort<T>(this IEnumerable<T> source) where T : IComparable<T>

Call that on a variable typed as List<T> for some T and it will call List's instance method instead of this.

Even if there were no signature matches, "Sort is in-place, OrderBy produces a new object" is the explanation often given, so the names would match. So I don't think Sort is a good name.

It seems to me that these are really overloads of OrderBy and so should be named as such.

I do not find "Descending" or "ThenBy" methods very useful when the same key is used and comparer lambda can be trivially reversed.

Unless the lambda can produce int.MinValue

I'm inclined though to think they should be included in this because:

  1. I think they should be called OrderBy and adding comparable overloads to ThenBy etc. is more consistent and in fact less to learn that they all have overloads of this form than that only some do.
  2. Returning IOrderedEnumerable is a promise to play nicely with ThenBy, and if we're going to play nicely with ThenBy then we've have the work done for the rest of this.

The form with no lambda (the first in either of your two alternatives) can be implemented very efficiently by having a form of ComputeKeys that does _keys = (TKey[])(object)elements and then the rest of OrderBy will just work.

The no-lambda form can also be done for IQueryable<T> easily by returning source.OrderBy(x => x), and I think should (perhaps detecting EnumerableQuery<T> and deferring to its optimised version) and then other providers will get it for free.

The form with a Comparison<T>-like Func can be done easily with an IComparer<T> implementation that takes such a Func. And/or such a type could just be made part of the API.

svick commented 7 years ago

@JonHanna

It seems to me that these are really overloads of OrderBy and so should be named as such.

They are, except the By part does not fit for the identity sort. As said previously by @eatdrinksleepcode:

I chose Sort because OrderBy implies that the thing you are ordering by is part of calling the method, which is not the case for these overloads.

I agree with that. For example, employees.OrderBy(e => e.Name) reads "order employees by name", numbers.OrderBy() is just "order numbers", there is no "by" part. Because of that, I think Order would be a good name for that "overload".

JonHanna commented 7 years ago

I could happily live with Order, though in the SQL world the analogous SELECT id FROM table ORDER BY id means it wouldn't be that weird to SQL-familiar people while those used to the other overloads would think of it as yet another overload.

madelson commented 7 years ago

Two thoughts:

eatdrinksleepcode commented 6 years ago

So it's been a little while, but I think this idea is still worth pursuing. If I can sum up the feedback:

  1. "Sort" conflicts with List<T>.Sort (and to a lesser extent, the static Array.Sort). "OrderBy" doesn't read correctly without a key selector. Is "Order" the best alternative?
  2. Accepting both IComparer<T> and Comparison<T> creates a lot of overloads. Accepting only IComparer<T> makes the usage more verbose. Accepting only Comparison<T> is inconsistent with the rest of LINQ. I am personally still inclined to provide both sets of overloads, since I don't find the overloads confusing, and they keep consuming code clean for both kinds of usage.
  3. Including a generic constraint of IComparable<T> on some overloads disallows non-generic comparables, and is inconsistent with existing OrderBy methods. While I would prefer that the case of attempting to sort non-comparable items be handled by the type system instead of by a runtime exception, as a practical matter the constraint may have to be removed.
  4. Equivalent overloads for IQueryable can easily be supplied by delegating to existing OrderBy overloads, and so should be done to maintain IEnumerable/IQueryable parity.

Does anyone care to weigh in any further on any of these issues? I am happy to make the necessary changes to the proposal. Are there any other issues that need to be addressed?

/cc @joshfree @VSadov

madelson commented 6 years ago

One implication of adding Queryable.Order<T>(this IQueryable<T>) is that existing query providers (e. g. EF) will not support this new query operator. Queryable.Order<T>(this IQueryable<T>, IComparer<T>) would also not be supported, but since most query providers don't support custom comparers anyway that would be less surprising to consumers.

One argument for using Sort over Order would be consistency with the existing Reverse, which also is hidden by List<T>.Reverse.

eiriktsarpalis commented 4 years ago

A couple of thoughts:

eiriktsarpalis commented 3 years ago

While exposing an OrderBy overload that doesn't require any arguments is valuable for simple scenaria, I think we could certainly trim down the size of the proposed API additions:

So I was thinking something like the following:

public static class Enumerable
{
    public static IOrderedEnumerable<TSource> OrderBy<TSource>(this IEnumerable<TSource> source, IComparer<TSource> comparer);
    public static IOrderedEnumerable<TSource> OrderByDefaultComparer<TSource>(this IEnumerable<TSource> source);

    public static IOrderedEnumerable<TSource> OrderByDescending<TSource>(this IEnumerable<TSource> source, IComparer<TSource> comparer);
    public static IOrderedEnumerable<TSource> OrderByDescendingDefaultComparer<TSource>(this IEnumerable<TSource> source);
}
madelson commented 3 years ago

Another possibility for the no-arguments version could be InOrder/InDescendingOrder. This does a good job of communicating that this returns a sequence which is sorted rather than sorting the sequence (FWIW I think Enumerable.Reverse might have been better off as InReverse.

var sorted = items.InOrder();

I think this reads very naturally except when followed by ThenBy. Some of the other proposals like Sort have the same issue. That said, it should be rare to follow a default sort with ThenBy since I would guess that most default comparers break all ties.