dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

API proposal: Array<T> and List<T> in place Sort and BinarySearch for keySelector function #19565

Closed AlexRadch closed 4 years ago

AlexRadch commented 7 years ago

Now we can sort Array<T> and List<T> by keySelector Func<TValue, TKey> by using Enumerable extension methods, but this will create in memory copy of array or list. I suggest to add API for in place Sort and BinarySearch with keySelector function Func<TValue, TKey>.

API proposal

public class List<T>
{
    public void Sort<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer = null);
    public void Sort<TKey>(int index, int count, Func<T, TKey> keySelector, IComparer<TKey> comparer = null);

    public int BinarySearch<TKey>(TKey key, Func<T, TKey> keySelector, IComparer<TKey> comparer = null);
    public int BinarySearch<TKey>(int index, int count, TKey key, Func<T, TKey> keySelector, IComparer<TKey> comparer = null);
}

The same API for Array class.

mikedn commented 7 years ago

It's not clear to me what value the new Sort overloads have. You can already do anything you want using the existing Sort(Comparison<T>) overload. The new overload is also likely to have worse performance due to the fact that a single compare requires 2 delegate calls and possibly another interface call for the actual comparison (via IComparer<TKey>).

To make things worse, the underlying sort implementation is currently built around IComparer<T>. So, short of duplicating the existing implementation, the only way to implement this overload is to make a custom comparer that wraps the key selector and the key comparer and have the existing sort implementation make yet another interface call to that custom comparer.

The only time such overloads are likely to be useful is when you happen to already have a selector method and want to reuse it. Which you can do anyway using the existing Sort(Comparison<T>) overload.

The BinarySearch overloads are more interesting because BinarySearch is rather crippled as it is:

These overloads would allow us to write:

int index = list.BinarySearch(42, item => item.Id);

However, these overloads have the same performance issues that the new Sort overloads have. Each compare requires a delegate call (keySelector) and an interface call (keyComparer). And like the Sort implementation the existing BinarySearch implementation is based on IComparer<T> so there's yet another interface call. At least the binary search implementation is much simpler and duplicating the existing implementation shouldn't be a problem.

For the record: this is derived from https://github.com/dotnet/coreclr/issues/2188 where other BinarySearch overloads have been proposed. Unfortunately those introduce ambiguity in overload resolution so they're likely not an option.

AlexRadch commented 7 years ago

It's not clear to me what value the new Sort overloads have. You can already do anything you want using the existing Sort(Comparison) overload.

Write compassion for next code

var keySelector = Person p => Tuple.Create(p.FirstName, p.LastName);

persons.Sort(keySelector);
var i = persons.BinarySearch(Tuple.Create("Alex", ""), keySelector);

BinarySearch(comparison) was rejected because it have conflict with BinarySearch(IComparable). See closed PR.

The new overload is also likely to have worse performance due to the fact that a single compare requires 2 delegate calls and possibly another interface call for the actual comparison (via IComparer).

Why are you think that selectKey will worse performance here but it does not worse performance in Linq?

I clear understand you suggestions, and I wrote PR to add BinarySearch(comparison) but as you wrote: it was rejected

Unfortunately those introduce ambiguity in overload resolution so they're likely not an option.

mikedn commented 7 years ago

Write compassion for next code

It's doable with existing overloads but it's a bit verbose:

Comparison<Person> comparison = (Person x, Person y) => ((IComparable)Tuple.Create(x.FirstName, x.LastName)).CompareTo(Tuple.Create(y.FirstName, y.LastName));
list.Sort(comparison);
list.BinarySearch(new Person { FirstName = "Alex", LastName = "" }, Comparer<Person>.Create(comparison));

It's better with C# 7:

Comparison<Person> comparison = (Person x, Person y) => (x.FirstName, x.LastName).CompareTo((y.FirstName, y.LastName));
list.Sort(comparison);
list.BinarySearch(new Person { FirstName = "Alex", LastName = "" }, Comparer<Person>.Create(comparison));

BinarySearch(comparison) was rejected because it have conflict with BinarySearch(IComparable). See closed PR.

Yes, I know that. But perhaps there are other alternatives, for example considering changing the order of parameters so the ambiguity is avoided.

Why are you think that selectKey will worse performance here but it does not worse performance in Linq?

I don't care what LINQ does, each API should stand on its own performance wise. In any case, if memory serves me well LINQ builds an array of keys and sorts that. So the key selector is called only once per element. Doing the same for Sort isn't feasible as it require allocating an array of keys and people don't expect Sort to allocate. The selector will then be called 2 N log N times instead of N like LINQ does.

AlexRadch commented 7 years ago

I'm not against to add API with Comparison, I tried but it was rejected. I do not understand why you against keySelector.

I don't care what LINQ does, each API should stand on its own performance wise. In any case, if memory serves me well LINQ builds an array of keys and sorts that. So the key selector is called only once per element. Doing the same for Sort isn't feasible as it require allocating an array of keys and people don't expect Sort to allocate. The selector will then be called 2 N log N times instead of N like LINQ does.

How many times does your comparison Comparison<Person> comparison = (Person x, Person y) => ((IComparable)Tuple.Create(x.FirstName, x.LastName)).CompareTo(Tuple.Create(y.FirstName, y.LastName)); create tuples? I think about 2 N log N times.

I thinks that tuples does not allocated on heap. It is structure. Compiler allocates them on stack like int.

mikedn commented 7 years ago

How many times does your comparison Comparison comparison = (Person x, Person y) => ((IComparable)Tuple.Create(x.FirstName, x.LastName)).CompareTo(Tuple.Create(y.FirstName, y.LastName)); create tuples? I think about 2 N log N times.

Only N log N times.

AlexRadch commented 7 years ago

Only N log N times.

Comparison creates 2 Tuples on each compare. With keySelector you create one tuple for middle element and one for each compare (QuickSort). So selector is faster.

mikedn commented 7 years ago

Comparison creates 2 Tuples on each compare. With keySelector you creates one tuple for middle element and one for each compare (QuickSort). So selector is faster.

Your version does the same thing unless you are considering a sort implementation that works like the LINQ one - call the selector for each person, store the tuples in an array and then sort the array.

Also, the C# 7 version doesn't allocate any objects, C# tuples use ValueTuple instead of Tuple.

ianhays commented 7 years ago

Since this hasn't gotten much interest since December, I'm going to close it for now. If anyone else wants to take control and restart the discussion, feel free to re-open this issue.

Personally I agree with @mikedn that these overloads increase the API footprint but don't provide much value since the existing Comparer overload for Sort is mostly adequate.

LYP951018 commented 6 years ago

Now we can sort Array and List by keySelector Func<TValue, TKey> by using Enumerable extension methods, but this will create in memory copy of array or list.

How could we? I tried searching BinarySearch in the API doc but none of the overloads contains keySelector ...

mikedn commented 6 years ago

How could we? I tried searching BinarySearch in the API doc but none of the overloads contains keySelector …

The text you quote refers to sorting but you're looking for BinarySearch functionality. As mentioned in a previous comment, such functionality does not exist.