golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.73k stars 17.5k forks source link

slices: add sorting and comparison functions #60091

Closed ianlancetaylor closed 1 year ago

ianlancetaylor commented 1 year ago

Update, May 18 2023: The current proposed API is https://github.com/golang/go/issues/60091#issuecomment-1553850782


In #57433 we added the slices package to the standard library. In the discussion of that proposal, we decided to delay adding the sorting and comparison functions that exist in golang.org/x/exp/slices, pending a decision on the constraints package.

In #59488 we proposed adding a new cmp package, which will define cmp.Ordered as a constraint matching all ordered types.

This proposal is to add the sorting functions to the slices package in the standard library, assuming that #59488 is adopted. If #59488 is declined, then this proposal should be declined.

The proposed new API (already in golang.org/x/exp/slices) is:

// Compare compares the elements of s1 and s2.
// The elements are compared sequentially, starting at index 0,
// until one element is not equal to the other.
// The result of comparing the first non-matching elements is returned.
// If both slices are equal until one of them ends, the shorter slice is
// considered less than the longer one.
// The result is 0 if s1 == s2, -1 if s1 < s2, and +1 if s1 > s2.
// Comparisons involving floating point NaNs are ignored.
func Compare[E cmp.Ordered](s1, s2 []E) int

// CompareFunc is like Compare but uses a comparison function
// on each pair of elements. The elements are compared in increasing
// index order, and the comparisons stop after the first time cmp
// returns non-zero.
// The result is the first non-zero result of cmp; if cmp always
// returns 0 the result is 0 if len(s1) == len(s2), -1 if len(s1) < len(s2),
// and +1 if len(s1) > len(s2).
func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int

// Sort sorts a slice of any ordered type in ascending order.
// Sort may fail to sort correctly when sorting slices of floating-point
// numbers containing Not-a-number (NaN) values.
// Use slices.SortFunc with an appropriate comparison function if the input may contain NaNs.
func Sort[E cmp.Ordered](x []E)

// SortFunc sorts the slice x in ascending order as determined by the less function.
// This sort is not guaranteed to be stable.
//
// SortFunc requires that less is a strict weak ordering.
// See https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings.
func SortFunc[E any](x []E, less func(a, b E) bool)

// SortStableFunc sorts the slice x while keeping the original order of equal
// elements, using less to compare elements.
func SortStableFunc[E any](x []E, less func(a, b E) bool)

// IsSorted reports whether x is sorted in ascending order.
func IsSorted[E cmp.Ordered](x []E) bool

// IsSortedFunc reports whether x is sorted in ascending order, with less as the
// comparison function.
func IsSortedFunc[E any](x []E, less func(a, b E) bool) bool

// BinarySearch searches for target in a sorted slice and returns the position
// where target is found, or the position where target would appear in the
// sort order; it also returns a bool saying whether the target is really found
// in the slice. The slice must be sorted in increasing order.
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)

// BinarySearchFunc works like BinarySearch, but uses a custom comparison
// function. The slice must be sorted in increasing order, where "increasing"
// is defined by cmp. cmp should return 0 if the slice element matches
// the target, a negative number if the slice element precedes the target,
// or a positive number if the slice element follows the target.
// cmp must implement the same ordering as the slice, such that if
// cmp(a, t) < 0 and cmp(b, t) >= 0, then a must precede b in the slice.
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
ianlancetaylor commented 1 year ago

CC @eliben

rsc commented 1 year ago

These are the ones from x/exp/slices, which we've discussed at length before. The main blocker is how to spell 'cmp.Ordered'. For this discussion let's assume 'cmp.Ordered' is the spelling and not focus on that.

Are there any concerns about the APIs otherwise?

willfaught commented 1 year ago

Sorry if this was explained elsewhere, but why isn't there a SortStable?

eliben commented 1 year ago

@willfaught see the paper trail from https://github.com/golang/go/issues/47619#issuecomment-1018743413

zephyrtronium commented 1 year ago

Perhaps nitpicking, but the comment on Sort regarding NaNs seems a bit prescriptive considering that we now have math.Compare and math.Compare32.

ianlancetaylor commented 1 year ago

@zephyrtronium Just to be clear, are you suggesting this:

// Use slices.SortFunc(x, func(a, b float64) bool {return math.Compare(a, b) < 0 })
zephyrtronium commented 1 year ago

More precisely, I am pointing out that the comment as written only says to use a < b || (math.IsNaN(a) && !math.IsNaN(b)), when math.Compare(a, b) < 0 is another way that exists. Each has different semantics and different advantages, but only one way is mentioned. Again, I think I am nitpicking, but it seems preferable to me to leave that sentence out of the documentation and leave the preferred approach(es) to examples.

ianlancetaylor commented 1 year ago

Thanks. Updated the initial comment.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

willfaught commented 1 year ago

see the paper trail from https://github.com/golang/go/issues/47619#issuecomment-1018743413

@eliben This seems to be the relevant comment:

One caveat: we chose to exclude SortStable based on @dsnet 's comments; the version with constraints.Ordered is indistinguishable from Sort because two elements that sort equal are otherwise indistinguishable for practical purposes.

However, it doesn't link to the other comment it references. Can you summarize it? This seems to be saying that the Sort func already does a stable sort. If that's the case, then don't we need an unstable sort func as well?

eliben commented 1 year ago

However, it doesn't link to the other comment it references. Can you summarize it? This seems to be saying that the Sort func already does a stable sort. If that's the case, then don't we need an unstable sort func as well?

It refers to this comment: https://github.com/golang/go/issues/47619#issuecomment-925943596 (GitHub's auto-hiding comments in long threads interferes with the bread crumbing). It has to do with what values Ordered actually covers and the logical equivalence of stable and unstable sort for such values. See https://pkg.go.dev/golang.org/x/exp/constraints#Ordered for all the types - what does "stable" sorting mean for such values?

Merovius commented 1 year ago

I once again would like to advocate to use a single type of comparison function - i.e. to either make SortFunc accept a cmp function, or to make BinarySearchFunc accept a less function. Personally, I think the former is better, as implementing less in terms of cmp is cheaper than vice-versa.

Having two flavors of comparison function means you have to implement both per type, or you have to spell out a wrapper every time you need to go from one to the other, or you have to have a generic function that can transform one into the other (and the latter two probably add function call overhead). ISTM that ~any use of BinarySearchFunc has to rely on at least one use of Sort, so the use cases seem fairly coupled.

eliben commented 1 year ago

I once again would like to advocate to use a single type of comparison function - i.e. to either make SortFunc accept a cmp function, or to make BinarySearchFunc accept a less function. Personally, I think the former is better, as implementing less in terms of cmp is cheaper than vice-versa.

FWIW this dual use is carried over from the sort package (https://pkg.go.dev/sort), which has cmp for Find (the spiritual ancestor of BinarySearchFunc) and Less for Sort (https://pkg.go.dev/sort#Interface).

So it seems like a bit of a conundrum; having just cmp for slices would make it more internally consistent, but on the other hand would also make the contract subtly differ from the sort package for sorting, creating another kind of inconsistency.

Merovius commented 1 year ago

@eliben IMO adding a new package, with generics no less, frees us from this historical burden, if we choose so.

Merovius commented 1 year ago

Also, FWIW, golang.org/x/exp/slices.BinarySearchFunc explicitly moved from less to cmp, pretty much at the same time sort.Find was added. So that still wouldn't explain why two different flavors where used to begin with, if consistency was the goal ([edit] the proposal answers that question [/edit]).

rsc commented 1 year ago

I don't believe we should only provide the func versions. It happens frequently that you want to sort or search a slice by the usual < order provided by the language. Providing that directly without having to say ", cmp.Less" is more convenient and can be made far more efficient. This also follows other APIs in the language that provide a reasonable default behavior but then also have the extensible Func version, like strings.Index vs strings.IndexFunc.

rsc commented 1 year ago

Along these same lines, after the discussion of min and max on #59488, we should probably provide slices.Min and slices.Max here, alongside slices.Sort, since Min and Max are the other natural operations that would use Ordered. The specific signatures would be:

func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

If these are invoked with an empty list, they panic("slices.Min: min of empty list") and so on.

jimmyfrasche commented 1 year ago

What do they return when len(x) == 0?

rsc commented 1 year ago

Updated: they panic.

magical commented 1 year ago

If we are going to add slices.Min/Max then i would suggest also including variants that return the index of the element instead of the element itself. (These are sometimes called argmax/argmin.)

// {Min,Max}Index{,Func} returns the index of the leftmost smallest/largest element of [slice],
// according to the < operator or less function.
// Panics if [slice] is empty.
func MinIndex[E cmp.Ordered](slice []E) int
func MaxIndex[E cmp.Ordered](slice []E) int
func MinIndexFunc[E any](slice []E, less func(E, E) bool) int
func MaxIndexFunc[E any](slice []E, less func(E, E) bool) int

Alternatively, maybe Min/Max could return both the element and its index?

eliben commented 1 year ago

Along these same lines, after the discussion of min and max on #59488, we should probably provide slices.Min and slices.Max here, alongside slices.Sort, since Min and Max are the other natural operations that would use Ordered. The specific signatures would be:

func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

If these are invoked with an empty list, they panic("slices.Min: min of empty list") and so on.

I'll look into adding these to exp/slices as a prototype for now

mateusz834 commented 1 year ago

@rsc Small typo there, should be:

func MinFunc[E any](x []E, less func(E, E) bool) E
func MaxFunc[E any](x []E, less func(E, E) bool) E
gopherbot commented 1 year ago

Change https://go.dev/cl/495195 mentions this issue: slices: add Min/Max/MinFunc/MaxFunc

eliben commented 1 year ago

If we are going to add slices.Min/Max then i would suggest also including variants that return the index of the element instead of the element itself. (These are sometimes called argmax/argmin.)

// {Min,Max}Index{,Func} returns the index of the leftmost smallest/largest element of [slice],
// according to the < operator or less function.
// Panics if [slice] is empty.
func MinIndex[E cmp.Ordered](slice []E) int
func MaxIndex[E cmp.Ordered](slice []E) int
func MinIndexFunc[E any](slice []E, less func(E, E) bool) int
func MaxIndexFunc[E any](slice []E, less func(E, E) bool) int

Alternatively, maybe Min/Max could return both the element and its index?

IMHO Min and Max should return a single result, this makes them more usable in mathematical expressions. Moreover, I suspect that in the vast majority of cases users are interested in the min/max itself, not the index -- keeping only that will make these functions faster.

I don't object to adding Argmin/Argmax or *Index, but this can be done separately and maybe later as we assess demand.

zephyrtronium commented 1 year ago

Personally, I'd prefer argmin and argmax over min and max themselves:

It isn't hard to imagine that I'm in the minority camp, though. 🌝

rsc commented 1 year ago

For concreteness, I believe this is the API being proposed:

package slices

func Compare[E cmp.Ordered](s1, s2 []E) int
func Sort[E cmp.Ordered](x []E)
func IsSorted[E cmp.Ordered](x []E) bool
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)
func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E

func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int
func SortFunc[E any](x []E, less func(a, b E) bool)
func SortStableFunc[E any](x []E, less func(a, b E) bool)
func IsSortedFunc[E any](x []E, less func(a, b E) bool) bool
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

There was a typo in the result of CompareFunc (it had no result), which I've fixed here and above.

eliben commented 1 year ago

@rsc there's a discussion in https://go-review.googlesource.com/c/exp/+/495195 about handling of NaNs in the newly added Min and Max functions.

Currently the Sort function in this package doesn't have any special NaN handling (since it's generic and written once for all Ordered types), so NaNs in a slice essentially lead to undefined sorting order (depending on their position, etc.) - with a comment:

// Sort may fail to sort correctly when sorting slices of floating-point
// numbers containing Not-a-number (NaN) values.
// Use slices.SortFunc(x, func(a, b float64) bool {return a < b || (math.IsNaN(a) && !math.IsNaN(b))}

A straightforward implementation of Min and Max will have the same issue, and may need a similar comment warning.

This could also be a topic for the proposal review committee to consider.

rsc commented 1 year ago

59488 defines cmp.Compare and cmp.Less that will match the current operator used by sort. That way package slices can use cmp.Compare and cmp.Less, and it will match sort.

rsc commented 1 year ago

I should add that Max and Min should not use cmp.Compare and cmp.Less. They should do NaNs in, NaNs out.

Merovius commented 1 year ago

@rsc I think my comment above about less vs. cmp has not been addressed yet?

eliben commented 1 year ago

I should add that Max and Min should not use cmp.Compare and cmp.Less. They should do NaNs in, NaNs out.

@rsc can you clarify what you mean by "NaNs in, NaNs out" in the context of Min/Max? Should a NaN always be returned if one or more elements in the slice are NaN?

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 1 year ago

@eliben, standard library slices.Min and slices.Max should use the new builtins min and max and not worry about this. :-)

For x/slices, I think the closest we can probably come in efficient generic code is:

func min(x, y float64) float64 {
    if y != y || y < x {
        return y
    }
    return x
}

func max(x, y float64) float64 {
    if y != y || y > x {
        return y
    }
    return x
}

This handles everything except -0 vs +0. (https://go.dev/play/p/QflOjiyc65Z)

For more see https://github.com/golang/go/issues/59488#issuecomment-1553493207.

Thanks!

eliben commented 1 year ago

Thanks @rsc - I've updated the CL for x/exp/slices to follow this suggestion. I agree that distinguishing -0 vs. +0 in generic code will be tricky (math does it by checking Signbit).

rsc commented 1 year ago

Replying to @Merovius's suggestion about cmp vs less, yes, let's switch to cmp consistently throughout. The updated API, then, is:

package slices

func Compare[E cmp.Ordered](s1, s2 []E) int
func Sort[E cmp.Ordered](x []E)
func IsSorted[E cmp.Ordered](x []E) bool
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)
func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E

func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int
func SortFunc[E any](x []E, cmp func(E, E) int)
func SortStableFunc[E any](x []E, cmp func(E, E) int)
func IsSortedFunc[E any](x []E, cmp func(E, E) int) bool
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
func MinFunc[E any](x []E, cmp func(E, E) int) E
func MaxFunc[E any](x []E, cmp func(E, E) int) E

(Only the second half has changed.)

The argument in favor of cmp is consistency within package slices. Now if you are sorting a slice and then doing a binary search over it, you only need to write one comparison function and use it in both calls, instead of having to write two different ones.

The primary argument in favor of less is consistency with package sort. This argument doesn't hold up, for two reasons. First, if you are using a slice, you can do everything within package slices; you'll never import sort. Second, and more importantly, the less function signature used by sort is different from the one we were using here. Package sort's less takes indices, while package slices's less takes elements. So you can't use the same function with both packages, and knowing how to write a function for one package does not immediately mean you know how to write one for package slices. So this argument doesn't hold up.

A secondary argument in favor of less is that less functions tend to be shorter than cmp functions. That's true but being able to use the same function with all these different public APIs certainly beats that. If you were really attached to writing less functions you could write a generic converter. (Even so I wouldn't put one in the standard library since it would encourage inefficient code.)

The argument in favor of cmp is much more compelling than either of these, so consider it changed.

dsnet commented 1 year ago

A secondary argument in favor of less is that less functions tend to be shorter than cmp functions.

This argument isn't all that compelling anymore, either. The main complexity of implementing cmp functions is the switch needed to produce ternary results. But with the addition of cmp.Compare, that becomes a single call to a generic helper function. More complex data structures will then be implemented in terms of cmp.Compare or slices.Compare, with complexity approximately equal to the less variant.

gopherbot commented 1 year ago

Change https://go.dev/cl/496078 mentions this issue: slices: add sorting and comparison functions

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

eliben commented 1 year ago

Implemented in https://go.dev/cl/496078

gopherbot commented 1 year ago

Change https://go.dev/cl/498175 mentions this issue: doc: add release notes for additions to the slices package

gopherbot commented 1 year ago

Change https://go.dev/cl/505796 mentions this issue: slices: clarify MinFunc/MaxFunc result for equal elements