golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.88k stars 17.65k forks source link

cmp: new package with Ordered, Compare, Less; min, max as builtins #59488

Closed ianlancetaylor closed 1 year ago

ianlancetaylor commented 1 year ago

Update, May 17 2023: Current proposal at https://github.com/golang/go/issues/59488#issuecomment-1548505279.


(Edited to change the package from sort to cmp).

I propose adding a new package cmp, which will initially define the following:

// Ordered is a constraint that permits any ordered type: any type
// that supports the operators < <= >= >.
// If future releases of Go add new ordered types,
// this constraint will be modified to include them.
type Ordered interface { ... }

// Min returns the smallest of a and b. If a and b are equal, Min returns a.
func Min[T Ordered](a, b T) T

// Max returns the largest of a and b., If a and b are equal, Max returns a.
func Max[T Ordered](a, b T) T

The Min and Max functions are trivial but are widely used. The standard library contains 14 versions of Min and 7 versions of Max (not counting math.Min and math.Max). With such wide usage it's worth providing versions in the standard library.

The Ordered constraint is useful for Min and Max and also for sorting functions such as those in the x/exp/slices package. This would replace constraints.Ordered in the x/exp/constraints package. GitHub code search reports that constraints.Ordered appears in 2300 files in GitHub, so people do find it useful.

If we adopt this proposal, we would not add a constraints package to the standard library.

Ordered, Min, and Max are related to comparing values. Therefore, a package cmp is a reasonable location for them. We could also add them to the existing package sort.

Possible other additions to a cmp package would be names for the values returned by bytes.Compare and similar functions, such as cmp.Less (-1), cmp.Eq (0), and cmp.Greater (1). We could also add a function wrappers that reverses the sense of a less-than comparator, and one that reverses the sense of a byte.Compare style comparator.

We should define how Min and Max treat NaN values when instantiated with a floating-point type. I propose that if the first argument is a NaN then it will be returned, otherwise if the second argument is a NaN then it will be returned, otherwise compare with <. When comparing a negative zero and a positive zero, I propose that the first argument will be returned, as with equal values; that is both Min(0.0, math.Copysign(0.0, -1)) and Min(math.Copysign(0.0, -1), 0.0) will return 0.

Other languages

C++ puts std::min and std::max in <algorithm>. Go's standard library doesn't have anything comparable. The C++ <algorithm> header is a bit of a catch-all, including functions that Go has in sort and container/heap, and functions that Go might put into a possible iter package in the future.

Java has Math.min and Math.max, which are overloaded functions that take double, float, int, or long arguments. Go already has math.Min and math.Max, but since Go doesn't have overloading they only take float64 arguments. Most functions in Go's match package are most naturally defined for floating-point values, so even generic versions of them might not take all ordered values. And math.Ordered doesn't seem right.

Python has min and max builtin functions. There's no particular reasons to make these functions predeclared in Go, since we can write them directly in Go.

Rust has std::cmp::min and std::cmp::max. std::cmp also has the Rust equivalents of comparable and Ordered, and some support for comparator functions. This offers some support for using a cmp package in Go. On the other hand Rust's standard library has slice/vector sorting, and has various sorted container types, but as far as I know (which isn't far) doesn't have a general purpose sorting algorithm like the one in Go's sort package.

apparentlymart commented 1 year ago

These seem like good utilities to have in the standard library, and I think sort is an intuitive place to put them.

Did you consider making Min and Max be variadic instead, accepting an arbitrary number of arguments of the same type?

func Min[T Ordered](vals ...T) T
func Max[T Ordered](vals ...T) T

It would of course be possible to implement the variadic versions in terms of the two-argument versions in my own code if the standard library didn't offer a variadic version, so this is perhaps superflous, but since the variadic version could still be used with two arguments it doesn't seem like it loses any functionality to make it variadic.

(Perhaps there is a performance cost compared to just regular arguments? Is it significant?)

apparentlymart commented 1 year ago

It sounds like Ordered represents a partial order rather than a total order. Part of me wants that explicit in the name, but at the same time it doesn't seem super likely that there would ever be a "total order" interface alongside this -- at least not in package sort -- so probably just fine to leave it with a short name and just mention the expectation in the docs (as the example above does, albeit by reference to specific operators rather than using the term "partial order").

randall77 commented 1 year ago

When T is a float, we will need to describe behavior in all the special cases (NaN, +/- 0).

AndrewHarrisSPU commented 1 year ago

When T is a float, we will need to describe behavior in all the special cases (NaN, +/- 0).

Is there any case for the sort ordering constraint to just not accept floats? The math.Min/Max functions have defined behavior that I'd assume sort would want to match anyways.

randall77 commented 1 year ago

Is there any case for the sort ordering constraint to just not accept floats?

Floats are certainly tricky. But most users probably don't hit and/or care about those weird cases. So from an ergonomic API view, allowing floats makes sense. It just complicates the implementation a bit, and requires a bit of extra doc for those who care.

robpike commented 1 year ago

I lean towards putting these in a separate cmp package, for several reasons. For starters, min and max are comparators, not really sorters. As you said, too, sort is already expansive, while the details of ordering can be neatly separated, implying they should be. And since floats are messy, with no total order, having their ordering delineated in a separate package provides a location on which focus on the difficulty, isolating the problem. It might even be there are variants of Ordered to deal with this.

Also,cmp.Min reads better than sort.Min; at least I think it does.

There is no harm in putting these things in a separate package, and the possibility of appreciable good accruing.

apparentlymart commented 1 year ago

I suppose this question about floats boils down to specifying how Min and Max will treat a situation where they are asked to compare a and b where a is neither less than or equal b nor greater than nor equal b.

It will presumably then need to make an essentially-arbitrary decision, but that decision ought to be subject to Go's compatibility promises. Is there a typical answer to this that users might expect from other languages, or is this literally a case of "just pick a behavior and document it"?

ianlancetaylor commented 1 year ago

@apparentlymart In the general case variadic functions will be more expensive, because they require constructing a slice. Of course in general the functions will likely be inlined and the cost will disappear. But, especially since almost all existing examples I found take just two arguments, I think it's preferable to have the two argument version. We can consider adding variadic versions separately. I'm open to counter-argument.

Ordered is a constraint that describes the set of types that support the < operator and friends. This is exactly the set of types that the language spec describes as "ordered". So I think that Ordered is the right name even though arguably for floating-point types we only describe a partial ordering.

@randall77 I think the suggested documentation is clear about the behavior for ±0. I suppose we may have to document what happens for NaN. Sigh. I think my preference would be to say that if either value is a NaN, we will return either a or b but which one we return is unspecified. People who need more clarity can use math.Min.

ianlancetaylor commented 1 year ago

@robpike I am fine with using a new cmp package if the preference leans that way. Anybody who prefer cmp to sort, please thumbs-up @robpike 's note. Thanks.

apparentlymart commented 1 year ago

I don't feel strongly about them being variadic; I was considering it only because it seemed like a more general form that could therefore be more broadly useful without losing the convenience of the two-argument case.

However, the fact that the other examples you found are not variadic does indeed seem to suggest that it's not necessary, and combining with the possible performance cost does seem to make the two-argument form seem like the better choice. A variadic version would be pretty simple to implement elsewhere when needed.

Merovius commented 1 year ago

@randall77

It just complicates the implementation a bit, and requires a bit of extra doc for those who care.

As far as I know, the only way to implement this is via reflect (normal type-assertions/type-switches won't cover things like type F float64), which seems excessive. I'd assume most people would prefer a well-performing if a < b { return a } return b over something that might use reflect just to compare two values - even if it isn't correct in all cases.

If we had something like #45380 it would undoubtedly be simple, but that's still on ice.

Personally, I neither like the idea of adding a constraint that excludes floats, nor do I like the idea of adding an imprecise implementation for floats to the stdlib, nor do I like the idea of using reflect to add a precise version. So, unless we'd use some compiler/runtime shenanigans (which I'd resent a bit as well, because IMO user code should be able to solve this kind of problem as well) I'd be in favor of putting this off until we get something like #45380. For 3rd party code, the decision of using the simple-if-not-strictly-correct code is less problematic than for the standard library.

fgm commented 1 year ago

Extrapolating a bit, if a builtin Ordered constraint also accepted types with some Compare method (name to be bikeshedded), could that be a path to supporting those same comparison operators on them ? I know this is not the original goal, but it would fit nicely along.

Merovius commented 1 year ago

@fgm The standard library is converging on exporting two APIs to support "native" and "custom" operators. e.g. having slices.Equal and slices.EqualFunc. By that approach, we'd have to have Max and MaxFunc (etc). Which seems overkill to me.

For what it's worth, in my opinion there just is no unambiguously good way to support both "native" and "custom" operators. There are alternatives to the "two APIs" approach, but they have their own problems.

leaxoy commented 1 year ago

One question, is it possible make Ordered an ordinary interface? Ordinary interface implemented by any user defined type would be more useful and general.

And, as rob mentioned, cmp package is more reasonable.

randall77 commented 1 year ago

@merovius I think we can make the implementation fast and general. For example:

func Min[T Ordered](a, b T) T {
    if a != a || b != b { ... some special code ... }
    ... normal min code ...
}

The first line there will completely compile away for non-float types. For floats, the first line will capture any NaN values, which we can then handle at our leisure. It would require the normal min code to handle +/-inf and +/-0 correctly, but it probably would.

fzipp commented 1 year ago

I'd rather like to see the cmp package name for something like https://github.com/golang/go/issues/45200, which is more about equality for testing. I don't know if Ordered/Min/Max would fit into such a package.

seebs commented 1 year ago

So, I have conflicting desires:

  1. I want a min function that takes arbitrary numbers of things.
  2. I don't want to pay the slice overhead when the arbitrary number is 2.
neild commented 1 year ago

slices.Min?

earthboundkid commented 1 year ago

Note that a new cmp package was previously proposed here: https://github.com/golang/go/issues/58559#issuecomment-1444117979

earthboundkid commented 1 year ago

I lean towards having cmp.TotalOrdered which excludes floats and having a comment telling users to use math.Min/Max/Compare if they need floats. If someone really needs a generic sort that includes fast float missorting they can write it themselves trivially.

gophun commented 1 year ago

@fzipp Yes, github.com/google/go-cmp/cmp is a popular package in the Go ecosystem and it is recommended by Google's Go style guide, but if you need both in a test file you can always use an import alias:

import (
    "cmp"
    testcmp "github.com/google/go-cmp/cmp"
)

Proposal #45200 for "testing/cmp" could be changed to "testing/equal" or "testing/diff" or something else. It would require reflection, so it should be kept separate from the "cmp" package that is discussed here.

jimmyfrasche commented 1 year ago

variadic Min/Max has some weird corners.

You want to keep the property that Min(x, Min(y, z)) = Min(x, y, Min(z)) = Min(x, y, z, Min[T]()) where T is the type of x, y, z so Min[T]() needs to return the greatest element of T. Likewise, Max[T]() must return the least element of T.

That probably seems academic but consider this innocuous line of code y < Min(xs...) when len(xs) = 0.

This has two consequences. There needs to be some way for these to get the greatest/least element of an arbitrary T which currently cannot be done without reflection and there is no greatest string so these can only be used for numbers.

You could have them panic when they receive an empty slice or make the signature something like Min(x T, xs ...T) T so that it cannot receive 0 values but then you have to do Min(xs[0], xs[1:]...) neither of which are especially ergonomic.

Binary Min/Max is simpler. Per @neild's suggestion, a slices.Min/slices.Max could also be added later once the language has the ability to statically inspect the type argument in some fashion.

AndrewHarrisSPU commented 1 year ago

@fzipp

I'd rather like to see the cmp package name for something like #45200, which is more about equality for testing. I don't know if Ordered/Min/Max would fit into such a package.

There is some implicit conventional usage of cmp as an identifier for variables and functions in the standard library, for a function of two args returning -1/0/1, or the result of such a function. cmp is used as a parameter name for this style of function in sort.Find, slices.BinarySearchFunc, and slices.CompareFunc. Another abbrv. would be ord.

edit: From the example of slices.BinarySearchFunc, ord.Cmp[E,T] func(E,T) int seems like a fit as a component of algorithms that really need a cmp; BinarySearchFunc seems like evidence that these algos exist and should be written in this style in Go. An ord.Min or ord.Max employ the builtin definition of < ... I'm not sure there's much use for constraints like ord.Total, ord.Partial, ord.Numeric but they at least read naturally to me.

earthboundkid commented 1 year ago

I like ord, but it should be ord.Compare (not Cmp) to match strings.Compare and math.Compare.

zigo101 commented 1 year ago

If Go will supported some ordered types which are hard to be described in type constraints, then it would better to add a builtin ordered type IMO.

ianlancetaylor commented 1 year ago

@go101 Yes, if Go ever supports ordered struct or array types, then it will be necessary to do something like define ordered as a predeclared constraint, similar to comparable. But there is no particular reason to think that that will happen. If it does happen, we can always write type Ordered ordered and existing programs will continue to work.

ianlancetaylor commented 1 year ago

I'm not personally excited about a package name "ord", but I could see order.Ordered, order.Min, order.Max. I do tend to think that "sort" or "cmp" would be preferable, though. Unfortunate that there is already a popular package named "cmp" (of course it's not impossible to have two packages named "cmp", it's just unfortunate).

DeedleFake commented 1 year ago

Sounds good to me, but is there any way to avoid the stutter of order.Ordered? Maybe order.Interface, order.Constraint, or even order.Contract or order.I or something?

ericlagergren commented 1 year ago

Sounds good to me, but is there any way to avoid the stutter of order.Ordered?

cmp.Ordered :)

happygiraffe commented 1 year ago

While I'd prefer to avoid having two common packages with the name cmp, given that the current one is intended for tests, and the one described here is more likely to be used in non-test code, it would seem to be reasonable to introduce an alias in the rare case they're both used in the same file.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

earthboundkid commented 1 year ago

58559 is very similar to this issue. It should probably be closed and merged or vice versa.

ianlancetaylor commented 1 year ago

Thanks, but I see #58559 as somewhat different. I see that one as mainly about cmp.Lt. I see this one as mainly about sort.Ordered and sort.Min. There is overlap certainly but a different focus.

griesemer commented 1 year ago

I suspect that at least cmp.Min and cmp.Max, once available, will be rather pervasive in code. We may also want to consider just adding them (min and max) to the built-in functions, together with ordered.

We do have precedent: append and copy, which now could be written in generic form (but for their special string argument handling), are present as built-ins because a) we didn't have generics at the time, but also b) because they are so widely used (append anyway).

A built-in approach would also permit a more flexible implementation of min and max (for floats; or possibly accepting more than 2 arguments while permitting very efficient code w/o resorting to a loop in a generic implementation).

I am aware that this may not be a popular opinion, but I thought it should at least be documented.

earthboundkid commented 1 year ago

I guess there are a couple of interlocking decisions to be made:

It seems like that proposal should be put on hold then pending the resolution of this one plus maybe a cycle or two for experience with whatever is chosen. The other ones are too strongly linked to pull apart.

gophun commented 1 year ago

Conditioned on it going into a new package, should that package also have operators like LT, LTE etc.? https://github.com/golang/go/issues/58559

@carlmjohnson You mention this a second time, but a decision on that still not a necessary precondition for this proposal.

Edit: Sorry, I misunderstood. You meant to put that one on hold, not this proposal here.

robpike commented 1 year ago

Ironically now that generics are in, I find myself favoring putting min and max into the language as builtins. It makes them easier to use and sidesteps most of the issues these issues (ha) are bringing up.

Merovius commented 1 year ago

@griesemer Before I would add predeclared min/max functions and an ordered constraint, I'd rather go back to the drawing board and reconsider the way we express constraints on type parameters to begin with.

When that was discussed at the time, I suggested to split the design up into two stages: First, to only allow constraining on method sets and then, after gathering experience with that, consider adding a way to use operators in a second stage. At the time, the response was (I'm paraphrasing, as I don't think I'll find the actual comment) "if we can't write a generic Min/Max function that works with predeclared types with our design, we'd get laughed out of the room". So, to me, adding these specific functions as builtins is essentially an admittance that the generics design failed what was at the time perceived a necessary condition for them.

But also, I don't actually see the benefit, to be honest. It seems to me that the question we are struggling with is one of semantics? A predeclared min would still have to define how it acts on floats and I don't see (anymore) why a library Min could not do the same thing. I'm not sure what you mean by a builtin being able to be more efficient than a loop, but a) I don't think that benefit justifies adding predeclared identifiers, personally and b) couldn't that also be addressed with better optimization?

I really don't like the idea of adding builtins for this. I feel that would go back on several design principles Go kept to in the past.

Merovius commented 1 year ago

FWIW my (personal) opinion on what we should do is: Define a library Ordered constraint and include ~float64 | ~float32 with it. Document that it will do the wrong thing in the presence of NaN and -0. Let the programmer use *Func variants if they want to do something else instead.

Yes, it invites people doing the wrong thing accidentally. But I never viewed Go as a language which tried to address every possible way to do a thing badly by adding more typeing.

(As an aside: I find the name "partial order", established as it may be, confusing when applied to floats. To me, a partial order is a reflexive, transitive, anti-symmetric relationship and reserved for things like "ordering vectors element-wise" or "directed graph reachability". i.e. it expresses that some elements might be incomparable, not that it might be unreflexive. OTOH < on floats seems to be a "strict partial order", which is a confusing term in its own rights. But maybe that's fair enough. Anyways, stepping off my soap box)

zigo101 commented 1 year ago

If min/max are not built-in, then I don't have any interest or motivation to use them, since it is easier to write min/max myself.

earthboundkid commented 1 year ago

I don't see min/max/ordered as rising to the level of usefulness needed for a built in. They're useful sure, but the main reason to use them is for sorting, and there's no proposal (I hope) to make sort a built in.

doggedOwl commented 1 year ago

I'm a little confused why this proposal got derailed by the Special handling of floats. I would assume that just matching what math.Min/math.Max does now would be the correct way to handle it. If it doesn't match it people will be confused. Yes this means that the implementation will need to specialcase the handling of floats but this does not take anything out from the utility of having these generic functions and in fact is why it is important to have these in the standard library as opposed to a 3rd party one; handling the corner cases in a consistent way.

willfaught commented 1 year ago

I don't see min/max/ordered as rising to the level of usefulness needed for a built in

I think at least having a standard definition of "orderable" is valuable, like comparable. It can just be defined as compatible with the built-in operations like <, which are already well-defined for floats.

But I also think Max and Min are useful in general, and debating the utility of such functions in the stdlib is...I don't know, stingy? It's a common operation, like uppercasing strings. I think not having them would also get us "laughed out of the room."

Merovius commented 1 year ago

@willfaught The comment you are quoting is about builtin functions/constraints (i.e. predeclared identifiers). Not about standard library declarations. I think the heuristic is that lower-case is predeclared and upper-case is library.

willfaught commented 1 year ago

I know. My point was that it would have the same judgment/reaction.

I think the heuristic is that lower-case is predeclared and upper-case is library.

I'm not sure what you're referring to? I was referring to strings.ToLower/Upper in Go, String.toUpperCase in JS, etc. It's a common stdlib thing to have because it's a common operation.

Merovius commented 1 year ago

@willfaught

I know. My point was that it would have the same judgment/reaction.

To clarify: You mean if the canonical way to refer to the minimum/maximum or ordered constraint was via a standard library import instead of a predeclared identifier, people would ridicule Go? I strongly disagree with that.

Otherwise, I don't understand your comments. Because, again, the argument was against making them predeclared identifiers. Not against adding them to the standard library. I don't think doing the latter is very controversial.

willfaught commented 1 year ago

Ah, I think I see the problem: I was interpreting "built in" in "I don't see min/max/ordered as rising to the level of usefulness needed for a built in" as meaning the stdlib, not predeclared identifiers. My mistake. I agree with you @Merovius that they shouldn't be predeclared identifiers.

griesemer commented 1 year ago

@Merovius You make good points, and I don't really have a strong technical argument against your position. If anything, I just expressed a personal preference.

Personally, I'm not looking forward to seeing a lot of cmp.Min(x, y) when I could see min(x, y) which feels just so much more natural. Of course this is also true for, say math.Sin(x) vs sin(x) (if it were available), but there's a huge difference in frequency of use and complexity of operation. Putting min and max into built-ins feels like a pragmatic choice to me, no matter the fact that we can write them just fine as generic library functions.

willfaught commented 1 year ago

I assume this would be a no-go, but a prelude-like stdlib package like in Haskell would be a way to enable having Max and Min available by default without being qualified, while allowing them to be normal declarations too.

apparentlymart commented 1 year ago

As far as I can tell, the original motivation for considering these functions as being predeclared was because that would allow their behavior to be something that wouldn't be possible to implement in today's Go, such as generating specialized implementations for each type rather than relying exclusively on the < (and similar) operators.

I didn't understand that as being primarily motivated by it being possible to use them without importing a package, even though that is another characteristic of a predeclared identifier. That's an implication of that decision, not a goal, I think?

For the sake of this proposal (which is about new library functionality) I think better to focus on how it might be implemented in the library and whether the new capability is worth its maintenance and teaching costs. If the conclusion is that it isn't possible to achieve the desired functionality in just the library today then a different proposal issue could discuss either a language change to make the library function feasible or, indeed, "just" making these be builtins.