golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.74k stars 17.63k forks source link

sort: make sorting easier, add Slice, SliceStable, SliceIsSorted, reflect.Swapper #16721

Closed bradfitz closed 8 years ago

bradfitz commented 8 years ago

EDIT: the proposal has changed later in this thread. See https://github.com/golang/go/issues/16721#issuecomment-241085266 and https://github.com/golang/go/issues/16721#issuecomment-241221703

I propose we make sorting slices a first class concept in Go 1.8.

I think everybody appreciates the cute sort.Interface type at this point, but the reality is that:

I think we should add to package sort at least:

package sort

// Slice sorts the provided slice using the function less.
// The sort is not stable.
// If slice is not a slice, Slice panics.
func Slice(slice interface{}, less func(i, j int) bool)

And maybe at the same time, or in the future:

// SliceSorter returns an sorter for the provided slice, suitable
// for with Reverse or Stable.
// If slice is not a slice, SliceSorter panics.
func SliceSorter(slice interface{}, less func(i, j int) bool) Interface

The assumption is that the user's less function would close over the data to be sorted.

For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.

I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard for users.

/cc @griesemer @robpike @ianlancetaylor @adg @broady

griesemer commented 8 years ago

I'd be ok with something like this. It's pragmatic, small, and probably would reduce a lot of sort-related boilerplate.

cznic commented 8 years ago

I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard to users.

I don't have access to the mentioned above dataset, but let me guess: The need-to-be-sorted type declarations and its associated methods declarations constitutes less than 1‰ of code in that corpus. Perhaps far less. If that is true then I don't think this proposal is justified. Also, using reflection in sorting, however small it is, feels to me like encouraging bad practices.

bradfitz commented 8 years ago

@cznic, you do have access: https://github.com/blog/2201-making-open-source-data-more-available

We'll have to disagree on the metrics at which this is subjectively worthwhile.

atdiar commented 8 years ago

No opinion on this.

Only thing I would say is that, sorting a slice is not exactly to be seen as simply sorting an array. Reason being that if a slice has multiple subslices, sorting one slice will somehow change the others (in a deep value kind of way).

Or should we decide that sorting a slice returns a completely new slice ? (and thus allocate ?)

Just a data point. (I guess you discussed a similar issue when adding the append function but, just to make sure).

mackross commented 8 years ago

As the primary force behind golang at our organization, this and no generic max/min function are the two things I find embarrassingly hard to explain. I'm not sure what it's like at Shoreline Blvd but without fail when an engineer joins our team their mouth drops through the floor at their first sort. Subjectively, I would love this in 1.8.

ianlancetaylor commented 8 years ago

I'm fine with adding something to the sort package, although of course func (i, j int) bool is not the function signature that most people expect.

bradfitz commented 8 years ago

@ianlancetaylor, the benefit of i, j int is that it lets you get at your []T's T or *T, if the values are too large to efficiently copy. And it fits with the existing Interface.Less signature. Plus it's not like we can do better, can we?

ianlancetaylor commented 8 years ago

Oh, sure, I understand why you suggested it. I'm just commenting that it will still surprise people.

jimmyfrasche commented 8 years ago

As much as this is complained about and as many slice types as I have sorted this has never bothered me, so I don't see the point. Boilerplate is never fun, but this pittance is far below the threshold of annoyance.

A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.

ainar-g commented 8 years ago

I'm with @jimmyfrasche here. Why can't this be a go generateable tool?

natefinch commented 8 years ago

If it's basically as fast as sort.Inferface, and is just an addition to the stdlib, not a language change.... why would anyone ever say no to this? Yes, it's annoying that it has to use interface{} that we try not to encourage people to use... but what other choice do we have? No one has come up with a better suggestion so far.

Put me down for a +1.

artursapek commented 8 years ago

How would SliceSorter work? Wouldn't it have to define a new type, and return it? That's not possible in Go afaik.

pbnjay commented 8 years ago

I'd prefer usage of a []interface{} parameter (which would make the reflection unnecessary), but since that doesn't intuitively work on all slice types I know we're stuck with the proposed signature. +1 for saving the headaches anyway though.

To the go generate comments: It's still possible to use that method, this is only adding to the stdlib. But for newbies and rapid development use, go generate can be just as much of a pain...

bradfitz commented 8 years ago

How would SliceSorter work?

It would be a private type implementing sort.Interface. It would return Interface, per the example above.

riannucci commented 8 years ago

@pbnjay I don't think []interface{} would be very convenient; it's not possible to go from e.g. []*MyThing -> []interface{} without allocating a new slice and copying every element (which would be way more annoying than writing out the Less/Len/Swap).

It's a bit unfortunate that it has to be func(i, j int) bool though, because frequently you'd have MyThing.Less(other *MyThing) bool, which means that you'd end up doing:

sort.Slice(mySlice, func(i, j int) bool { return mySlice[i].Less(mySlice[j]) })

But oh well :). I'd still take that over declaring a new type+3 methods just to be able to sort stuff.

pbnjay commented 8 years ago

@riannucci yup - that's what i meant by "doesn't intuitively work on all slice types" - maybe in Go 2 we can get a generic slice-interface type!

bradfitz commented 8 years ago

@riannucci, in my experience, the Less method doesn't already exist somewhere. This is for ad-hoc sorts where you want to write the Less function inline, in context with your existing code:

   sort.Slice(people, func(i, j int) bool { return people[i].Name < people[j].Name })
freeformz commented 8 years ago

How about func Slice(slice interface{}, l int, less func(i, j int) bool), where l is the length up to which the slice is sorted. This means the common usage would just be calling Slice with len([]T), but saves the reflect. I'd wager it's possibly not worth it?

Edit: Nevermind, my bad, there will still be reflection.

bradfitz commented 8 years ago

@freeformz, you already have to do the reflect a bit anyway, so it doesn't save you anything to pass the length explicitly.

myitcv commented 8 years ago

Following on from @jimmyfrasche's comment (but my follow up may indeed be the subject of another issue/a separate go-nuts thread)

A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.

On the go generate suggestion.

This raises the question of whether to distribute core, oft-used programs that are go generate-ers.

As things stand, every go generate-er needs to be retrieved via go get (or some other means). If instead some core go generate-ers were distributed as part of a release then:

There are of course some downsides here:

klingtnet commented 8 years ago

@mackross +1 for the missing max/min but Go also lacks an absolute value function for integers, which is also hard to explain.

zephyr commented 8 years ago

Shouldn’t there also be a stable counterpart? Maybe even with a variadic signature like

func SliceStable(slice interface{}, less ...func(i, j int) bool)

so that one could do

SliceStable(people, byName, byAge)

to sort people first by name and else (when they have the same name/are equivalent) by age?

jimmyfrasche commented 8 years ago

This only simplifies a trivial unimportant amount of code so I don't see it as more than, at best, a minor annoyance, regardless of frequency.

I'm sure Brad's implementation is a negligible hit on performance compared to the current way, but it would have to be used in function bodies to close over the slice like:

func something() {
  //...
  sort.Slice(things, func(i, j int) bool) {
     //...
   })
  //...
}

so you have to pay the cost of allocating the closure and building the generic sorter on each invocation of something, when you don't have to pay anything defining the methods yourself, other than the usual indirections interfaces involve. That's going to be negligible to the cost of sorting anything for all but the smallest of n, and, if it shows up in profiling, it's simple enough to rewrite it the old way, but this sort of code is generally discouraged by the stdlib.

If this is going for pure convenience, I'd prefer the version @ianlancetaylor hinted at where the comparator is an interface containing a func(a, b T) bool (though the runtime hit from that would certainly be less negligible, even modulo potential copying overhead, and it's even further afield of encouraged practice).

Maybe I'm just uncomfortable that this is at a weird level between convenient and practical that results in this awkward situation where you have to close over a slice that's being permuted under the covers.

All that said, if this goes in, I'll use it—I'm lazy and hypocritical!

I don't have any problem with this existing—it's the inclusion in the stdlib that gives me pause. I have absolutely zero reservations about this being a go gettable library: people could even start using it today. The excellent go4.org or even golang.org/x seem like fine places to put this, and if it gets used a lot without issues moving it to the stdlib would be fine with me.

twotwotwo commented 8 years ago

Taking a key func that's allowed to return one of a list of types ((u)int, []byte, string) makes it look a bit more like sort functions in other languages that take a key func, and would let you later drop in faster sorts specific to the key types without adding APIs. Lots of downsides make that unpalatable (keyFunc'd be an interface{} in the signature, trickier internally, overhead of Less calling keyFunc twice, and can't write a custom Less to sort by two fields or whatever). Maybe someone sees a better variation of that idea that I don't :) so throwing it out there anyhow.

bradfitz commented 8 years ago

@jimmyfrasche,

so you have to pay the cost of allocating the closure

Even today with sort.Sort you have to pay the cost of putting a slice into an interface value, since a slice (1 pointer & two ints) is not 0 or 1 pointers (which is all that goes into an interface allocation-free). So it's not significantly different.

I'd prefer to keep this issue focused on the proposal and not the implementation, though.

broady commented 8 years ago

When I read the title, I thought you were going to propose something like a built-in:

var s []t
sort(s, func(a, b t) bool {
  return a < b
})

But this is certainly more pragmatic. Saying that 1% of programs touch sort isn't a very good argument against including this. This is scoped to the sort package, so programs that aren't sorting won't care about it. Even more reason to include it.

oyi812 commented 8 years ago

I'd prefer to tolerate something more essential and less ugly bloating the standard packages.

nsf commented 8 years ago

Can we teach compiler to handle both cases:

var s []int
sort(s, func(a, b int) bool { return a < b; })
var s []GiantType
sort(s, func(a, b *GiantType) bool { return a.prio < b.prio; })

Also not sure if it's worth worrying about copy mechanics. Perhaps compiler could elide copies somehow? Inline the closure? If it's a pure function, why not. And another point is that if you worry about copying, sort does a lot of swaps anyway, so maybe change data structure.

But anyways, I don't like the idea of having indices if we can do generics in the compiler itself. Most built-in functions act as generic functions.

nsf commented 8 years ago

Sorry for semicolons, I do a lot of javascript lately ("semicolon free" language).

nsf commented 8 years ago

Another thought: recognize simple known types (strings, ints, floats) and use default comparison operator (less than):

var s []int
sort(s)

That's kind of obvious though.

jimmyfrasche commented 8 years ago

@bradfitz I mentioned that, but you're right that I should have edited that portion out. I was trying to convey that the performance hit, while negligible, would have to be paid more often by design, but I ended up hemming and hawing to the point of incoherence. I apologize. It's not a terribly important concern anyway.

Biggest issue:

Why should this be in the stdlib? You could throw it up on go4 right now and move it to the stdlib later. People who want to use it could start using it today.

twotwotwo commented 8 years ago

@jimmyfrasche Something like it has been at https://github.com/bradfitz/slice for a while.

justinfx commented 8 years ago

I really like this idea. I'm also one of those that have almost never written anything different for 2 of the 3 sort interface methods for slices. And also have found it a bit embarrassing having to explain to new Go devs how much boilerplate has to be written to sort a slice. Usually they then ask why they can't just call a sort function with a single key function, like in other languages.

I agree that because it's a localized addition to the sort package that only makes doing a thing more straightforward, that there is 100% benefit here. People can still use the existing approach or use the new shorter approach when it makes sense. Let's fix warts.

bradfitz commented 8 years ago

@twotwotwo, I don't recommend using github.com/bradfitz/slice. I never updated it to be compatible (safely) with Go 1.5's GC. But yes, essentially something like that. I should update it and move it to go4.org. Still, I believe this is common enough to warrant being in the standard library and would improve the readability of code.

@nsf, let's stay on topic. Nobody (except perhaps you) is proposing language changes. This bug is about a library change proposal only. If you want to discuss a hypothetical language change, the best place is probably golang-nuts@, since it's out of scope for Go 1.

nsf commented 8 years ago

@bradfitz How is it offtopic? I was implying that this is a bad idea and you guys should add a built-in function instead. It's a pretty good place where custom code generation is justified. And saying that a language change is out of scope for Go 1 is not serious, there were a number of language changes in 1.x releases (1.2 added three-index slices for example). Adding a built-in function won't break anyone's code.

bradfitz commented 8 years ago

@nsf, we can discuss on golang-nuts if you'd like.

atdiar commented 8 years ago

Shouldn't the values in the slice implement a comparison method? The signature looks a bit esoteric in the initial example.

crosbymichael commented 8 years ago

If you need another data point I second what @nsf said.

If you want to add real first-class support to Go for sorting, create a builtin how append works so that we don't have to close over our slice that we already passed, to the same function, and access it again by index.

This proposal is better than the current sort interface but not compelling enough to make me rewrite all of our sorters to use this new func.

cespare commented 8 years ago

I'm in favor of this proposal -- it's a focused, pragmatic improvement to a common use case.

The way that the "less" function closes over the slice and takes indexes as arguments is similar to how an existing function in sort already works: Search.

robpike commented 8 years ago

In my experience, sort doesn't show up all that often, and it's easy to use as is. So I am not fussed about it.

As far as the proposal goes, it bothers me to add another function signature that takes interface{} to the standard library. It allows people to argue that we like them, and force people to use them and create them in their own work.

I do agree that the proposed function will be easy to use, although I'd like to see the implementation and its performance before deciding whether its benefits outweigh its costs.

matttproud commented 8 years ago

@bradfitz,thank you for putting this together and thoughtfully answering our questions. If you don't mind, I have a few more to ask:

For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.

  • You propose mutating a user-provided interface{} function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bare interface{} value. Is that correct, and does that matter? Reflecting on a data interface{} for reading/describing is well-defined in the standard library, however.

I am on the fence about the proposal for concerns of hand-wavy standard library purity but not doctrinally inflexible — FWIW. What extra burdens of proof (e.g., documented behavioral contract) would mutating data interface{} require?

bradfitz commented 8 years ago

You propose mutating a user-provided interface{} function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bare interface{} value.

False: https://golang.org/pkg/encoding/json/#Unmarshal

What is the intended behavior if a user passes in a type backed by a non-slice reflect.Kind?

It's documented in the initial comment in this post.

matttproud commented 8 years ago

Thank you. json.Unmarshal was so blatantly obvious it was under my nose. :-)

mrjrieke commented 8 years ago

A beauty of the existing implementation is the sort method doesn't have any understanding or care as to what 'thing' it is sorting as long as it implements the sort Interface, so I hope you keep it :). I also think that the proposal here, will be easier to implement for many, so I hope we get it, too.

anderejd commented 8 years ago

I love that someone on the go team is giving the sorting boiler plate some attention, but I do not look forward to more code depending on the empty interface. One of the main selling points of go for me is type safety, please don't go full python.

bradfitz commented 8 years ago

@rajder, there could at least be a perfectly accurate go vet check for this. It's something.

anderejd commented 8 years ago

@bradfitz Oh, that would be nice. A build error is still better imho :)

RobbieMcKinstry commented 8 years ago

I'm apprehensive towards. Because the typical use case for the SliceSorter requires a closure, I think how to use the function will not be readily obvious to those unfamiliar with the pattern. I would be comfortable with the change if it case with a strong and simply example in the GoDoc. Were it to be without an associated example, I would not favor the change.

Go was my first language, and the language which taught me closures. I'm imagining other students coming in to the community from a Java or C mindset and being confused. I realize catering to these people is a non-goal, but most Gophers agree one of Go's strength is it's simplicity of design patterns.

I agree with all of @bradfitz's reasons for it's inclusion, however.

velovix commented 8 years ago

@RobbieMcKinstry I think closures are an important enough feature of the language that Gophers can be expected to know it. See net/http's HandleFunc for example. I do agree that this function should have an example, though. Ideally, many more things in the Go standard library would have examples.

RobbieMcKinstry commented 8 years ago

Not sure this is the place for this discussion, but would more examples for standard library code be accepted in the general case? Or is the documentation considered "complete"?