Closed bradfitz closed 8 years ago
I'd be ok with something like this. It's pragmatic, small, and probably would reduce a lot of sort-related boilerplate.
I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard to users.
I don't have access to the mentioned above dataset, but let me guess: The need-to-be-sorted type declarations and its associated methods declarations constitutes less than 1‰ of code in that corpus. Perhaps far less. If that is true then I don't think this proposal is justified. Also, using reflection in sorting, however small it is, feels to me like encouraging bad practices.
@cznic, you do have access: https://github.com/blog/2201-making-open-source-data-more-available
We'll have to disagree on the metrics at which this is subjectively worthwhile.
No opinion on this.
Only thing I would say is that, sorting a slice is not exactly to be seen as simply sorting an array. Reason being that if a slice has multiple subslices, sorting one slice will somehow change the others (in a deep value kind of way).
Or should we decide that sorting a slice returns a completely new slice ? (and thus allocate ?)
Just a data point. (I guess you discussed a similar issue when adding the append function but, just to make sure).
As the primary force behind golang at our organization, this and no generic max/min function are the two things I find embarrassingly hard to explain. I'm not sure what it's like at Shoreline Blvd but without fail when an engineer joins our team their mouth drops through the floor at their first sort. Subjectively, I would love this in 1.8.
I'm fine with adding something to the sort package, although of course func (i, j int) bool
is not the function signature that most people expect.
@ianlancetaylor, the benefit of i, j int
is that it lets you get at your []T
's T
or *T
, if the values are too large to efficiently copy. And it fits with the existing Interface.Less
signature. Plus it's not like we can do better, can we?
Oh, sure, I understand why you suggested it. I'm just commenting that it will still surprise people.
As much as this is complained about and as many slice types as I have sorted this has never bothered me, so I don't see the point. Boilerplate is never fun, but this pittance is far below the threshold of annoyance.
A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.
I'm with @jimmyfrasche here. Why can't this be a go generate
able tool?
If it's basically as fast as sort.Inferface, and is just an addition to the stdlib, not a language change.... why would anyone ever say no to this? Yes, it's annoying that it has to use interface{}
that we try not to encourage people to use... but what other choice do we have? No one has come up with a better suggestion so far.
Put me down for a +1.
How would SliceSorter
work? Wouldn't it have to define a new type, and return it? That's not possible in Go afaik.
I'd prefer usage of a []interface{}
parameter (which would make the reflection unnecessary), but since that doesn't intuitively work on all slice types I know we're stuck with the proposed signature. +1 for saving the headaches anyway though.
To the go generate
comments: It's still possible to use that method, this is only adding to the stdlib. But for newbies and rapid development use, go generate
can be just as much of a pain...
How would SliceSorter work?
It would be a private type implementing sort.Interface
. It would return Interface
, per the example above.
@pbnjay I don't think []interface{} would be very convenient; it's not possible to go from e.g. []*MyThing -> []interface{}
without allocating a new slice and copying every element (which would be way more annoying than writing out the Less/Len/Swap).
It's a bit unfortunate that it has to be func(i, j int) bool
though, because frequently you'd have MyThing.Less(other *MyThing) bool, which means that you'd end up doing:
sort.Slice(mySlice, func(i, j int) bool { return mySlice[i].Less(mySlice[j]) })
But oh well :). I'd still take that over declaring a new type+3 methods just to be able to sort stuff.
@riannucci yup - that's what i meant by "doesn't intuitively work on all slice types" - maybe in Go 2 we can get a generic slice-interface type!
@riannucci, in my experience, the Less
method doesn't already exist somewhere. This is for ad-hoc sorts where you want to write the Less
function inline, in context with your existing code:
sort.Slice(people, func(i, j int) bool { return people[i].Name < people[j].Name })
How about func Slice(slice interface{}, l int, less func(i, j int) bool)
, where l
is the length up to which the slice is sorted. This means the common usage would just be calling Slice
with len([]T)
, but saves the reflect. I'd wager it's possibly not worth it?
Edit: Nevermind, my bad, there will still be reflection.
@freeformz, you already have to do the reflect a bit anyway, so it doesn't save you anything to pass the length explicitly.
Following on from @jimmyfrasche's comment (but my follow up may indeed be the subject of another issue/a separate go-nuts thread)
A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.
On the go generate
suggestion.
This raises the question of whether to distribute core, oft-used programs that are go generate
-ers.
As things stand, every go generate
-er needs to be retrieved via go get
(or some other means). If instead some core go generate
-ers were distributed as part of a release then:
go get
is required to do core things (like slice sorting)go generate
There are of course some downsides here:
PATH
clashes with existing executable programs, we might need rather verbose names like goGenSort
)@mackross +1 for the missing max
/min
but Go also lacks an abs
olute value function for integers, which is also hard to explain.
Shouldn’t there also be a stable counterpart? Maybe even with a variadic signature like
func SliceStable(slice interface{}, less ...func(i, j int) bool)
so that one could do
SliceStable(people, byName, byAge)
to sort people
first by name and else (when they have the same name/are equivalent) by age?
This only simplifies a trivial unimportant amount of code so I don't see it as more than, at best, a minor annoyance, regardless of frequency.
I'm sure Brad's implementation is a negligible hit on performance compared to the current way, but it would have to be used in function bodies to close over the slice like:
func something() {
//...
sort.Slice(things, func(i, j int) bool) {
//...
})
//...
}
so you have to pay the cost of allocating the closure and building the generic sorter on each invocation of something
, when you don't have to pay anything defining the methods yourself, other than the usual indirections interfaces involve. That's going to be negligible to the cost of sorting anything for all but the smallest of n, and, if it shows up in profiling, it's simple enough to rewrite it the old way, but this sort of code is generally discouraged by the stdlib.
If this is going for pure convenience, I'd prefer the version @ianlancetaylor hinted at where the comparator is an interface containing a func(a, b T) bool
(though the runtime hit from that would certainly be less negligible, even modulo potential copying overhead, and it's even further afield of encouraged practice).
Maybe I'm just uncomfortable that this is at a weird level between convenient and practical that results in this awkward situation where you have to close over a slice that's being permuted under the covers.
All that said, if this goes in, I'll use it—I'm lazy and hypocritical!
I don't have any problem with this existing—it's the inclusion in the stdlib that gives me pause. I have absolutely zero reservations about this being a go gettable library: people could even start using it today. The excellent go4.org or even golang.org/x seem like fine places to put this, and if it gets used a lot without issues moving it to the stdlib would be fine with me.
Taking a key func that's allowed to return one of a list of types ((u)int, []byte, string) makes it look a bit more like sort functions in other languages that take a key func, and would let you later drop in faster sorts specific to the key types without adding APIs. Lots of downsides make that unpalatable (keyFunc'd be an interface{}
in the signature, trickier internally, overhead of Less calling keyFunc twice, and can't write a custom Less to sort by two fields or whatever). Maybe someone sees a better variation of that idea that I don't :) so throwing it out there anyhow.
@jimmyfrasche,
so you have to pay the cost of allocating the closure
Even today with sort.Sort you have to pay the cost of putting a slice into an interface value, since a slice (1 pointer & two ints) is not 0 or 1 pointers (which is all that goes into an interface allocation-free). So it's not significantly different.
I'd prefer to keep this issue focused on the proposal and not the implementation, though.
When I read the title, I thought you were going to propose something like a built-in:
var s []t
sort(s, func(a, b t) bool {
return a < b
})
But this is certainly more pragmatic. Saying that 1% of programs touch sort isn't a very good argument against including this. This is scoped to the sort package, so programs that aren't sorting won't care about it. Even more reason to include it.
I'd prefer to tolerate something more essential and less ugly bloating the standard packages.
Can we teach compiler to handle both cases:
var s []int
sort(s, func(a, b int) bool { return a < b; })
var s []GiantType
sort(s, func(a, b *GiantType) bool { return a.prio < b.prio; })
Also not sure if it's worth worrying about copy mechanics. Perhaps compiler could elide copies somehow? Inline the closure? If it's a pure function, why not. And another point is that if you worry about copying, sort does a lot of swaps anyway, so maybe change data structure.
But anyways, I don't like the idea of having indices if we can do generics in the compiler itself. Most built-in functions act as generic functions.
Sorry for semicolons, I do a lot of javascript lately ("semicolon free" language).
Another thought: recognize simple known types (strings, ints, floats) and use default comparison operator (less than):
var s []int
sort(s)
That's kind of obvious though.
@bradfitz I mentioned that, but you're right that I should have edited that portion out. I was trying to convey that the performance hit, while negligible, would have to be paid more often by design, but I ended up hemming and hawing to the point of incoherence. I apologize. It's not a terribly important concern anyway.
Biggest issue:
Why should this be in the stdlib? You could throw it up on go4 right now and move it to the stdlib later. People who want to use it could start using it today.
@jimmyfrasche Something like it has been at https://github.com/bradfitz/slice for a while.
I really like this idea. I'm also one of those that have almost never written anything different for 2 of the 3 sort interface methods for slices. And also have found it a bit embarrassing having to explain to new Go devs how much boilerplate has to be written to sort a slice. Usually they then ask why they can't just call a sort function with a single key function, like in other languages.
I agree that because it's a localized addition to the sort package that only makes doing a thing more straightforward, that there is 100% benefit here. People can still use the existing approach or use the new shorter approach when it makes sense. Let's fix warts.
@twotwotwo, I don't recommend using github.com/bradfitz/slice. I never updated it to be compatible (safely) with Go 1.5's GC. But yes, essentially something like that. I should update it and move it to go4.org. Still, I believe this is common enough to warrant being in the standard library and would improve the readability of code.
@nsf, let's stay on topic. Nobody (except perhaps you) is proposing language changes. This bug is about a library change proposal only. If you want to discuss a hypothetical language change, the best place is probably golang-nuts@, since it's out of scope for Go 1.
@bradfitz How is it offtopic? I was implying that this is a bad idea and you guys should add a built-in function instead. It's a pretty good place where custom code generation is justified. And saying that a language change is out of scope for Go 1 is not serious, there were a number of language changes in 1.x releases (1.2 added three-index slices for example). Adding a built-in function won't break anyone's code.
@nsf, we can discuss on golang-nuts if you'd like.
Shouldn't the values in the slice implement a comparison method? The signature looks a bit esoteric in the initial example.
If you need another data point I second what @nsf said.
If you want to add real first-class support to Go for sorting, create a builtin how append
works so that we don't have to close over our slice that we already passed, to the same function, and access it again by index.
This proposal is better than the current sort interface but not compelling enough to make me rewrite all of our sorters to use this new func.
I'm in favor of this proposal -- it's a focused, pragmatic improvement to a common use case.
The way that the "less" function closes over the slice and takes indexes as arguments is similar to how an existing function in sort already works: Search.
In my experience, sort doesn't show up all that often, and it's easy to use as is. So I am not fussed about it.
As far as the proposal goes, it bothers me to add another function signature that takes interface{} to the standard library. It allows people to argue that we like them, and force people to use them and create them in their own work.
I do agree that the proposed function will be easy to use, although I'd like to see the implementation and its performance before deciding whether its benefits outweigh its costs.
@bradfitz,thank you for putting this together and thoughtfully answering our questions. If you don't mind, I have a few more to ask:
For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.
- You propose mutating a user-provided
interface{}
function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bareinterface{}
value. Is that correct, and does that matter? Reflecting on adata interface{}
for reading/describing is well-defined in the standard library, however.
I am on the fence about the proposal for concerns of hand-wavy standard library purity but not doctrinally inflexible — FWIW. What extra burdens of proof (e.g., documented behavioral contract) would mutating data interface{}
require?
reflect.Kind
?You propose mutating a user-provided interface{} function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bare interface{} value.
False: https://golang.org/pkg/encoding/json/#Unmarshal
What is the intended behavior if a user passes in a type backed by a non-slice reflect.Kind?
It's documented in the initial comment in this post.
Thank you. json.Unmarshal
was so blatantly obvious it was under my nose. :-)
A beauty of the existing implementation is the sort method doesn't have any understanding or care as to what 'thing' it is sorting as long as it implements the sort Interface, so I hope you keep it :). I also think that the proposal here, will be easier to implement for many, so I hope we get it, too.
I love that someone on the go team is giving the sorting boiler plate some attention, but I do not look forward to more code depending on the empty interface. One of the main selling points of go for me is type safety, please don't go full python.
@rajder, there could at least be a perfectly accurate go vet
check for this. It's something.
@bradfitz Oh, that would be nice. A build error is still better imho :)
I'm apprehensive towards. Because the typical use case for the SliceSorter
requires a closure, I think how to use the function will not be readily obvious to those unfamiliar with the pattern. I would be comfortable with the change if it case with a strong and simply example in the GoDoc. Were it to be without an associated example, I would not favor the change.
Go was my first language, and the language which taught me closures. I'm imagining other students coming in to the community from a Java or C mindset and being confused. I realize catering to these people is a non-goal, but most Gophers agree one of Go's strength is it's simplicity of design patterns.
I agree with all of @bradfitz's reasons for it's inclusion, however.
@RobbieMcKinstry I think closures are an important enough feature of the language that Gophers can be expected to know it. See net/http's HandleFunc for example. I do agree that this function should have an example, though. Ideally, many more things in the Go standard library would have examples.
Not sure this is the place for this discussion, but would more examples for standard library code be accepted in the general case? Or is the documentation considered "complete"?
EDIT: the proposal has changed later in this thread. See https://github.com/golang/go/issues/16721#issuecomment-241085266 and https://github.com/golang/go/issues/16721#issuecomment-241221703
I propose we make sorting slices a first class concept in Go 1.8.
I think everybody appreciates the cute
sort.Interface
type at this point, but the reality is that:Less
function inline with my code, in context.I think we should add to package sort at least:
And maybe at the same time, or in the future:
The assumption is that the user's
less
function would close over the data to be sorted.For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.
I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard for users.
/cc @griesemer @robpike @ianlancetaylor @adg @broady