golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
120.67k stars 17.33k forks source link

proposal: slices: add functions MinNFunc and MaxNFunc #67376

Open jech opened 1 month ago

jech commented 1 month ago

Proposal Details

I propose the addition of a function

func MinNFunc[S ~[]E, E any](n int, x S, cmp func(a, b E) int) S

which returns the n smallest elements of x in increasing order. If len(x) < n, then MinNFunc returns x sorted.

MinNFunc(n, x, cmp) is equivalent to

    y := SortFunc(x, cmp)
    if len(y) < n {
        return y
    }
    return y[:n]

but it avoids an allocation of size O(len(x)).

For symmetry, there could be a function MaxNFunc which returns the n largest elements of x in decreasing order. MaxNFunc(n, x, cmp) is equivalent to

MinNFunc(n, x, func(a, b E) int {
    return -cmp(a, b)
})
seankhliao commented 1 month ago

this sounds very specialized does it have wide usage elsewhere? https://go.dev/doc/faq#x_in_std

earthboundkid commented 1 month ago

I think this is the wrong proposal. Instead container/heap should be made generic, and you can just pop the first N items off a heap. See #47632.

jech commented 4 weeks ago

this sounds very specialized does it have wide usage elsewhere?

I am not aware of it being included in the stdlib of any popular language, granted. However, it is a simple, well-defined operation that I often find useful and that is not completely trivial to implement efficiently, but that can reuse a lot of the machinery in pdqsort.

As an additional data point, if you perform a web search for "k largest elements", it appears to be a fairly popular operation, but perhaps not for the right reasons (it would seem it's a standard interview question, for some reason).

Instead container/heap should be made generic, and you can just pop the first N items off a heap. See #47632.

If you mean that MinNFunc can be implemented using a priority queue (and therefore using a binary heap) in O(n log k) time, then you're absolutely right. However, doing it efficiently is not completely trivial (the naive implementation copies every element, even those that end up being discarded.)

If you mean that I should be using a priority queue instead of computing the prioritary elements on demand, then I cannot do that: the priority changes dynamically (it depends on the network peers, don't ask), and there's no obvious way to determine when the priorities have changed sufficiently to justify rebuilding the priority queue.

I agree that it'd be nice to have a generic priority queue, but that's orthogonal to this proposal.