golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.94k stars 17.53k forks source link

proposal: slices: add CollectN with preallocated size. #68261

Open earthboundkid opened 2 months ago

earthboundkid commented 2 months ago

Proposal Details

As discussed in #61900, it's a shame that slices.Collect(maps.Keys(m)) and slices.Sorted(maps.Keys(m)) don't preallocate the size of m. You can work around it by writing slices.Sorted(slices.Append(make([]T, 0, len(m)), maps.Keys(m))), but that's kind of ugly.

I propose adding func CollectN[S ~[]T, T any](seq iter.Seq[T], minsize int) S to slices that just returns Append(make([]T, 0, minsize), seq). It would help with map keys but also other container types where you know the length in advance or can at least make a good guess.

Sorting map keys more efficiently would look like keys := slices.CollectN(maps.Keys(m), len(m); slices.Sort(keys).

fzipp commented 2 months ago

The name gives me more maxsize than minsize vibes.

andig commented 2 months ago

Sorting map keys more efficiently would look like keys := slices.CollectN(maps.Keys(m), len(m); slices.Sort(keys).

I guess we'd rather change the implementation of slices.Sorted to use CollectN internally? But maybe not.

That said, the function signature is not particularly nice. I guess (sorry, no evidence) that the majority of use cases is around the functions of maps.KeysSlice() and maps.ValuesSlice(). If that's the case, then remaining cases could be covered as above by:

slices.Append(make([]K,0, len(m)), seq)

Are there other wide-spread cases that need to be covered?

earthboundkid commented 2 months ago

The name gives me more maxsize than minsize vibes.

ChatGPT suggests the name slices.CollectWithCapacity. Seems little long. CollectWithCap?

I guess we'd rather change the implementation of slices.Sorted to use CollectN internally? But maybe not.

I don't understand. Are you suggesting adding slices.SortedWithCap? It wouldn't be possible to use CollectWithCap internally since Sorted couldn't know the minimum length.

Sorted is just Collect+Sort, so I don't see the point of making a bunch of variations on it. If you need better Collecting, then use CollectWithCap, if not, use plain Sorted.

Are there other wide-spread cases that need to be covered?

I think this would be a good general purpose tool. You could use it with a hypothetical container/set type, for example, or if we ever genericize container/list and container/heap. I have a generic deque type I would have used this with if it had existed.

andig commented 2 months ago

I have a generic deque type I would have used this with if it had existed.

I still assume (sorry), that we'll have 90% maps keys/values.

Jorropo commented 2 months ago

I am rehashing arguments that have already been made, but this whole API which is more mental load on the programmer and more places to make mistake wouldn't need to exists if the iter form was an interface instead of a function value then we could extend it conditionally:

package iter

type Seq[V any] interface {
    Iter(yield func(V) bool)
}
type Seq2[K, V any] interface {
    Iter(yield func(K, V) bool)
}

// Len can be implemented by [Seq] and [Seq2] as optional features.
type Len interface {
    // Len returns how many calls to yield Iter will do if ok == true.
    // To be a correct implementation it MUST do n many calls, no more no less.
    // When ok == false the length is not known.
    Len() (n int, ok bool)
}

// Cap can be implemented by [Seq] and [Seq2] as optional features.
type Cap interface {
    // Cap returns how many calls to yield Iter could do if ok == true.
    // To be a correct implementation it MUST do at most n many calls, no more.
    // When ok == false the capacity is not known.
    Cap() (n int, ok bool)
}
package slices

import "iter"

func Collect[V any](i iter.Seq[V]) []V {
    var sizeHint int
    var hasSizeHint bool
    if l, ok := i.(iter.Len); ok {
        sizeHint, hasSizeHint = l.Len()
    }
    if !hasSizeHint {
        if l, ok := i.(iter.Cap); ok {
            sizeHint, hasSizeHint = l.Cap()
        }
    }
    if !hasSizeHint {
        sizeHint = 0
    }
    r := make([]V, 0, sizeHint)
    for v := range i {
        r = append(r, v) // if you actually submitted this code to std you probably would want hardening against buggy iterators that return wrong .Len and .Cap.
    }
    return r
}

Having it be optionally implemented would make this so much better, and more generally anything where you sometime can do some extra optimizations if you know the size ahead of time.

fzipp commented 2 months ago

if the iter form was an interface instead of a function value then

But it isn't. Let's focus on building on top of what is now.

Merovius commented 2 months ago

@fzipp @earthboundkid I would suggest the name "hint" or "sizehint" for the parameter, as it takes the same role as the size hint parameter for make with maps: It has no impact on the semantics or correctness of the program, but makes it more efficient if given correctly.

The name CollectN seems pragmatic to me.

ianlancetaylor commented 2 months ago

Like @fzipp above, to me the name CollectN suggests "collect up to N elements into a slice."

And as others have observed we can already do this using slices.AppendSeq.

I think this function might be premature at this point.

zigo101 commented 2 months ago

If an iterator must be used, consider adding a slices.CollectInto function for preallocation need. Or don't use iterator at all.

ianlancetaylor commented 2 months ago

@zigo101 That sounds like the existing slices.AppendSeq.

earthboundkid commented 2 months ago

I think it would be okay to put this proposal on hold for six months while we get used to using iterators in production.

zigo101 commented 2 months ago

@earthboundkid You can use nstd.CollectMapKeys now. :D

earthboundkid commented 2 months ago

That's sort of what I mean. We can just use our personal iteration libraries for six months or so then regroup and see if we mostly want CollectWithCap (because we need to collect a lot of other containers into slices) or mostly want maps.KeySlice or maps.KeysSorted etc. based on real world usage. It's a little theoretical for now, although I have started doing some stuff with x/net/html iterators in production.

zigo101 commented 1 month ago

It might be a bad idea to put "slices.Collect" in std.

andig commented 1 month ago

Imho it is very unfortunate that something as simple and often used as exp/maps/Keys() should become something as unobvious as

keys := slices.CollectN(maps.Keys(m), len(m))

Majority of use cases is probably better addressed adding dedicated functions for maps.Keys and slices.Values. Suggesting to revisit #61626

jub0bs commented 1 day ago

If the main concern is about efficiently iterating over sorted keys of a map (and possibly its associated values), wouldn't it be addressed by the addition of the following functions (perhaps with better names) to the maps package, functions that would use the knowledge of the number of keys in the map?

func Sorted[M ~map[K]V, K cmp.Ordered, V any](m M) iter.Seq2[K, V]

func SortedFunc[M ~map[K]V, K comparable, V any](m M, cmp func(K, K) int) iter.Seq2[K, V]

func SortedStableFunc[M ~map[K]V, K comparable, V any](m M, cmp func(K, K) int) iter.Seq2[K, V]

The need for slices.CollectN would become moot, IMO.

Edit: I see that @earthboundkid had already suggested something similar (the addition of a SortedKeys function) in https://github.com/golang/go/issues/61900#issuecomment-1917362371