Open dozheiny opened 1 year ago
how common is this operation that it needs to be in the standard library? https://go.dev/doc/faq#x_in_std
this would just be reduce in #61898
It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e
how common is this operation that it needs to be in the standard library? https://go.dev/doc/faq#x_in_std
this would just be reduce in #61898
I think it's widespread. In multiple base codes, I want to count a specific value in slices and write a function that counts for me.
It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e
Also, It's possible to implement the CountsFunc
function. https://go.dev/play/p/NunxDFQh8pZ?v=gotip
it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl
how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.
it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl
how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.
Typically, I require the CountsFunc
in codebases. Within my codebases, I often deal with large slices containing numerous objects and fields that require counting for specific objects within that slice. This allows me to make comparisons among these objects.
Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.
Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.
This implementation is better than mine, It's better than returns counts of one specific value.
But what about struct slices? Typically, you would require the CountsFunc
function for struct slices. However, CountsFunc
only returns counts of a specific value.
Have a func from your value to something comparable. When you build the map use
h[key(v)] += 1
and check the map using h[key(v)]
Or use a different data structure.
But what about struct slices? Typically, you would require the
CountsFunc
function for struct slices. However,CountsFunc
only returns counts of a specific value.
As noted earlier, the Reduce
in xiter is enough, maybe a bit dense:
h := make(map[string]int)
Reduce(h, func(h map[string]int, s string) map[string]int { h[s]++; return h }, seq)
Less generally:
// could easily use `vs Seq[V]` instead
func Counts[V comparable](vs []V) map[V]int {
return CountsFunc(vs, func(v V) V { return v })
}
func CountsFunc[V any, K comparable](vs []V, key func(V) K) map[K]int {
h := make(map[K]int)
for _, v := range vs {
h[key(v)]++
}
return h
}
Counting like this is a no-brainer in e.g. a data science library or language, but strategies for optimal counting diverge ... no opinion on whether or where this fits culturally in Go.
C++ has std::count
and std::count_if
that do this operation on iterators. Java has java.utils.Collections.frequency
that does this operation on a collection. As far as I know C++ std::vector
and Java Vector
do not support this operation directly. I think for Go we should hold off on adding this to the slices package, and instead consider whether to add it to the iters package if we adopt #61898.
Python has collections.Counter which consumes an iterator to make a histogram map: https://docs.python.org/3/library/collections.html#collections.Counter
@ianlancetaylor So, will this function be added to the slices package or x/exp/xiter?
@dozheiny This is a proposal for adding new API. No decision has been made. I'm saying that I think it would be preferable to add this to x/exp/iter rather than slices. If anybody wants to make an argument that slices, or perhaps both, is better, this is the place to do it.
I think this definitely makes way more sense as a function for iterators, not slices. It's a function that inherently needs to be implemented via iteration over a slice anyways, so by implementing it for iterators it'll be exactly the same but not limited to just slices.
For what it’s worth, I feel like the simple case of just counting how many times X appears in a slice or iterator is better of just being done in a loop. I have used languages that have a lot of fancy functions to do this stuff for you, and the end result is you spend more time looking up the documentation for the iterator library than it would take to just write the loop. It creates a second way to do things that only saves a trivial amount of typing but takes up a certain amount of cognitive load.
Obviously these things are a spectrum. I’m happy to have slices.Clone even though it’s trivial because it’s shorter and more clear for a very common operation. As the operations become less common though, the value of having them in a library drops off pretty rapidly.
A quirky argument for Count
in xiter
, precisely using map[K]int
: the iteration order of built-in maps is guaranteed to shuffle. I believe Count
would be in many cases naive for performance, and using a map would often be part of the reason. But there is utility in testing something fancier against a simply and correctly written solution, and the shuffling is a great property here.
I don't get why there's a strings.Count
but no slices.Count
.
@PierreTurnbull That's a fair question. My 2¢ is that strings.Count is doing something different than slices.Count would do.
From the strings.Count docs:
Count counts the number of non-overlapping instances of substr in s.
Counting non-overlapping instances is slightly tricky. Implementing count of cond(item) in items is not tricky. strings.Count also uses an assembly language version of count for one byte needles, which is also tricky.
I think it would be better to give up on this proposal instead propose that a container/histogram package be with a new type similar to Python's collection.Counter class because efficiently doing a count of everything in a sequence is a little tricky.
We can implement slices.Count
the same way as bytes.Count
. But that should be a separate proposal, not this one.
@earthboundkid I think trivial features like counting elements should be implemented in the simplest way. The developer that uses/learns the language should not be required to know the technical details of how it works under the hood in order to understand the language. It should be possible to count elements in any kind of iterable data structure in the simplest way, following a standard convention (eg : strings.Count, slices.Count, map.Count, anyiteratoryoucanimagine.Count). Things should be simple, especially when one the main motive of the language is simplicity.
As a beginner in Go, I might be missing some intellectual keys to understanding the way things are done so please excuse me if I'm missing an obvious point. This thing and few others leave me quite confused and skeptic concerning the quality of the language and its standard libraries, especially in terms of DX. This first impression is in complete opposition with the positive opinions I heard about this language, which doubles my confusion.
@PierreTurnbull As you point out, there is an existing strings.Count
function. However, it does not do what this proposal suggests. So while I agree that we should have slices.Count
that acts like strings.Count
and bytes.Count
, this proposal is asking for something different: the ability to count the number of times a specific element occurs in a slice. We should not have a slices.Count
that is different from the existing strings.Count
and bytes.Count
.
Further, the desire to count the number of times an element occurs in a slice would, I think, be better satisfied by adding a function to the iter package or, more likely to start, the x/exp/iter package. That would correct to the C++ std::count
function. It would support, as you suggest, counting not just elements in a slice but also in a map, another container, and so on.
Iterators, and for that matter the slices package, are new to Go, and we are working through the details of how to best handle them. We move carefully because we are very strict about backward compatibility, so once we have implemented something we can never change it.
The slices package has no functions for counting a specific value for now. I think it's good the definition of function is like this:
An example of the Counts function is like this: