apple / swift-algorithms

Commonly used sequence and collection algorithms for Swift
Apache License 2.0

Add API for counting all objects in a collection #243

Open amomchilov opened 1 day ago

amomchilov commented 1 day ago

Goal

It's pretty common to want to take a collection of items and count the number of occurrences of each item, e.g.

let input = ["a", "b", "c", "b", "a"]

let desiredOutput = ["a": 2, "b": 2, "c": 1] 

Today

There are two relatively short ways to achieve this today:

  1. Using reduce: input.reduce(into: [:]) { $0[$1, default: 0] += 1 }

    Reduce is really general, and isn't particularly readable, especially for beginners. The performance here is good though, allocating a single dictionary and mutating it in-place.

  2. Using grouped(by:): input.grouped(by: { $0 }).mapValues(\.count)

    We could use the grouped(by:) helper that I added to Swift Algorithms, but it allocates an intermediate array for every group, when all we need is their counts.
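For comparison, here is a minimal sketch of both approaches side by side, run on the sample input from the Goal section. It assumes the standard library's Dictionary(grouping:by:) as a stand-in for the Swift Algorithms helper so that the snippet runs without any package import; the names viaReduce and viaGrouping are only illustrative.

```swift
let input = ["a", "b", "c", "b", "a"]

// 1. reduce(into:): a single dictionary, mutated in place.
let viaReduce: [String: Int] = input.reduce(into: [:]) { counts, element in
    counts[element, default: 0] += 1
}

// 2. Group first, then count: builds one array per distinct element,
//    only to throw the arrays away and keep their counts.
//    Dictionary(grouping:by:) is the standard-library initializer, used here
//    as a stand-in for the Swift Algorithms helper.
let viaGrouping = Dictionary(grouping: input, by: { $0 }).mapValues(\.count)

print(viaReduce == viaGrouping) // prints "true"
```

Both produce ["a": 2, "b": 2, "c": 1]; the difference is only in readability and intermediate allocations.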

Proposed solution

The exact name is TBD, but I'm proposing a function like:

extension Sequence where Element: Hashable {
    func tallied() -> [Element: Int] {
        return reduce(into: [:]) { $0[$1, default: 0] += 1 }
    }
}

We could also consider taking a by: parameter, to count things by a value other than themselves. Though perhaps .lazy.map would be better. E.g. input.tallied(by: \.foo) could be expressed as input.lazy.map(\.foo).tallied()
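To make the comparison concrete, here is a rough sketch of what a by: variant might look like next to the .lazy.map spelling. The tallied(by:) signature and the Pet type are assumptions made up for illustration; neither the name nor the shape has been settled.

```swift
extension Sequence where Element: Hashable {
    // The proposed function (name still TBD).
    func tallied() -> [Element: Int] {
        reduce(into: [:]) { $0[$1, default: 0] += 1 }
    }
}

extension Sequence {
    // Hypothetical by: variant: counts elements by a derived key.
    func tallied<Key: Hashable>(by key: (Element) -> Key) -> [Key: Int] {
        reduce(into: [:]) { $0[key($1), default: 0] += 1 }
    }
}

// Illustrative type, not part of the proposal.
struct Pet {
    let name: String
    let species: String
}

let pets = [
    Pet(name: "Rex", species: "dog"),
    Pet(name: "Tom", species: "cat"),
    Pet(name: "Fido", species: "dog"),
]

let direct = pets.tallied(by: \.species)
let viaLazyMap = pets.lazy.map(\.species).tallied()

print(direct == viaLazyMap) // prints "true"
```

Both spellings give ["dog": 2, "cat": 1], and neither allocates an intermediate array, so the by: parameter would mostly be a readability convenience.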

Alternatives

A more general "collectors" API

Similar to Java collectors, which let you express transformations over streams, collecting into Arrays, Dictionaries, Counters, or anything else you might like.

This could pair well with Swift Collections, e.g. if we added a new CountedSet (a native Swift alternative to NSCountedSet). For example, we could have:

input.grouping(by: \.foo, collectingInto: { CountedSet() })
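One way this could fit together is sketched below. Everything in it is hypothetical: the Collector protocol, the minimal CountedSet, and the exact grouping(by:collectingInto:) signature are assumptions made to give the call above something to compile against, not existing Swift Collections or Swift Algorithms API.

```swift
// Hypothetical protocol: anything that can absorb inputs one at a time.
protocol Collector {
    associatedtype Input
    mutating func collect(_ input: Input)
}

// Hypothetical minimal counted set; not part of Swift Collections today.
struct CountedSet<Element: Hashable>: Collector {
    private var counts: [Element: Int] = [:]

    init() {}

    mutating func collect(_ input: Element) {
        counts[input, default: 0] += 1
    }

    func count(of element: Element) -> Int {
        counts[element, default: 0]
    }
}

extension Sequence {
    // Hypothetical signature modeled on the call shown above:
    // one collector per key, created on first use.
    func grouping<Key: Hashable, C: Collector>(
        by keyForValue: (Element) -> Key,
        collectingInto makeCollector: () -> C
    ) -> [Key: C] where C.Input == Element {
        var result: [Key: C] = [:]
        for element in self {
            result[keyForValue(element), default: makeCollector()].collect(element)
        }
        return result
    }
}

let words = ["apple", "avocado", "apple", "banana"]
let byInitial = words.grouping(by: { $0.first! },
                               collectingInto: { CountedSet<String>() })

print(byInitial["a"]?.count(of: "apple") ?? 0) // prints "2"
```

The appeal of this shape is that the same grouping function could collect into arrays, sets, or counters, at the cost of a noticeably larger API surface than a single tallied() method.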

Prior art

Language              Name
Python                collections.Counter
Ruby                  tally
Java                  java.util.stream.Collectors.counting()
JavaScript (Lodash)   countBy

C# and Rust don't have helpers for this.

xwu commented 7 hours ago

Would this API be just a different spelling for a CountedSet initializer, or would there be meaningful differences?