apple / swift-algorithms

Commonly used sequence and collection algorithms for Swift
Apache License 2.0
5.92k stars 440 forks source link

Add `chunkedByReduction(into:_)`. #148

Open toddthomas opened 3 years ago

toddthomas commented 3 years ago

Description

I recently needed to separate a collection of UI objects into subsequences whose combined width did not exceed the width of a view, so they could be wrapped into rows in that view. This seemed like a job for a chunking algorithm where the predicate would be allowed to reduce the elements, so I wrote that. I'm wondering whether others think it's generally useful enough to be included here.

Here's a simple example:

let numbers = [16, 8, 8, 19, 12, 5]
let chunks = numbers.chunkedByReduction(into: 0) { sum, n in
  sum += n
  return sum <= 16
}
// [[16], [8, 8], [19], [12], [5]]

Note that a single element which fails the predicate is included in the resulting collection. More on that below.

This sort of chunking could be implemented using the Standard Library's existing reduce(into:_), but then the trailing closure would be cluttered with all the logic to keep track of and generate the subsequences. I like that this new method abstracts that away so the closure is simply the predicate that determines how to chunk elements.

Detailed Design

I've added

extension Collection {
  public func chunkedByReduction<Accumulator>(
    into initialValue: Accumulator,
    _ predicate: @escaping (inout Accumulator, Element) throws -> Bool
  ) rethrows -> [SubSequence]
}

and

extension LazyCollectionProtocol {
  public func chunkedByReduction<Accumulator>(
    into initialValue: Accumulator,
    _ predicate: @escaping (inout Accumulator, Element) -> Bool
  ) -> ChunkedByReduction<Accumulator, Self>
}

Handling single elements which fail the predicate

In my UI problem which motivated this, it so happened that no UI element could exceed the width of the view I needed to wrap them into. But it wouldn't be the case in general that a single element would be guaranteed not to fail the predicate. The current implementation includes such elements in the result, to satisfy the property shared by all the other chunking methods that the entire base collection is included in their result.

Documentation Plan

I updated Guides/Chunked.md with a description and example of the new method.

Test Plan

I added a series of tests for both the eager and lazy versions of the new method.

Source Impact

This only adds API. No existing API is changed or removed.

Checklist

timvermeulen commented 3 years ago

Thanks for your contribution, @toddthomas! This is an interesting algorithm with a very clear real-world use case. 👍 It's also quite a specific use case, though. I think it shows that there might be room for more flexible ways of chunking a collection, but I'm not sure this particular API gives us maximum flexibility.

The main thing that comes to mind is that chunkedByReduction(into:_:) only gives you the chunks you've formed, not the accumulated values that belong to those chunks. This information might be useful in certain situations, similar to how the chunked(on:) method gives you tuples that also contain the projected value of each chunk.

Another concern is the one you've already pointed out, that we'll either need to accept that single element chunks can fail the predicate and still be included, or that particular elements from the input might be left out of the final result. I think the option you've picked is the most reasonable one, but neither one seems ideal.

You mentioned that writing functionality in terms of reduce(into:_:) would be a pain. I wonder if writing it in terms of chunked(by:) might be easier, but at first glance that might be equally messy given that its closure gives you all of the inner elements twice... Let's iterate on this and see if we can come up with a more flexible chunking operation that doesn't have these drawbacks.

toddthomas commented 3 years ago

Thanks for taking a look, @timvermeulen. I'll think about whether and how this can be made more general.