apple / swift-algorithms

Commonly used sequence and collection algorithms for Swift
Apache License 2.0
5.97k stars 443 forks source link

Add `partitioned(by:)` #152

Closed mdznr closed 3 years ago

mdznr commented 3 years ago

Description

Adds a partitioned(by:) algorithm. This is very similar to filter(_:), but instead of just getting an array of the elements that do match a given predicate, also get a second array for the elements that did not match the same predicate.

This is more performant than calling filter(_:) twice on the same input with mutually-exclusive predicates since:

  1. It only requires a single pass of the elements
  2. If the input has a known number of elements, the cumulative space for both returned arrays is known and can avoid array buffer resizing.
let cast = ["Vivien", "Marlon", "Kim", "Karl"]
let (longNames , shortNames) = cast.partitioned(by: { $0.count < 5 })
print(longNames)
// Prints "["Vivien", "Marlon"]"
print(shortNames)
// Prints "["Kim", "Karl"]"

Detailed Design

extension Sequence {
  @inlinable
  public func partitioned(
    by predicate: (Element) throws -> Bool
  ) rethrows -> (falseElements: [Element], trueElements: [Element])
}

Naming

At a high-level, this acts similarly to the partition family of functions in that it separates all the elements in a given collection in two parts, those that do and do not match a given predicate. Thanks, @timvermeulen for help with naming!

Documentation Plan

Test Plan

Source Impact

This is purely additive

Checklist

mdznr commented 3 years ago

I made the closure belongsInSecondCollection, which is consistent with the other partitioned functions, but in terms of the return values, feels backwards ((second, first)).

mdznr commented 3 years ago

I ran some benchmarks using the awesome swift-collections-benchmark package, as suggested by @timvermeulen:

chart

The output does confirm that using partitioned(_:) is faster than calling filter(_:) twice.

Using the Collection-based implementation with the fixed buffer size is faster than the Sequence-base implementation. However, the function’s overhead makes it slightly slower for collections fewer than 8 elements. For that reason, we could check count < 8 in the start of the Collection implementation and conditionally run the Sequence-based approach, which is slightly faster.

I was initially surprised partitioned(_:) wasn’t significantly faster than calling filter(_:) twice, though. With more experimentation, I learned how much the overall cost depends on the cost of the closure’s evaluation. The more costly the closure, the more valuable it was to avoid calling the closure twice (obviously). In very simple cases, the cost of evaluating the closure is extremely insignificant compared to the cost of adding each element to the output collection. However, in all cases, it is still faster to use partitioned(_:) than calling filter(_:) twice.

Using a slighty more expensive closure yielded these results:

chart
Benchmarking code/details Simple closure test: ```swift benchmark.addSimple( title: "Filter × 2", input: [Int].self ) { input in blackHole(input.filter({ $0.isMultiple(of: 3) })) blackHole(input.filter({ !$0.isMultiple(of: 3) })) } benchmark.addSimple( title: "Partitioned (Sequence)", input: [Int].self ) { input in blackHole(input._partitioned({ $0.isMultiple(of: 3) })) } benchmark.addSimple( title: "Partitioned (Collection)", input: [Int].self ) { input in blackHole(input.partitioned({ $0.isMultiple(of: 3) })) } ``` More expensive closure test: ```swift let multiples: [Int] = [1, 3, 5, 7] benchmark.addSimple( title: "Filter × 2", input: [Int].self ) { input in blackHole(input.__filter({ int in multiples.allSatisfy({ int.isMultiple(of: $0) }) })) blackHole(input.__filter({ int in !multiples.allSatisfy({ int.isMultiple(of: $0) }) })) } benchmark.addSimple( title: "Partitioned (Sequence)", input: [Int].self ) { input in blackHole(input._partitioned({ int in multiples.allSatisfy({ int.isMultiple(of: $0) }) })) } benchmark.addSimple( title: "Partitioned (Collection)", input: [Int].self ) { input in blackHole(input.partitioned({ int in multiples.allSatisfy({ int.isMultiple(of: $0) }) })) } ``` _All tests run on iMac Pro 3.2 GHz 8-Core Intel Xeon W; 32 GB 2666 MHz DDR4; macOS 11.3 (20E232); Apple Swift version 5.4.2 (swiftlang-1205.0.28.2 clang-1205.0.19.57)_
mdznr commented 3 years ago

Those graphs look good! It's nice to see that the added complexity seems to be worth it for most sizes. I think it'd be useful to benchmark another version that returns a (ArraySlice, ArraySlice) pair (or even (ArraySlice, ReversedCollection<ArraySlice>)) just so we can see how much performance we're missing out on by allocating two new arrays.

chart
Closeup of the two slice variants chart
Code ```swift extension Collection { @inlinable public func partitionedA( _ belongsInSecondCollection: (Element) throws -> Bool ) rethrows -> ([Element], [Element]) { guard !self.isEmpty else { return ([], []) } // Since `RandomAccessCollection`s have known sizes (access to `count` is // constant time, O(1)), we can allocate one array of size `self.count`, // then insert items at the beginning or end of that contiguous block. This // way, we don’t have to do any dynamic array resizing. Since we insert the // right elements on the right side in reverse order, we need to reverse // them back to the original order at the end. let count = self.count // Inside of the `initializer` closure, we set what the actual mid-point is. // We will use this to partitioned the single array into two in constant time. var midPoint: Int = 0 let elements = try [Element]( unsafeUninitializedCapacity: count, initializingWith: { buffer, initializedCount in var lhs = buffer.baseAddress! var rhs = lhs + buffer.count do { for element in self { if try belongsInSecondCollection(element) { rhs -= 1 rhs.initialize(to: element) } else { lhs.initialize(to: element) lhs += 1 } } let rhsIndex = rhs - buffer.baseAddress! buffer[rhsIndex...].reverse() initializedCount = buffer.count midPoint = rhsIndex } catch { let lhsCount = lhs - buffer.baseAddress! let rhsCount = (buffer.baseAddress! + buffer.count) - rhs buffer.baseAddress!.deinitialize(count: lhsCount) rhs.deinitialize(count: rhsCount) throw error } }) let lhs = elements[.. Bool ) rethrows -> (ArraySlice, ArraySlice) { guard !self.isEmpty else { return ([], []) } // Since `RandomAccessCollection`s have known sizes (access to `count` is // constant time, O(1)), we can allocate one array of size `self.count`, // then insert items at the beginning or end of that contiguous block. This // way, we don’t have to do any dynamic array resizing. Since we insert the // right elements on the right side in reverse order, we need to reverse // them back to the original order at the end. let count = self.count // Inside of the `initializer` closure, we set what the actual mid-point is. // We will use this to partitioned the single array into two in constant time. var midPoint: Int = 0 let elements = try [Element]( unsafeUninitializedCapacity: count, initializingWith: { buffer, initializedCount in var lhs = buffer.baseAddress! var rhs = lhs + buffer.count do { for element in self { if try belongsInSecondCollection(element) { rhs -= 1 rhs.initialize(to: element) } else { lhs.initialize(to: element) lhs += 1 } } let rhsIndex = rhs - buffer.baseAddress! buffer[rhsIndex...].reverse() initializedCount = buffer.count midPoint = rhsIndex } catch { let lhsCount = lhs - buffer.baseAddress! let rhsCount = (buffer.baseAddress! + buffer.count) - rhs buffer.baseAddress!.deinitialize(count: lhsCount) rhs.deinitialize(count: rhsCount) throw error } }) let lhs = elements[.. Bool ) rethrows -> (ArraySlice, ReversedCollection>) { guard !self.isEmpty else { let emptyArraySlice = [Element]()[0...] return ( emptyArraySlice, emptyArraySlice.reversed() ) } // Since `RandomAccessCollection`s have known sizes (access to `count` is // constant time, O(1)), we can allocate one array of size `self.count`, // then insert items at the beginning or end of that contiguous block. This // way, we don’t have to do any dynamic array resizing. Since we insert the // right elements on the right side in reverse order, we need to reverse // them back to the original order at the end. let count = self.count // Inside of the `initializer` closure, we set what the actual mid-point is. // We will use this to partitioned the single array into two in constant time. var midPoint: Int = 0 let elements = try [Element]( unsafeUninitializedCapacity: count, initializingWith: { buffer, initializedCount in var lhs = buffer.baseAddress! var rhs = lhs + buffer.count do { for element in self { if try belongsInSecondCollection(element) { rhs -= 1 rhs.initialize(to: element) } else { lhs.initialize(to: element) lhs += 1 } } let rhsIndex = rhs - buffer.baseAddress! initializedCount = buffer.count midPoint = rhsIndex } catch { let lhsCount = lhs - buffer.baseAddress! let rhsCount = (buffer.baseAddress! + buffer.count) - rhs buffer.baseAddress!.deinitialize(count: lhsCount) rhs.deinitialize(count: rhsCount) throw error } }) let lhs = elements[..", input: [Int].self ) { input in blackHole(input.partitionedC({ $0.isMultiple(of: 2) })) } ```

I’m a bit surprised that the (Array, Array) implementation was actually faster in many cases (from a few hundred to a couple hundred thousand elements). Thinking it was a fluke, I’ve re-run this several times and continue to get similar results. I’m not sure yet why that could be.

I wish there were a clear best implementation from a performance point of view. However, since the performance for the non-Array return values weren’t better in all cases, I would say the tradeoff of having non-Array types as the return values aren’t worth it, as it does expose some implementation details and would make it difficult to change the implementation (and possibly the signature) later without it being a non-backwards-compatible breaking change.

timvermeulen commented 3 years ago

I’m a bit surprised that the (Array, Array) implementation was actually faster in many cases (from a few hundred to a couple hundred thousand elements). Thinking it was a fluke, I’ve re-run this several times and continue to get similar results. I’m not sure yet why that could be.

That's interesting and indeed surprising.

I wish there were a clear best implementation from a performance point of view. However, since the performance for the non-Array return values weren’t better in all cases, I would say the tradeoff of having non-Array types as the return values aren’t worth it, as it does expose some implementation details and would make it difficult to change the implementation (and possibly the signature) later without it being a non-backwards-compatible breaking change.

I completely agree with your conclusions here. I'd still be interested to see how returning (Array(lhs), Array(rhs.reversed())) rather than reversing the right side in-place could improve the (Array, Array) version even more, but judging by the tiny difference between the two slice versions it probably won't make a huge difference.

mdznr commented 3 years ago

I wish there were a clear best implementation from a performance point of view. However, since the performance for the non-Array return values weren’t better in all cases, I would say the tradeoff of having non-Array types as the return values aren’t worth it, as it does expose some implementation details and would make it difficult to change the implementation (and possibly the signature) later without it being a non-backwards-compatible breaking change.

I completely agree with your conclusions here. I'd still be interested to see how returning (Array(lhs), Array(rhs.reversed())) rather than reversing the right side in-place could improve the (Array, Array) version even more, but judging by the tiny difference between the two slice versions it probably won't make a huge difference.

I think if I’m following you correctly, that should be the same as the test I ran earlier:

When would the reversal happen? If removing line 315 here and instead reversing the ArraySlice right before its conversion to an Array, it makes it slower.

chart
timvermeulen commented 3 years ago

I think if I’m following you correctly, that should be the same as the test I ran earlier:

I missed that, my bad. Looks like ReversedCollection is just too slow to make this worthwhile.

timvermeulen commented 3 years ago

@swift-ci Please test

natecook1000 commented 3 years ago

@swift-ci Please test

mdznr commented 3 years ago

Thank you @timvermeulen, @natecook1000, @xwu, @LucianoPAlmeida, @fedeci, and @CTMacUser for helping me get this function integrated into swift-algorithms!