apple / swift-algorithms

Commonly used sequence and collection algorithms for Swift
Apache License 2.0
5.9k stars 435 forks source link

`CombinationsSequence` should be a `Collection` #219

Open dabrahams opened 7 months ago

dabrahams commented 7 months ago

It's multipass (like almost every other Sequence in practice). Not being a collection limits its applicability in silly ways.

natecook1000 commented 7 months ago

@dabrahams Could you give some examples of the limitations that you're running into here?

IIRC, the reason that CombinationSequence is only a sequence is that its iterator stores and mutates a single copy of the base collection's indices as it iterates. By contrast, a collection would need to provide an index type that either stores a separate copy of the indices (potentially very heavyweight) or stores just a position, and then recalculates the indices when used (which should be doable, but might be costly). While the Collection protocol doesn't explicitly rule out such a heavy index type, it's very different from the way most collections work, and having O(N) work happening in each subscript would probably be unexpected for users.

That said, CombinationSequence might already be falling down on the efficiency job. It looks like the iterator currently copies each combination into a new array, instead of using a collection wrapper to combine the base collection and the array of indices. Such a wrapper would be independently useful, as well (and indeed I thought we already included it):

let numbers = [10, 20, 30, 40, 50]
for x in numbers.indexed(by: [4, 2, 0]) {
    print(x)
}
// 50
// 30
// 10
dabrahams commented 7 months ago

Could you give some examples of the limitations that you're running into here?

I didn't run into a limitation, but I know such limitations exist. It's the whole reason I wrote Zip2Collection: there are algorithms that are defined only on Collection because it is multipass.

IIRC, the reason that CombinationSequence is only a sequence is that its iterator stores and mutates a single copy of the base collection's indices as it iterates.

That's no reason it can't be a collection, as you yourself know well and this implementation demonstrates.

having O(N) work happening in each subscript would probably be unexpected for users.

I don't know what your N is in this case; it's certainly not the length of the resulting collection, which is what people usually mean when they say O(N).

The work of dereferencing this thing is proportional to the data footprint of the elements you get, which is no worse than someStrings.lazy.map($0 + ".").

"It's surprisingly expensive" is subjective and not a reason for non-conformance to a protocol. The implementation either meets the efficiency bounds (and we even play loosey-goosey with that criterion in places) or it does not.