JuliaCollections / IterTools.jl

Common functional iterator patterns
Other
153 stars 28 forks source link

Differences in `Base.Iterators.partition` and `IterTools.partition` #67

Open ssfrr opened 4 years ago

ssfrr commented 4 years ago

There are some differences between Base.Iterators.partition and IterTools.partition. I brought it up on Slack and @oxinabox asked me to file an issue.

  1. Base.Iterators.partition returns each chunk as an Array, whereas IterTools.partition returns a Tuple
  2. In the case that the last chunk is smaller than the requested partition size, Base.Iterators.partition returns a shorter chunk, whereas IterTools.partition drops the last values.
  3. IterTools.partition accepts a step argument that determines the hop size for each chunk.
ssfrr commented 4 years ago

update - I think on master Base.Iterators.partition now returns views into the original array, not Arrays.

This might (?) be the last inconsistency that would need to be addressed before something like #30 could be resolved. How do folks want to resolve the differences listed above? I'd propose using the Base.Iterators behavior for (1) and (2), but keeping the step argument.

One path forward would be to add the step argument as a PR to Base, then just reexport Base.Iterators.partition. This would also take advantage of the recent improvements that @mbauman made.

omus commented 4 years ago

Change to make Base.Iterators.partition return views was done in: https://github.com/JuliaLang/julia/pull/33533

goretkin commented 4 years ago

I'm in favor of keeping the current behavior of IterTools.partition, perhaps under a different name, to avoid confusion with Base.

For example, implementing Base.diff, but for an iterable it instead of just AbstractVectors is very natural in terms of IterTools.partition(it, 2, 1), and the fact that it produces a stream of tuples, which have a length known at compile time is (I'm guessing) beneficial for efficient code generation.

ssfrr commented 4 years ago

Yeah, it's sort of a shame that the implementation needs to be different depending on whether it's designed for long or short partitions (which would use Arrays or Tuples, respectively). For instance, in my use case (streaming audio) I often want to chop the stream into chunks of 512 or 1024 elements, so Tuples aren't good for that. Maybe the partition type should be provided as an optional argument to partition?

goretkin commented 4 years ago

Oh, good. That's an important use too. Base.Iterators.partition returns views into the array, instead of copies, now. Does that fit the audio chunking need?

ssfrr commented 4 years ago

Yeah, array views are good for me

pepijndevos commented 1 year ago

Difference 2 is handled in Clojure by having partition-all and partition, where base partition maps to Clojure's partition-all behavior.

Difference 1 & 2 are grouped together in Transducers under Consecutive vs Partition. So you either get fixed length tuples or variable length arrays.

I think this roughly aligns with common usage patterns where you either want to have small fixed length groups, or process large chunks of an array.

In my ideal universe both consecutive (IterTools partition) and partition would be in base, both supporting a step argument.