Closed cy6erGn0m closed 6 years ago
Use cases:
fun view(page: Int) = source.slidingWindow(pageSize).drop(page).firstOrNull()
- it might be better to use
elementAtOrNull(n)
instead ofdrop(n).firstOrNull()
- I wouldn't call this an effective implementation for
view(page)
, as it requires to populate and then throw away all pages before the required one. Might be better justdrop(pageSize * page).take(pageSize)
permutations generation, advanced signal processing, hex to byte array
- windows returned could have
size < windowSize
, it would break these usages.
Alternatives:
what slidingWindowIndices
is supposed t do? What is the signature?
It returns a sequence of ranges rather than sub-contents. Not sure we have to make it public, most likely we shouldn't
@cy6erGn0m so it is equivalent to this.indices.slidingWindow()
? I don't think this is worth a separate function.
actually it is equivalent to indices.slidingWindow(..).map { IntRange(it.first(), it.last()) }
that in fact is too inefficient. The only use-case for this function is the case when you don't want to copy sub contents but just want to keep index ranges for performance reasons while indices.slidingWindow(...)
is definitely not a good idea if you need performance.
Yes, now I see the benefit of this function.
What troubles me is that slidingWindowIndices
does not depend on the content of the collection.
The use case you described may be covered with slidingWindow
overloaded for ClosedRange<T>
(IntRange
) with a usage like: list.indices.slidingWindow()
Yes, but it leads to an ambiguity: IntRange
is both ClosedRange
and Iterable<Int>
so having two functions with different return types could cause confusion
TODO: Elaborate use-cases, remove unnatural ones (paging, bytesFromHex).
@ilya-g I don't agree that bytesFromHex
is unnatural: I do like this in tests (with simplified implementation of slidingWindow) as it is one of the easiest ways to do that.
Regarding byteFromHex
, I would write it as follows:
fun String.hexToBytes(): ByteArray {
require(this.length % 2 == 0) { "Must contain even number of hex digits." }
val size = this.length / 2
return ByteArray(size) { p -> (Character.digit(this[p*2],16)*16 + Character.digit(this[p*2+1], 16)).toByte() }
}
It might look not so clear as with slidingWindow
, but it's definitely more performant.
Again why it's not the use-case of this proposal — if one need to convert base16 string to a ByteArray
, he would use some byteFromHex
function rather than resort to ad-hoc slidingWindow
every time.
@ilya-g in your implementation there is too much arithmetic
Of course there are a lot of implementations available, for example, apache commons codec has one for that purpose. The main idea of the example is to show text-processing use-case. Don't get any ideas about text formats with fixed-size fields, don't you?
Don't get any ideas about text formats with fixed-size fields, don't you?
No, I don't. It must be crystal clear from the proposal without guessing.
Similar API: For Ruby there is not only Enumerable.each_slice(n)
but also Enumerable.each_cons(n)
.
Also similar API: For Clojure there is partition
in clojure.core
: for example (partition n coll)
and (partition n step pad coll)
. step
defaults to n
. https://clojuredocs.org/clojure.core/partition
For Python ActiveState has some (maybe inefficient) recipes for each_cons()
and each_slice()
by Hamish Lawson and Brian Quinlan: http://code.activestate.com/recipes/347689-ruby-arrayeach_cons-each_slice-overlapping-and-non/
For educational purposes I implemented some variants for fixed and sliding windows [1]. I know there is already a prototype implemented (refered in the KEEP document) with test suite which even uses a RingBuffer
.
partition()
implements a function for fixed and sliding windows suited for lists (not sequences) using indices/ranges.partitionRec()
and partitionTailrec()
implement such functions suited for lists using recursion and tail recursion (very inefficient). The concatenation of a lot of sequences using .plus()
(or .flatten()
) results in a StackOverflowException
.batch()
and slide()
implement such functions suited for lists and sequences (similar to the prototype implementation). Small "problem" here is that source.iterator()
introduces mutual state. Does not use RingBuffer
or LinkedList
, instead replaces the whole List
.[1] https://gist.github.com/hastebrot/9b0532d9b9317f13d93aeb79c24125b4
A sliding window for pairs with an offset of one, can be implemented ad hoc with this simple variant which uses zip()
. This however does not work with sequenceOf(...).constrainOnce()
:
fun main(args: Array<String>) {
val itemSeq = sequenceOf(1, 2, 3, 4, 5) //.constrainOnce()
val itemPairs = itemSeq.zip(itemSeq.drop(1))
val itemPairPairs = itemSeq.zip(itemSeq.drop(1)).zip(itemSeq.drop(2))
println(itemSeq.toList()) // [1, 2, 3, 4, 5]
println(itemPairs.toList()) // [(1, 2), (2, 3), (3, 4), (4, 5)]
println(itemPairPairs.toList()) // [((1, 2), 3), ((2, 3), 4), ((3, 4), 5)]
}
Regarding to alternatives section: "copy-paste it everywhere" is not a meaningful alternative. I mean we can assume such an alternative for every standard library function proposal.
slidingWindow2
is quite useful, usually it comes in form of something like pairwise
or so. I think it deserves its own proposal.
On the opposite slidingWindow3
seems to be a specialization of general slidingWindow
for which I don't see compelling use cases.
@ilya-g slidingWindow3
can be used to implement the Visvalingam-Whyatt polyline simplification algorithm [1]. In the algorithm in a list of points we use a sliding window of three consecutive points to build a triangle.
[1] https://bost.ocks.org/mike/simplify/
But it seems this doesn't sound like a compelling use case compared to pairwise
.
relates to dropTrailing
: http://stackoverflow.com/questions/7958712/inconsistent-behaviour-for-xs-slidingn-if-n-is-less-than-size
Also methods of GroupedIterator
: withPadding
and withPartial
http://www.scala-lang.org/api/2.11.7/#scala.collection.Iterator$GroupedIterator
Use case for slidingWindow(size, step=1)
: https://github.com/jedrz/cdf/blob/318668f7e2304ebb1b08b9c6db15c7c158b38f5a/src/main/scala/cdf/matcher/ngrams/NGramsEvaluator.scala
Computing trends: https://github.com/unicredit/trafalgar/blob/6ccbc19a267df09304b5946ce60bc180d7690101/src/main/scala/eu/unicredit/trafalgar/Rules.scala
I feel its worth mentioning, we work with 2-dimensonal matricies through JNA from native code, meaning we get a lot of linearized 2D things --that is to say, a 2D grid converted into a 1D array by "stacking" rows one-after-the-other in an array.
The standard strategy to convert them is
val columnCount: Int = ...
val input: Array<Double> = someNativeFunction()
val matrix: List<List<Double>> = input.groupByIndexed { elem, index -> index / columnCount }
//this assumes row-major, you can use column-major by using mod `%` instead of div `/`
note that groupByIndexed
is not part of the STD lib, and is a function we wrote --we also use sequences a lot as these matrices are often quite large.
Can we alleviate a large portion of the use case for this function with a simple groupByIndexed
operator in the standard lib?
@Groostav groupBy
returns a map, it would be strange and contradictory to other *Indexed
functions if groupByIndexed
returned a list of lists.
quite right @ilya-g , thus
fun <T, K> Iterable<T>.groupByIndexed(keySelector: (index: Int, T) -> K): Map<K, List<T>>
fun <T, K> Sequence<T>.groupByIndexed(keySelector: (index: Int, T) -> K): Map<K, List<T>>
which would also satisfy my goals, if requiring a bit of reshaping for the output value.
I've revised the proposal, now there are three sets of functions considered: chunked
to partition in non-overlapped blocks, window
to partition in possibly overlapping windows or windows with gaps, and pairwise
as a special case of window
of size 2 shifted by 1 element to partition to pairs of adjacent elements.
fun <T, K> Iterable<T>.groupByIndexed(keySelector: (index: Int, T) -> K): Map<K, List<T>>
fun <T, K> Sequence<T>.groupByIndexed(keySelector: (index: Int, T) -> K): Map<K, List<T>>
@Groostav it's an interesting idea, but it's out of scope of current proposal. You can start a new proposal instead.
s/pairwise/windowedPairwise/
?
When I read function name "pairwise" I don't know if it takes each pair one by one (chunked pairwise) or if it windows it (size of 2, step 1) or if it generates all-pairs (Cartesian product). If the name were changed to "windowedPairwise" then it would be immediately clear, I think, that this is a special case of "windowed" without even needing to read the documentation.
Thoughts?
When I first read fun chunked(
and fun windowed(
, I thought I saw chuckled
and widowed
0_o
I am not a native english speaker, but it seems to me that certain rhythm is missing from those two words, which is why they sound a little unnatural.
I also wonder: Should this include implementing forEachChunk
, forEachWindow
, and forEachPair
extension functions to iterate over the desired combinations without actually creating intermediate lists?
@mfulton26 That's how the overloads taking a lambda work.
That's right. :man_facepalming: Thank you @ilya-g.
I find these functions very useful. One question: I keep thinking that the step
parameter of the windowed
function should have a default value of 1. Is this something that has been considered?
Why pairwise
is windowed(2, 1)
, not chunked(2)
?
As @mfulton26 mentioned, I would expect a different behaviour from this function judging by its name.
Is there any evidence that one is needed more often than another?
I checked it on the internet, and it seems that "pairwise" is generally used to describe all possible combinations of elements, not subsequent pairs of elements.
The only place where "pairwise" means the same as in the proposal is the Python library, possibly because they have a combinations
function as well.
I like the @marstran suggestion to add a default value of 1
to the step
parameter of the windowed
functions. If we implement it, then pairwise
will be synonymous to windowed(2)
. That would make pairwise
unnecessary and save us a problem of coming up with a clearer name for this function.
I checked it on the internet, and it seems that "pairwise" is generally used to describe all possible combinations of elements, not subsequent pairs of elements.
@voddan, could you share your findings? It would be especially useful if you could find an analogous function in some other framework, so we could compare the naming.
If we implement it, then pairwise will be synonymous to windowed(2). That would make pairwise unnecessary and save us a problem of coming up with a clearer name for this function.
pairwise
has some performance advantages over windowed(2)
, so it cannot be replaced by the latter: it doesn't allocate a list to represent a window; also being a simplified case of windowed
it has an implementation more suitable to be inlined. Therefore pairwise
is inline, while windowed
is not.
I find these functions very useful. One question: I keep thinking that the step parameter of the windowed function should have a default value of 1. Is this something that has been considered?
@marstran Initially we had both windowed
and chunked
as a single method with step
defaulted to window size, but after splitting it in two methods it became possible to make step
default to 1
.
The pros of that should be weighed carefully, as it would double the method count occupied by windowed
.
In case if we add another default parameter to windowed
, like dropPartialWindows = true
, it would be possible also to make step
optional.
Is step = 1
so common that it is a reasonable default?
@voddan, could you share your findings?
I searched StackOverflow for "pairwise" and looked through the questions and how they were using the word. Excluding questions about Python, the majority of questions use "pairwise" to describe each-to-each relationships.
I wasn't able to find any major framework that would use a "pairwise" function except for the Python library, possibly because of the controversy. Sometimes they use it in documentation to describe a pairwise comparison (which is each-to-each).
The wikipedia says that "pairwise" generally means "occurring in pairs" or "two at a time"
, but in math the term everybody uses is pairwise comparison, which is drastically different. I believe programmers prefer to use the mathematical meaning of this word.
I looked at some other languages to see what they do with functions like these:
Scala has the sliding
function which is equivalent to the windowed
function. This function has an overload with step = 1
.
Ruby has the each_cons
function which is equivalent to windowed
with step = 1
. I couldn't find a function in Ruby where you can provide the step
parameter, but you have each_slice
which is equivalent to chunked
.
F# has the windowed
function which is equivalent to Kotlin's windowed
function with step = 1
. It doesn't take a step
parameter. By the way, F# also has the pairwise
function which is equivalent to Kotlin's pairwise
.
Looking at all these, I find step = 1
to be a reasonable default.
Alternative naming options for pairwise
:
pairWithNext
, zipWithNext
zipWithSelf(offset = 1)
as equivalent of zip(drop(offset))
, this option with adjustable offset leads to less effective implementationI've renamed pairwise
to zipWithNext
and changed the parameters of windowed
.
Now step
defaults to 1 and there is a new parameter partialWindows
, which controls whether to keep or drop incomplete windows in the end, defaults to false.
https://github.com/JetBrains/kotlin/compare/7c2dd7c210b85bcebfc7ca79850e3abf5c18df94~2...7c2dd7c210b85bcebfc7ca79850e3abf5c18df94
Nice! One small question thou: why chunked
keeps partial windows, while windowed
defaults to not having them? Isn't it a bit inconsistent?
I have no personal preference on this one.
@voddan The use case of chunked
is to process a large collection in several smaller chunks. Usually you don't want the last elements of a collection to be dropped just because they form a chunk of the smaller size than requested.
On the other hand window processing cases usually require to have a fixed size window, so it seems reasonable to default to dropping partial windows.
Will there be support for "folding" function that group sequences into windows? Like having a function that determines when a window starts and another when the window ends resulting in a lazy sequence of these grouped items?
@fab1an We considered that function before, see https://github.com/Kotlin/KEEP/blob/master/proposals/stdlib/window-sliding.md#future-advancements, but decided that it would have to generic API and would be not much in demand.
But let's imagine: if you had such function, how would you call it? Could you show some code examples with this function?
Btw, I tried to design and implement such function multiple times and every time it resulted in too complex declarations and difficult to understand parameters
The declarations have been released in Kotlin 1.2, thus I'm closing the KEEP. Do not hesitate to open a YouTrack issue for any additional suggestions.
Discussions about the Sliding window functions will be held here.