Open leaxoy opened 2 weeks ago
Change https://go.dev/cl/585018 mentions this issue: slices: add Partition
It seems could only be two parts. Why don't we use GroupBy instead ?
how common is this operation? "some scenarios" is not strong justification
how common is this operation? "some scenarios" is not strong justification
@seankhliao
Here is a simple search in github: https://github.com/search?q=%2Ffunc%5Cs*%28P%7Cp%29artition.*bool%5C%29%2F+language%3AGo+&type=code
It may not be an exact match, but it is enough to prove that the usage scenarios of partition are rich enough.
Some specific examples:
It seems could only be two parts. Why don't we use GroupBy instead ?
@septemhill
Two main reason:
Partition
is convenient than GroupBy
in the case of two classificationsgroupby
will return map, so I'm not sure if it makes sense to introduce map as a return value in slices. Before that, slices and maps didn't interact with each other.That search result contains a lot of results irrelevant to this proposal a more accurate search may be something like https://github.com/search?q=%2Ffunc%5Cs*%28P%7Cp%29artition.*func%5C%28.*%5C%29+bool%5C%29+%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code
In addition, I also found that the split
function also has similar functions.
https://github.com/search?q=%2Ffunc%5Cs*%28%28P%7Cp%29artition%7C%28S%7Cs%29plit%29.*func%5C%28.*%5C%29+bool%5C%29%5Cs*%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code
Potential optimization: allocate a bit vector containing f(i) for every index; then allocate a single array of length n and copy the true elements into the bottom and the false elements into the top; finally, return two cap-limited slices of this array. This would eliminate reallocations in append and reduce the amortized allocation overhead from 12.5% (append's average waste) to 1 bit per element (1.5% for a word-sized element, less for larger elements).
[Update: on reflection, a 12.5% allocation improvement doesn't seem worth the extra complexity. Never mind.]
It element order is not important ([edit] and the input one is allowed to be changed), then no allocation is needed.
Potential optimization: allocate a bit vector containing f(i) for every index; then allocate a single array of length n and copy the true elements into the bottom and the false elements into the top; finally, return two cap-limited slices of this array.
@adonovan This give me an idea to partition in place and return the index of the dividing point. I have seen this practice in rust, called as PartitionInPlace
.
@go101 And if I understand correctly, what you expressed can also be considered as PartitionInPlace
I wrote a Partition()
function in an internal
package in a project a little while back. I wound up removing it more recently though, mostly because it just wasn't useful anymore. It seems to me that an allocating version would make more sense in x/iter
, while the slices
implementation should be in-place.
For the fun of it, here's a zero-allocation implementation that runs in O(n)
but does not preserve ordering.
Edit: Here's a version that does preserve ordering.
Edit 2: Version 3, now with overall easier-to-read code but annoyingly repetitive condition checks.
Edit 3: I wasn't paying attention to the elements of the true
slice. Their order is not preserved. Whoops. I think it's possible to preserve it by keeping track of the index of the first one and reversing the sub-slice, but that'll reduce the efficiency just slightly.
I wrote a
Partition()
function in aninternal
package in a project a little while back. I wound up removing it more recently though, mostly because it just wasn't useful anymore. It seems to me that an allocating version would make more sense inx/iter
, while theslices
implementation should be in-place.
In fact, I write my version of Partition
in my iter
package, but is use a nested function as Seq
, because I didn't find an efficient way to create seq without using slice.
Below is my implementations in my packgae iter
.
Here is my first version implementation:
func appendSeq[E any](s Seq[E], x E) Seq[E] {
return func(yield func(x E) bool {
for e := range s {
if !yield(e) {
return
}
}
yield(x)
})
}
func Partition[E any](s Seq[E], f func(E) bool) (Seq[E], Seq[E]) {
true:=func(yield func(E) bool) {}
false:=func(yield func(E) bool) {}
for x := range s {
if f(x) {
true = appendSeq(true, x)
}else {
false = appendSeq(false, x)
}
}
return true, false
}
And the second implementation: Partition, it iterate over Seq
twice, so it is not available for non-repeatable Seq
(like: io stream).
For the fun of it, here's a zero-allocation implementation that runs in O(n) but does not preserve ordering.
I think we should reverse the version that keep order. Leave the choice to the user.
Finally, your comments are quite valuable and I will integrate these versions of implementation into the original proposal
Will iter
become another feature which is abused in Go (other than channels)? :D
Following the above discussion, we will have two variants:
func Partition[S ~[]E, E any](s S, f func(E) bool) (S, S)
func PartitionInPlace[S ~[]E, E any](s S, f func(E) bool) (S, S)
Is there anything else that needs to be discussed? If not, I will update the PR later to reflect the latest changes.
Proposal Details
In some scenarios, we need to divide the slice into two sub-slices that match and do not match by predicate, so it is proposed to provide the
Partition
andPartitionInPlace
function.Here are implementations
Partition
: Alloc new slice, and keep order, don't modify the original slice.PartitionInPlace
: No alloc slice, also not keep order, modify original slice.Usecases
Here is a simple search in github: https://github.com/search?q=%2Ffunc%5Cs*%28%28P%7Cp%29artition%7C%28S%7Cs%29plit%29.*func%5C%28.*%5C%29+bool%5C%29%5Cs*%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code
It may not be an exact match, but it is enough to prove that the usage scenarios of partition are rich enough.
Also includes the following situations, which cannot be quickly retrieved by github search:
Some specific examples: