Closed delamarch3 closed 2 weeks ago
There is a potential that this regresses performance slightly, however, I don't believe much effort has been expended optimising this kernel for RunArray, especially given there are no benchmarks, so this is probably fine.
Thanks @tustvold. I ran some benchmarks on this and it turns out it is quite a bit slower. I've changed it slightly so it just subtracts the difference which seems to have brought it back closer to where it was, I can open a new PR? Here's the benchmark results:
This PR:
run_ends_len = 64 time: [3.9878 µs 4.0035 µs 4.0204 µs]
change: [-1.3684% -0.4485% +0.5210%] (p = 0.34 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
run_ends_len = 512 time: [56.654 µs 57.076 µs 57.458 µs]
change: [-1.7583% -1.0124% -0.2300%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
run_ends_len = 1024 time: [151.37 µs 151.83 µs 152.34 µs]
change: [-4.5739% -3.1197% -1.6545%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
My updates:
run_ends_len = 64 time: [2.7373 µs 2.8041 µs 2.8841 µs]
change: [-32.545% -31.650% -30.574%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe
run_ends_len = 512 time: [4.5291 µs 4.5413 µs 4.5548 µs]
change: [-91.973% -91.902% -91.821%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
run_ends_len = 1024 time: [6.3687 µs 6.4206 µs 6.4869 µs]
change: [-95.802% -95.750% -95.693%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
Which issue does this PR close?
Closes #6569
Rationale for this change
It's possible for a filter predicate with less values than the target array to be passed into
filter_run_end_array
which will result in an out of bounds index on the filter values, as described in the issue.What changes are included in this PR?
filter_run_end_array
as unsafefilter_run_end_array
I thought that it would be inconsistent with the other filters to have this one return an error if the lengths do not match. Eg for an int or string array filter if there are less filter values than array values then those extra values are filtered out. I replaced the unchecked filter value with a skip/take iterator.
Are there any user-facing changes?
An error is returned if the predicate and values do not have the same lengthNo