Closed MasonProtter closed 1 year ago
Merging #553 (c616391) into master (f8d0dfe) will increase coverage by
0.11%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #553 +/- ##
==========================================
+ Coverage 95.43% 95.54% +0.11%
==========================================
Files 32 32
Lines 2233 2268 +35
==========================================
+ Hits 2131 2167 +36
+ Misses 102 101 -1
Flag | Coverage Δ | |
---|---|---|
Pkg.test | 94.54% <100.00%> (-0.02%) |
:arrow_down: |
Run.test | 95.41% <100.00%> (+0.20%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/Transducers.jl | 73.33% <ø> (ø) |
|
src/core.jl | 93.15% <100.00%> (+0.09%) |
:arrow_up: |
src/dreduce.jl | 100.00% <100.00%> (ø) |
|
src/processes.jl | 94.71% <100.00%> (+0.50%) |
:arrow_up: |
src/reduce.jl | 96.61% <100.00%> (+0.18%) |
:arrow_up: |
Current State
Fundamentally, Transducers is quite good at doing reductions but collecting results into an output array is a major weakness. The way that it does this currently is essentially just doing
(or
foldxt
for the parallel version). Iff
is expensive to evaluate, then this extra overhead isn't so bad, but for functions that can be done in a CPU cycle or two, it's catastrophic:Here's how it currently looks with a very cheap function (
abs
):And here's a more expensive function (
sin
):This PR
In this PR I made a version of
collect(xf::Transducer, coll)
(and similar forcopy
) operating on transducers that checks ifxf
preserves the size ofcoll
(i.e.Map
is okay, butFilter
is not), and checks ifcoll
has a known (runtime) size. If both of those are satisfied, then we do a more optimized method that involvessetindex!!
on arrays.We can't do the
setindex!!
thing directly fortcollect
since it would cause race conditions if the output object changed, so instead fortcollect
I split thecollection
into a bunch ofchunks
whose size is determined bybasesize
(I useIterators.partition
for this currently and want to fix that before merging to useSplittablesBase.jl
).Now here's what those benchmarks look like with my new changes:
abs
:and
sin
:So that's a nice speedup, though
tcollect
is still leaving some performance on the table, it's still an improvement. This should help alleviate https://github.com/tkf/ThreadsX.jl/issues/196 and https://github.com/tkf/ThreadsX.jl/issues/196, though it still won't be as fast asThreadsX.map!
since the way we combine the results from different arrays is not as efficient as preallocating and then just assigning.