JuliaFolds / Transducers.jl

Efficient transducers for Julia
https://juliafolds.github.io/Transducers.jl/dev/
MIT License
432 stars · 24 forks

Add SplitBy #461

Closed. tkf closed this pull request 3 years ago.

tkf commented 3 years ago

Commit Message

Add SplitBy (#461)

This patch adds the SplitBy iterator transformation and an underlying (currently internal) ReduceSplitBy transducer. SplitBy can be used for eachline- and split-style processing in parallel and/or streaming programs.
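As a rough, hedged illustration (not code from the patch itself): SplitBy groups a flat stream into chunks delimited by elements matching a predicate, so an eachline-style line count can be written as a fold. The composition and call forms below are assumptions about the API this PR introduces; the exact docstring in the patch is authoritative:

```julia
using Transducers

# Hedged sketch: split a character stream on newlines and count the
# resulting "lines", analogous to `eachline`. Composing SplitBy into a
# transducer pipeline with |> is an assumption about the new API.
chars = collect("hello\nworld\njulia")
nlines = foldl(+, SplitBy(==('\n')) |> Map(_ -> 1), chars; init = 0)
```

The point of building this on a transducer (ReduceSplitBy) is that the same pipeline can then be driven by either a sequential or a parallel fold, as the benchmarks below demonstrate.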

codecov[bot] commented 3 years ago

Codecov Report

Merging #461 (aad1c7f) into master (c11379f) will increase coverage by 0.54%. The diff coverage is 96.38%.


```diff
@@            Coverage Diff             @@
##           master     #461      +/-   ##
==========================================
+ Coverage   90.86%   91.41%   +0.54%
==========================================
  Files          31       32       +1
  Lines        2037     2120      +83
==========================================
+ Hits         1851     1938      +87
+ Misses        186      182       -4
```

| Flag | Coverage Δ | |
|------|------------|---|
| unittests | 91.41% <96.38%> (+0.54%) | :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|----------------|------------|---|
| src/Transducers.jl | 71.42% <ø> (ø) | |
| src/basics.jl | 95.31% <ø> (+6.25%) | :arrow_up: |
| src/splitby.jl | 96.29% <96.29%> (ø) | |
| src/lister.jl | 47.72% <100.00%> (-38.64%) | :arrow_down: |
| src/library.jl | 93.76% <0.00%> (-0.26%) | :arrow_down: |
| src/processes.jl | 94.71% <0.00%> (+0.44%) | :arrow_up: |
| src/progress.jl | 92.30% <0.00%> (+0.96%) | :arrow_up: |
| src/core.jl | 89.37% <0.00%> (+0.96%) | :arrow_up: |

... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c11379f...aad1c7f.

tkf commented 3 years ago
| ID | time | GC time | memory | allocations |
|----|------|---------|--------|-------------|
| ... | ... | ... | ... | ... |
| `["splitby", "count", "foldl"]` | 3.797 ms (5%) | | 96 bytes (1%) | 4 |
| `["splitby", "count", "man"]` | 1.504 ms (5%) | | | |
| `["splitby", "count", "reduce"]` | 786.290 μs (5%) | | 2.84 KiB (1%) | 54 |

--- https://github.com/JuliaFolds/Transducers-data/blob/multi-thread-benchmark-results/2021/03/15/010146/result.md


The performance is nice, but it's strange that reduce is much better than foldl, even with no multithreading:

```julia
julia> suite = include("benchmark/multi-thread/bench_splitby.jl")
       results = run(suite; verbose = true)
(1/1) benchmarking "count"...
  (1/3) benchmarking "foldl"...
  done (took 6.10017195 seconds)
  (2/3) benchmarking "reduce"...
  done (took 5.655999351 seconds)
  (3/3) benchmarking "man"...
  done (took 5.316228889 seconds)
done (took 17.61219897 seconds)
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.618 ms)
          "reduce" => Trial(1.788 ms)
          "man" => Trial(1.473 ms)

julia> Threads.nthreads()
1
```

then with more threads:

```julia
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.611 ms)
          "reduce" => Trial(383.696 μs)
          "man" => Trial(1.440 ms)

julia> Threads.nthreads()
4
```
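For context, foldl and reduce in these benchmarks share the same underlying split-and-count pipeline and differ only in execution strategy: foldl runs strictly left-to-right, while reduce does a divide-and-conquer over input chunks and so can use all available threads, which is consistent with the 4-thread reduce time dropping to ~384 μs. A hedged sketch of that shape (the pipeline and names here are illustrative assumptions, not the benchmark's actual code, which lives in benchmark/multi-thread/bench_splitby.jl):

```julia
using Transducers

# Illustrative input: 1 MiB of random bytes, split on 0x0a ('\n').
bytes = rand(UInt8, 2^20)

# Count the chunks produced by SplitBy. Composing SplitBy into the
# pipeline like this is an assumption about the API added in this PR.
xf = SplitBy(==(0x0a)) |> Map(_ -> 1)

count_foldl(xs)  = foldl(+, xf, xs; init = 0)   # strictly sequential
count_reduce(xs) = reduce(+, xf, xs; init = 0)  # chunked divide-and-conquer; thread-parallel
```

With one thread, reduce's chunked traversal can still differ from foldl's purely sequential loop (different specialization and state handling), which may be related to the single-thread gap tkf flags above, though the thread does not settle the cause.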