JuliaFolds / Transducers.jl

Efficient transducers for Julia
https://juliafolds.github.io/Transducers.jl/dev/
MIT License
432 stars · 24 forks

Add SplitBy #461

Closed. tkf closed this pull request 3 years ago.

tkf commented 3 years ago

Commit Message

Add SplitBy (#461)

This patch adds the SplitBy iterator transformation and an underlying (currently internal) ReduceSplitBy transducer. SplitBy can be used for eachline- and split-style processing in parallel and/or streaming programs.
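As a rough, hedged illustration (not code from the patch itself): SplitBy groups a flat stream into chunks delimited by elements matching a predicate, so an eachline-style line count can be written as a fold. The composition and call forms below are assumptions about the API this PR introduces; the exact docstring in the patch is authoritative:

```julia
using Transducers

# Hedged sketch: split a character stream on newlines and count the
# resulting "lines", analogous to `eachline`. Composing SplitBy into a
# transducer pipeline with |> is an assumption about the new API.
chars = collect("hello\nworld\njulia")
nlines = foldl(+, SplitBy(==('\n')) |> Map(_ -> 1), chars; init = 0)
```

The point of building this on a transducer (ReduceSplitBy) is that the same pipeline can then be driven by either a sequential or a parallel fold, as the benchmarks below demonstrate.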

codecov[bot] commented 3 years ago

Codecov Report

Merging #461 (aad1c7f) into master (c11379f) will increase coverage by 0.54%. The diff coverage is 96.38%.


```diff
@@            Coverage Diff             @@
##           master     #461      +/-   ##
==========================================
+ Coverage   90.86%   91.41%   +0.54%
==========================================
  Files          31       32       +1
  Lines        2037     2120      +83
==========================================
+ Hits         1851     1938      +87
+ Misses        186      182       -4
```

| Flag | Coverage Δ | |
|------|------------|---|
| unittests | 91.41% <96.38%> (+0.54%) | :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|----------------|------------|---|
| src/Transducers.jl | 71.42% <ø> (ø) | |
| src/basics.jl | 95.31% <ø> (+6.25%) | :arrow_up: |
| src/splitby.jl | 96.29% <96.29%> (ø) | |
| src/lister.jl | 47.72% <100.00%> (-38.64%) | :arrow_down: |
| src/library.jl | 93.76% <0.00%> (-0.26%) | :arrow_down: |
| src/processes.jl | 94.71% <0.00%> (+0.44%) | :arrow_up: |
| src/progress.jl | 92.30% <0.00%> (+0.96%) | :arrow_up: |
| src/core.jl | 89.37% <0.00%> (+0.96%) | :arrow_up: |

... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c11379f...aad1c7f.

tkf commented 3 years ago
| ID | time | GC time | memory | allocations |
|----|------|---------|--------|-------------|
| ... | ... | ... | ... | ... |
| `["splitby", "count", "foldl"]` | 3.797 ms (5%) | | 96 bytes (1%) | 4 |
| `["splitby", "count", "man"]` | 1.504 ms (5%) | | | |
| `["splitby", "count", "reduce"]` | 786.290 μs (5%) | | 2.84 KiB (1%) | 54 |

--- https://github.com/JuliaFolds/Transducers-data/blob/multi-thread-benchmark-results/2021/03/15/010146/result.md


The performance is nice, but it's strange that reduce is much better than foldl, even with no multithreading:

```julia
julia> suite = include("benchmark/multi-thread/bench_splitby.jl")
       results = run(suite; verbose = true)
(1/1) benchmarking "count"...
  (1/3) benchmarking "foldl"...
  done (took 6.10017195 seconds)
  (2/3) benchmarking "reduce"...
  done (took 5.655999351 seconds)
  (3/3) benchmarking "man"...
  done (took 5.316228889 seconds)
done (took 17.61219897 seconds)
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.618 ms)
          "reduce" => Trial(1.788 ms)
          "man" => Trial(1.473 ms)

julia> Threads.nthreads()
1
```

then with more threads:

```julia
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "count" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "foldl" => Trial(3.611 ms)
          "reduce" => Trial(383.696 μs)
          "man" => Trial(1.440 ms)

julia> Threads.nthreads()
4
```
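For context, foldl and reduce in these benchmarks share the same underlying split-and-count pipeline and differ only in execution strategy: foldl runs strictly left-to-right, while reduce does a divide-and-conquer over input chunks and so can use all available threads, which is consistent with the 4-thread reduce time dropping to ~384 μs. A hedged sketch of that shape (the pipeline and names here are illustrative assumptions, not the benchmark's actual code, which lives in benchmark/multi-thread/bench_splitby.jl):

```julia
using Transducers

# Illustrative input: 1 MiB of random bytes, split on 0x0a ('\n').
bytes = rand(UInt8, 2^20)

# Count the chunks produced by SplitBy. Composing SplitBy into the
# pipeline like this is an assumption about the API added in this PR.
xf = SplitBy(==(0x0a)) |> Map(_ -> 1)

count_foldl(xs)  = foldl(+, xf, xs; init = 0)   # strictly sequential
count_reduce(xs) = reduce(+, xf, xs; init = 0)  # chunked divide-and-conquer; thread-parallel
```

With one thread, reduce's chunked traversal can still differ from foldl's purely sequential loop (different specialization and state handling), which may be related to the single-thread gap tkf flags above, though the thread does not settle the cause.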