JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

30x slower looping over reinterpret array #51658

Open Moelf opened 1 year ago

Moelf commented 1 year ago

previous saga:

julia> function g(res)
           @simd for i in eachindex(res)
               res[i] = _from_zigzag(res[i])
           end
       end
g (generic function with 1 method)

julia> ARY = reinterpret(Int16, rand(UInt8, 10^5));

julia> using BenchmarkTools

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.281 μs …  3.130 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.290 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.304 μs ± 64.783 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▆▅▄▃▂▂▁▁ ▂▂▁ ▂
  ██████████████▇▆▆▅▅▅▁▁▁▃▃▁▃▁▁▁▁▁▅█████▇▆▁▅▄▄▃▅▃▄▄▄▄▃▃▄▄▃▄▅ █
  1.28 μs      Histogram: log(frequency) by time      1.52 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  322.640 μs … 481.445 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     324.080 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   325.580 μs ±   6.105 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂██▆▆▅▃▂▂ ▂
  ███████████▇▇▇▆▆▆▆▆▆▅▆▆▅▅▆▆▆▆▇▇▅▆▅▅▆▄▅▅▄▅▅▅▄▃▄▅▄▄▅▂▄▄▅▅▅▄▃▄▅▅ █
  323 μs       Histogram: log(frequency) by time       358 μs <

Memory estimate: 0 bytes, allocs estimate: 0.


that's a ~250x slowdown (median 324 μs vs. 1.29 μs)?
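The two benchmarks above differ in the type of `x`: `copy` on a reinterpreted array materializes a plain `Vector{Int16}`, while `deepcopy` keeps the `ReinterpretArray` wrapper, so only the second run goes through the wrapper's indexing. A minimal sketch to check this follows. The `_from_zigzag` definition is an assumption (a common zigzag decode), since the thread never shows it.

```julia
# Assumed definition: `_from_zigzag` is not shown in the thread.
# This is a common zigzag decode (0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...).
_from_zigzag(n) = (n >>> 1) ⊻ -(n & one(n))

# Same loop as in the report.
function g(res)
    @simd for i in eachindex(res)
        res[i] = _from_zigzag(res[i])
    end
end

ARY = reinterpret(Int16, rand(UInt8, 10^5))

a = copy(ARY)      # plain Vector{Int16}: the wrapper is gone
b = deepcopy(ARY)  # still a ReinterpretArray over a copied Vector{UInt8}

println(typeof(a))
println(typeof(b))

# Both containers hold the same values before and after decoding,
# so the two benchmarks do the same work on different array types.
g(a)
g(b)
println(a == b)
```

So `setup=begin x = copy(ARY) end` measures the loop on an ordinary `Array`, and `setup=begin x = deepcopy(ARY) end` measures it on the reinterpreted view.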

notice that it's faster to copy first and then loop, even with the allocation included:
```julia
julia> @benchmark g(copy(x)) setup=begin x = ARY end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  29.527 μs … 444.092 μs  ┊ GC (min … max): 0.00% … 84.11%
 Time  (median):     32.452 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.225 μs ±  10.909 μs  ┊ GC (mean ± σ):  0.90% ±  2.60%

         ▄█▅▅▆▁
  ▂▃▅▆▄▅▇██████▅▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  29.5 μs         Histogram: frequency by time         48.3 μs <

 Memory estimate: 97.73 KiB, allocs estimate: 2.
```

jishnub commented 1 year ago

Perhaps https://github.com/JuliaLang/julia/pull/44186 might help?

Moelf commented 1 year ago

looks like it helps by almost 10x

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.279 μs …   4.555 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.294 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.367 μs ± 164.712 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▂      ▇▄                                                 ▁
  ████▆▄▃▄▄███▇▇▇▇▇▆▆▅▄▅▄▅▆▄▅▆▅▃▅▄▅▄▅▅▅▅▄▄▃▅▄▃▅▃▅▄▄▅▅▅▄▃▄▄▃▄▅ █
  1.28 μs      Histogram: log(frequency) by time      2.17 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  43.471 μs … 79.852 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     44.035 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.352 μs ±  3.060 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▆▅▃▂     ▁▁▁▁▂▅▅▃▂▁▁▁▁                                    ▂
  █████████████████████████▇▇▇▇▇▆▇▆▆▆▆▅▅▅▅▅▅▃▄▄▃▄▃▃▁▅▄▄▁▄▄▄▄▆ █
  43.5 μs      Histogram: log(frequency) by time      60.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Moelf commented 7 months ago

on nightly it's still the same:

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 7593 samples with 10 evaluations.
 Range (min … max):  1.280 μs …   5.617 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.288 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.513 μs ± 655.006 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▂▁
  ████▇▇▆▆▅▄▄▄▃▃▄▃▃▃▆▆▄▅▄▆▇▇▆▇▇▇█▆▇▆▇▆▆▆▆▆▇▆▆▆▆▆▆▆▅▆▅▆▅▅▅▄▅▅▅ █
  1.28 μs      Histogram: log(frequency) by time      4.32 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  42.522 μs … 87.246 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.767 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   43.293 μs ±  2.071 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▇▃▂▂▁                              ▃▃                     ▁
  ███████▇█▆▅▅▄▄▃▄▃▅▄▅▆▆▅▆▅▅▅▆▆▆▆▅▆▄▅▅▇███▅▆▅▅▅▅▄▅▄▄▄▃▄▄▄▄▄▃▃ █
  42.5 μs      Histogram: log(frequency) by time      49.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
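
Until the slow path is fixed, the `g(copy(x))` measurement earlier in the thread suggests a possible workaround: decode into a fresh `Array` instead of mutating the reinterpreted array in place. The sketch below is only an illustration. `decode_zigzag` is a hypothetical helper name, and `_from_zigzag` is again an assumed definition that the thread does not show. It trades one allocation (about 98 KiB for this input, per the benchmark above) for the fast `Array` loop.

```julia
# Assumed definition: `_from_zigzag` is not shown in the thread.
_from_zigzag(n) = (n >>> 1) ⊻ -(n & one(n))

# Same loop as in the report.
function g(res)
    @simd for i in eachindex(res)
        res[i] = _from_zigzag(res[i])
    end
end

# Hypothetical helper: copy first (this yields a plain Vector for a
# reinterpreted input), then run the loop on the copy and return it.
# The input is left untouched.
function decode_zigzag(src::AbstractVector)
    out = copy(src)
    g(out)
    return out
end

ARY = reinterpret(Int16, rand(UInt8, 10^5))
decoded = decode_zigzag(ARY)
println(typeof(decoded))
```

This does not address the underlying slowdown; it only sidesteps indexing through the `ReinterpretArray` inside the hot loop.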