JuliaSIMD / VectorizedRNG.jl

Vectorized uniform and normal random samplers.
MIT License

Bad NEON performance #20

Open · chriselrod opened this issue 2 years ago

chriselrod commented 2 years ago
julia> using VectorizedRNG, Random

julia> x = Vector{Float64}(undef, 1024);

julia> @benchmark randn!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.838 μs …  4.028 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.852 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.862 μs ± 72.118 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █▄                                                        
  ▃███▅▄▂▂▂▂▁▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▂▁▂▁▁▂▁▂▂▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂ ▂
  2.84 μs        Histogram: frequency by time        3.16 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark randn!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.533 μs …  6.983 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.688 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.693 μs ± 77.624 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      ▂   ▇▂▂▂▇▅▂▂▁█▁▁ ▄                      
  ▂▁▁▁▂▂▂▂▂▂▂▂▃▃▃▃▆▄▅▅█▇████████████████▆▆▅▇▄▄▃▄▃▃▃▃▂▂▂▂▂▂▂▂ ▄
  1.53 μs        Histogram: frequency by time        1.83 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 146 evaluations.
 Range (min … max):  698.918 ns …   8.839 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     700.630 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   707.420 ns ± 120.934 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▇▄                 ▄▄▁                                      ▂
  █████▇▆▆▆▄▅▄▆▅▆▆▆▆▅▅▅████▆▆▇█▅▆▆▄▆▆▆▆▇▆▅▄▁▁▄▅▅▄▅▄▄▅▆▅▅▄▃▆▅▅▄▅ █
  699 ns        Histogram: log(frequency) by time        755 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 152 evaluations.
 Range (min … max):  682.566 ns … 949.836 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     683.664 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   687.544 ns ±  11.115 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▅▃▁                ▃▄     ▁▁                                ▁
  ██████▇▇▇▇▇▆▅▆▆▅▆▄▅▄▄████▅▅▅███▇▆▅▆▆▆▆▆▅▅▃▄▄▄▄▄▅▂▄▅▃▅▂▄▅▄▄▄▅▅ █
  683 ns        Histogram: log(frequency) by time        735 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.9.0-DEV.1073
Commit 0b9eda116d* (2022-08-01 14:27 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1

chriselrod commented 2 years ago

For comparison, on Cascade Lake:

julia> using VectorizedRNG, Random

julia> x = Vector{Float64}(undef, 1024);

julia> @benchmark randn!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.183 μs …  2.638 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.227 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.229 μs ± 27.961 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▃▅▆███▇▅▃▁
  ▂▁▂▂▂▃▃▄▅▇███████████▇▆▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂ ▃
  1.18 μs        Histogram: frequency by time        1.34 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark randn!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.594 μs …  4.573 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.742 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.744 μs ± 49.598 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                        ▁▂▃▄▅▆████▆▆▇▆▅▂▂▁
  ▂▁▁▂▂▁▂▂▂▂▂▂▃▃▃▄▄▄▅▆▇████████████████████▇▆▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂ ▅
  1.59 μs        Histogram: frequency by time        1.87 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 732 evaluations.
 Range (min … max):  173.176 ns … 229.518 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     180.137 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   180.299 ns ±   1.007 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                ▅█
  ▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▁▂▁▂▁▁▁▂▁▁▁▂▂▁▂▂▂▂▁▂▁▁▂▂▂▅▆██▆▄▂▂▂▂▂▂▃▃▄▃▂ ▂
  173 ns           Histogram: frequency by time          182 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 323 evaluations.
 Range (min … max):  266.056 ns … 382.669 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     266.514 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   266.989 ns ±   1.768 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅▇███▆▅▄▃▁       ▁▂▂▂▂▂▁▂▂▁             ▁▁▂▃▃▂▂▁             ▂
  ███████████▅▅▃▅▆▅▇██████████▇▆▇▇▅▃▁▁▃▃▆▇███████████▇▇▇▆▆▆▅▅▅▆ █
  266 ns        Histogram: log(frequency) by time        272 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.9.0-DEV.1172
Commit 18fa3835a7* (2022-08-23 13:44 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
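To make the contrast between the two machines explicit, the median times above can be reduced to a per-CPU ratio of `local_rng()` to `Random.default_rng()`. This is a minimal sketch that only reuses the medians copied from the two transcripts (no new measurements); a ratio above 1 means VectorizedRNG was slower than Julia's default RNG on that machine.

```julia
# Median times copied from the benchmark output above.
# Units are μs for randn! and ns for rand!; each row uses one unit, so the ratio is unitless.
# Columns: (CPU, function, VectorizedRNG local_rng(), Random.default_rng())
medians = [
    ("Apple M1",   "randn!", 2.852,   1.688),
    ("Apple M1",   "rand!",  700.630, 683.664),
    ("i9-10980XE", "randn!", 1.227,   1.742),
    ("i9-10980XE", "rand!",  180.137, 266.514),
]

# Ratio > 1 means local_rng() took longer than Random.default_rng().
ratios = [t_local / t_default for (_, _, t_local, t_default) in medians]

for ((cpu, f, _, _), r) in zip(medians, ratios)
    println(rpad(cpu, 12), rpad(f, 8), round(r; digits = 2))
end
```

With these medians, `randn!` from VectorizedRNG comes out roughly 1.69× the default RNG's time on the M1 (NEON) but about 0.70× on the i9-10980XE, and `rand!` is near parity on the M1 (about 1.02) versus about 0.68 on the x86 machine.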