JuliaSIMD / VectorizedRNG.jl

Vectorized uniform and normal random samplers.
MIT License

RNG in Julia changed #18

Closed: PallHaraldsson closed this 3 years ago

PallHaraldsson commented 3 years ago

[skip ci]

PallHaraldsson commented 3 years ago

What is the future of this package? Once Julia 1.7 is released, is it redundant, and if so, would you add more text to that effect?

chriselrod commented 3 years ago

The scalar functions need a lot of work and should probably be replaced with plain scalar sampling, instead of the store-into-a-buffer approach they currently use. Results with AVX512:

julia> using BenchmarkTools, Random, VectorizedRNG

julia> drng = Random.default_rng(); lrng = local_rng();

julia> x = Vector{Float64}(undef, 1024);

julia> @benchmark rand!($lrng, $x)
BenchmarkTools.Trial: 10000 samples with 796 evaluations.
 Range (min … max):  155.763 ns … 231.448 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     166.860 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   168.993 ns ±   7.900 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅  ▄▆  ▆  ▄▆▂ █ ▁▃▆▇▁▂▅▂▁▂▂▃▂▂▂▂▂▂▂▂▂▂▂█▂▂▃▆█▃▃▄▃▃▂▂▁    ▁▁   ▃
  █▁▃██▆▇█▇▇█████████████████████████████████████████████▇███▇▇ █
  156 ns        Histogram: log(frequency) by time        185 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand!($drng, $x)
BenchmarkTools.Trial: 10000 samples with 455 evaluations.
 Range (min … max):  223.312 ns … 414.903 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     250.856 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   248.928 ns ±  17.536 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅ ▂▅ ▂▅▁▁▆▁▁▃▂▁▂█▁▂▇▃▃▂▁▁▁                                    ▂
  █▄██▇██████████████████████▇█▆▇▆▅▅▅▅▅▆▅▆▅▅▇▆▆▆▆▆▅▄▅▅▃▅▄▄▅▅▅▅▄ █
  223 ns        Histogram: log(frequency) by time        327 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)

chriselrod commented 3 years ago

On my Haswell CPU (i3-4010U), I get 883 ns from local_rng() and 2.125 microseconds from default_rng() on Julia 1.7.0-beta4.2

Because of authentication issues, I cannot actually sign into GitHub and post the results from that laptop. But it seems like VectorizedRNG is >2x faster on that AVX2 machine at generating uniform random numbers.

EDIT: 881 ns from local_rng() and 1.767 microseconds from default_rng() on Julia 1.6.2

So the old default RNG seems faster than the new one on this computer, but VectorizedRNG still wins by a hefty margin.
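For reference, the speedups implied by the timings quoted so far can be checked with a few lines of arithmetic. This is only a sketch over the numbers reported in this thread; the labels and field names (`local_ns`, `default_ns`) are made up for illustration.

```julia
# Timings quoted in this thread, converted to nanoseconds.
# `local_ns` is rand! with local_rng(), `default_ns` is rand! with default_rng().
timings = (
    (label = "i7-1165G7 (AVX512), Julia 1.8.0-DEV.438, median", local_ns = 166.860, default_ns = 250.856),
    (label = "i3-4010U (AVX2), Julia 1.7.0-beta4.2", local_ns = 883.0, default_ns = 2125.0),
    (label = "i3-4010U (AVX2), Julia 1.6.2", local_ns = 881.0, default_ns = 1767.0),
)

# Ratio of default_rng() time to local_rng() time; > 1 means local_rng() was faster.
speedup(t) = t.default_ns / t.local_ns

for t in timings
    println(rpad(t.label, 56), round(speedup(t); digits = 2), "x")
end
```

The ratios come out to roughly 1.5x on the AVX512 machine and between 2.0x and 2.4x on the Haswell laptop, which matches the ">2x" statement above.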

chriselrod commented 3 years ago

Huh, but I see the old README text already claimed dSFMT was faster.

chriselrod commented 3 years ago

Also, I'm not 100% sure, but I think the Haswell benchmarks there aren't from the laptop I just tested on. They're >1-year-old benchmarks from a much faster work computer (also Haswell) that I had access to at a previous job. So, unfortunately, they're not comparable.

chriselrod commented 3 years ago

> What is the future of this package? Once Julia 1.7 is released, is it redundant, and if so, would you add more text to that effect?

Per my above comments, in vector mode, this library still seems much faster. But it's missing proper (fast) scalar mode evaluation.

Aside from performance, they also have different behavior. The default random number generator is task local, while local_rng() is thread local, and should perhaps be renamed to that effect. This makes local_rng() potentially dangerous to use with task migration, which is a new feature in recent versions of Julia. It would be fine with Polyester.@batch, however. The dangers should probably get a note in the README.
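One migration-safe pattern is to give each task its own RNG object, so that nothing is looked up by thread id. The sketch below is not VectorizedRNG's API; it uses only the Random standard library. The function name `fill_per_task!` and the `seed + i` seeding scheme are hypothetical, and consecutive integer seeds are used only to keep the example reproducible, not as a recommendation for producing independent streams.

```julia
using Random

# Fill `x` in roughly `ntasks` chunks. Each spawned task creates and owns its RNG,
# so the result does not depend on which thread a task runs on or migrates to.
function fill_per_task!(x::Vector{Float64}, ntasks::Integer; seed::Integer = 0)
    chunklen = cld(length(x), ntasks)
    chunks = collect(Iterators.partition(eachindex(x), chunklen))
    tasks = map(enumerate(chunks)) do (i, r)
        Threads.@spawn begin
            rng = MersenneTwister(seed + i)  # task-owned state (hypothetical seeding)
            rand!(rng, view(x, r))           # each task writes only to its own chunk
        end
    end
    foreach(wait, tasks)
    return x
end

x = Vector{Float64}(undef, 1024)
fill_per_task!(x, 4; seed = 1)
```

Because each chunk is tied to its own seeded generator, the output is the same regardless of how the scheduler places the tasks. A thread-local generator such as local_rng() trades this property for speed, which is why it is safe only when tasks stay pinned to their threads.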