Closed PallHaraldsson closed 3 years ago
What is the future of this package? Now that Julia 1.7 has been released, is it redundant, and would you add more text to the README to that effect?
The scalar functions need a lot of work and should probably be replaced: they should just use scalar sampling instead of the storing-into-a-buffer approach they use currently. Benchmarks with AVX512:
julia> using BenchmarkTools, Random, VectorizedRNG
julia> drng = Random.default_rng(); lrng = local_rng();
julia> x = Vector{Float64}(undef, 1024);
julia> @benchmark rand!($lrng, $x)
BenchmarkTools.Trial: 10000 samples with 796 evaluations.
Range (min … max): 155.763 ns … 231.448 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 166.860 ns ┊ GC (median): 0.00%
Time (mean ± σ): 168.993 ns ± 7.900 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ ▄▆ ▆ ▄▆▂ █ ▁▃▆▇▁▂▅▂▁▂▂▃▂▂▂▂▂▂▂▂▂▂▂█▂▂▃▆█▃▃▄▃▃▂▂▁ ▁▁ ▃
█▁▃██▆▇█▇▇█████████████████████████████████████████████▇███▇▇ █
156 ns Histogram: log(frequency) by time 185 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rand!($drng, $x)
BenchmarkTools.Trial: 10000 samples with 455 evaluations.
Range (min … max): 223.312 ns … 414.903 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 250.856 ns ┊ GC (median): 0.00%
Time (mean ± σ): 248.928 ns ± 17.536 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ ▂▅ ▂▅▁▁▆▁▁▃▂▁▂█▁▂▇▃▃▂▁▁▁ ▂
█▄██▇██████████████████████▇█▆▇▆▅▅▅▅▅▆▅▆▅▅▇▆▆▆▆▆▅▄▅▅▃▅▄▄▅▅▅▅▄ █
223 ns Histogram: log(frequency) by time 327 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
On my Haswell CPU (i3-4010U) on Julia 1.7.0-beta4.2, I get 883 ns from local_rng() and 2.125 microseconds from default_rng().
Because of authentication issues, I cannot actually sign into GitHub and post the results from that laptop. But it seems like VectorizedRNG is >2x faster on that AVX2 machine at generating uniform random numbers.
EDIT: On Julia 1.6.2, I get 881 ns from local_rng() and 1.767 microseconds from default_rng().
So the old RNG seems faster on this computer, but VectorizedRNG wins by a hefty margin.
Huh, but I see the old README text already claimed dSFMT was faster.
Also, I'm not 100% sure, but I think the Haswell benchmarks there aren't from the same laptop I just tested on. They're >1-year-old benchmarks from a much faster (than that laptop) work computer, also Haswell, that I had access to at a previous job. So, unfortunately, they're not comparable.
> What is the future of this package? Now that Julia 1.7 has been released, is it redundant, and would you add more text to the README to that effect?
Per my above comments, in vector mode, this library still seems much faster. But it's missing proper (fast) scalar mode evaluation.
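To make the scalar-mode point concrete, here is a minimal sketch of what a buffer-backed scalar sampler can look like. This is an illustration only, not VectorizedRNG's actual implementation: the names BufferedSampler and draw! are hypothetical, and it uses only the Random standard library. It shows the bookkeeping (a bounds check and an index update) that every scalar draw pays on top of the occasional bulk refill.

```julia
using Random

# Hypothetical illustration of a buffer-backed scalar sampler.
# Not VectorizedRNG's code; names and layout are assumptions.
mutable struct BufferedSampler{R<:AbstractRNG}
    rng::R
    buffer::Vector{Float64}
    next::Int  # index of the next unused value in `buffer`
end

# Start with `next` past the end so the first draw triggers a refill.
BufferedSampler(rng::AbstractRNG, n::Integer = 256) =
    BufferedSampler(rng, Vector{Float64}(undef, n), Int(n) + 1)

function draw!(s::BufferedSampler)
    if s.next > length(s.buffer)
        rand!(s.rng, s.buffer)  # refill the whole buffer in one bulk call
        s.next = 1
    end
    x = s.buffer[s.next]
    s.next += 1
    return x
end

s = BufferedSampler(MersenneTwister(42), 8)
xs = [draw!(s) for _ in 1:20]  # 20 scalar draws, refilling every 8 draws
```

A direct scalar sampler would instead advance the generator state once per draw and skip the buffer entirely. Whether that is actually faster here would need benchmarking.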
Aside from performance, they also have different behavior.
The default random number generator is task local, while local_rng() is thread local, and should perhaps be renamed to that effect.
This makes local_rng() potentially dangerous to use with task migration, which is a new feature in recent versions of Julia. It would be fine with Polyester.@batch, however.
The dangers should probably get a note in the README.
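For such a README note, a small sketch of what "task local" means in practice might help. It uses only the Random standard library and does not exercise VectorizedRNG. The helper name seeded_draw is made up for illustration. Seeding the default RNG inside a task gives the same stream in every task seeded the same way, regardless of which thread the scheduler picks.

```julia
using Random

# Hypothetical helper: seed the default RNG inside a spawned task, then draw.
# On Julia 1.7+ the default RNG state lives in the task, so the result does
# not depend on which thread runs the task.
function seeded_draw(seed)
    t = Threads.@spawn begin
        Random.seed!(seed)  # seeds this task's default RNG
        rand(Float64, 4)
    end
    return fetch(t)
end

a = seeded_draw(1234)
b = seeded_draw(1234)
```

A thread-local generator gives no such per-task guarantee: if a task were moved to another thread between two draws, it would continue from a different generator's state.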