Closed AStupidBear closed 2 years ago
I can reproduce.
My nlogn function is wrong for Float32. Replacing it with -sqrt(-log(floatbitmask(u2,T)-oneopenconst(T))) produces:
julia> std(randn(rng, T, N))
1.0006113f0
julia> std(randn(rng, T, N))
1.0034106f0
julia> std(randn(rng, T, N))
0.99588895f0
julia> std(randn(rng, T, N))
1.000607f0
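For anyone wanting to rerun this kind of check, here is a minimal sketch. The thread does not show how rng, T, and N were defined, so the values below are assumptions, and the standard library's MersenneTwister stands in for the VectorizedRNG generator being tested. The tolerance relies on the standard error of the sample standard deviation for N normal draws being roughly 1/sqrt(2N).

```julia
using Random, Statistics

# Assumed setup: none of these definitions appear in the thread.
T = Float32
N = 1_000_000
rng = MersenneTwister(1234)  # stand-in; swap in the generator under test

x = randn(rng, T, N)
s = std(x)

# For N standard-normal draws the sample std has a standard error of
# about 1/sqrt(2N), so allow a generous 5 standard errors around 1.
tol = 5 / sqrt(2N)
println("std = ", s, ", within tolerance: ", abs(s - 1) < tol)
```

A result far outside the tolerance at a given N would point to a biased transform rather than sampling noise.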
It might be a good idea to make this algorithm architecture-generic and use Float32 to avoid the division that SLEEFPirates uses:
https://github.com/JuliaSIMD/VectorizationBase.jl/blob/7adea842ca9bbaed371222ac863f90ce302db6f6/src/special/log.jl#L215-L254
A smaller polynomial can of course be used for Float32.
Could you please temporarily update the code? I have no idea how to patch it.
Sure, fixed by https://github.com/JuliaSIMD/VectorizedRNG.jl/commit/c3bafaf83f3e511081b9b4d51b98e950496cc2aa.