JuliaFolds / Transducers.jl

Efficient transducers for Julia
https://juliafolds.github.io/Transducers.jl/dev/
MIT License
432 stars 24 forks source link

Splitting pseudorandom number generators #519

Open sethaxen opened 2 years ago

sethaxen commented 2 years ago

As discussed on Slack, it would be nice if Transducers supported more general PRNGs. e.g.

julia> using Transducers, Random

julia> function foo(rng, n)
           trans = Map(_ -> rand(rng))
           foldxt(vcat, trans, withprogress(1:n))
       end
foo (generic function with 1 method)

julia> foo(Random.seed!(42), 1000) == foo(Random.seed!(42), 1000)
true

julia> foo(MersenneTwister(42), 1000) == foo(MersenneTwister(42), 1000)
false

As I understand it, the first case works (on recent Julia versions) because the default PRNG is partitioned between the tasks. This is not the case for the second version.

We can support more general PRNGs by manually chunking (adapted from https://juliafolds.github.io/FLoops.jl/dev/howto/parallel/#Efficient-and-reproducible-usage-patterns-of-random-number-generators):

julia> using Transducers, Future, Random

julia> function nrandjump(rng, n, jump=big(10)^20)
           return accumulate(Future.randjump, fill(jump, n), init = rng)
       end
nrandjump (generic function with 2 methods)

julia> function foo(rng, n)
           ntasks = Threads.nthreads()
           rngs = nrandjump(rng, ntasks)
           chunks = Iterators.partition(1:n, cld(n, ntasks))
           trans = Map() do (rng, chnk)
               chnk |> Map(_ -> rand(rng)) |> collect
           end
           foldxt(vcat, trans, withprogress(zip(rngs, chunks)))
       end
foo (generic function with 1 method)

julia> foo(MersenneTwister(42), 1000) == foo(MersenneTwister(42), 1000)
true

but this is less convenient and doesn't allow us to get per-iteration updates to the progress bar.

@tkf mentioned that an alternative would be to use the SplittableBase API for PRNGs here in Transducers. IIUC, this would cause all PRNGs to be automatically split across the chunks (as we have done in the latter example) while keeping the simpler syntax of the former example.

Perhaps this is a duplicate of #22?

tkf commented 2 years ago

Thanks, I have some ideas for solving this. #22 is something different.