Closed: WardBrian closed this issue 1 year ago
Per our conversation earlier, just noting that care may need to be taken to ensure reproducibility, correctness, and non-serialization if we wind up implementing real parallelism.
Yes, just to check: the `multiprocessing` library will just clone the random state:
```python
import multiprocessing

import numpy as np

gen = np.random.default_rng(123)

def f(_):
    # each worker process inherits a copy of gen's state,
    # so every call starts from the same point in the stream
    return gen.random()

if __name__ == "__main__":
    with multiprocessing.Pool(5) as p:
        x = p.map(f, list(range(5)))
    print(x)
```
```
[0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435]
```
As long as we are aware of this, it is easy to work around by generating a new random object in each process, each seeded by a seed drawn from the main RNG.
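A minimal sketch of that workaround using numpy's `SeedSequence.spawn`, which derives statistically independent child seeds from one parent seed (the seed and pool size here are arbitrary):

```python
import multiprocessing

import numpy as np

def f(seed_seq):
    # each worker builds its own Generator from its child seed
    rng = np.random.default_rng(seed_seq)
    return rng.random()

if __name__ == "__main__":
    # spawn() hands each process an independent seed derived from
    # the main seed, so results are reproducible but distinct
    child_seeds = np.random.SeedSequence(123).spawn(5)
    with multiprocessing.Pool(5) as p:
        x = p.map(f, child_seeds)
    print(x)  # five distinct values
```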
> care may need to be taken to ensure reproducibility, correctness, and non-serialization if we wind up implementing real parallelism.
Real parallelism is one of the main motivations for managing your own PRNGs.
> it is easy to work around by generating a new random object in each process, each seeded by a seed drawn from the main RNG.
We asked around about this when we were developing Stan. The preferred workaround is to use something like the linear congruential PRNG we use in Stan, which lets you cheaply skip over the next N random variates. For multiple chains (parallel or not), each chain uses the same PRNG, but each subsequent chain after the first advances the PRNG by something like 10^12 draws.
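numpy's `PCG64` bit generator supports the same skip-ahead idea via `advance()`; a sketch of per-chain offsets under that scheme (the 10^12 stride echoes the Stan convention described above and is an assumption here, not something this project has settled on):

```python
import numpy as np

n_chains = 4
stride = 10**12  # assumed stride, mirroring the Stan-style offset above

# every chain starts from the same seed, then jumps ahead by
# chain * stride draws so the streams do not overlap in practice
rngs = [
    np.random.Generator(np.random.PCG64(123).advance(chain * stride))
    for chain in range(n_chains)
]

print([rng.random() for rng in rngs])  # four distinct values
```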
This uses numpy's recommended `np.random.default_rng(seed)`. If this is fed `None` (our default), it is seeded randomly on each run, but it can also be given an integer or an instance of the `Generator` class, which it just returns untouched.
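A quick demonstration of those three cases (the `None` case prints a different value on every run):

```python
import numpy as np

# None: fresh entropy from the OS, different on every run
print(np.random.default_rng(None).random())

# integer seed: reproducible across runs
print(np.random.default_rng(42).random())

# an existing Generator is passed through untouched
gen = np.random.default_rng(42)
assert np.random.default_rng(gen) is gen
```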