flatironinstitute / bayes-kit

Bayesian inference and posterior analysis for Python
MIT License

Add seed arguments to objects which need randomness #8

Closed WardBrian closed 1 year ago

WardBrian commented 1 year ago

This uses NumPy's recommended np.random.default_rng(seed). If it is passed None (our default), the generator is seeded from fresh OS entropy and results differ on each run; it can also be given an integer seed or an existing instance of the Generator class, which it returns untouched.
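A minimal sketch of the pattern, assuming a hypothetical `ToySampler` class (not one of the actual bayes-kit objects); only `np.random.default_rng` and the `Generator` methods are real NumPy API:

```python
import numpy as np


class ToySampler:
    """Hypothetical object that needs randomness; illustrative only."""

    def __init__(self, seed=None):
        # default_rng accepts None, an int, or an existing Generator.
        # None -> fresh OS entropy; int -> reproducible stream;
        # Generator -> returned unchanged, so callers can share one stream.
        self._rng = np.random.default_rng(seed)

    def draw(self):
        return self._rng.standard_normal()


# Same integer seed, same sequence of draws.
a = ToySampler(seed=123)
b = ToySampler(seed=123)
same_draws = a.draw() == b.draw()

# Passing a Generator reuses that exact object rather than copying it.
shared = np.random.default_rng(7)
c = ToySampler(seed=shared)
reuses_generator = c._rng is shared
```

Accepting a `Generator` as well as an integer lets a caller thread one stream through several objects instead of managing a separate seed for each.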

jsoules commented 1 year ago

Per our conversation earlier, just noting that care may need to be taken to ensure reproducibility, correctness, and non-serialization if we wind up implementing real parallelism.

WardBrian commented 1 year ago

Yes. Just to check: the multiprocessing library just clones the generator's state into each worker, so every process produces the same draws:

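import multiprocessing

import numpy as np

# Module-level generator: each worker process gets its own copy of this state.
gen = np.random.default_rng(123)

def f(_):
    return gen.random()

# The guard is needed on platforms that start workers by re-importing this module.
if __name__ == "__main__":
    with multiprocessing.Pool(5) as p:
        x = p.map(f, list(range(5)))

    print(x)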
[0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435,
 0.6823518632481435]

As long as we are aware of this, it is easy to work around by generating a new random object on each process which is seeded by a seed from the main RNG.
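A rough sketch of that workaround, under the assumption that each task receives its own integer seed drawn from the main generator. The helper names (`make_child_seeds`, `draw`, `parallel_draws`) are made up for illustration:

```python
import multiprocessing

import numpy as np


def make_child_seeds(seed, n):
    # The main RNG hands out one integer seed per task.
    main_rng = np.random.default_rng(seed)
    return [int(s) for s in main_rng.integers(0, 2**63 - 1, size=n)]


def draw(child_seed):
    # Build a fresh generator inside the worker from the seed it was handed,
    # instead of relying on a module-level generator copied into the process.
    rng = np.random.default_rng(child_seed)
    return rng.random()


def parallel_draws(seed, n_workers=5):
    # Reproducible for a fixed seed, but each task sees a different stream.
    child_seeds = make_child_seeds(seed, n_workers)
    with multiprocessing.Pool(n_workers) as p:
        return p.map(draw, child_seeds)
```

Call `parallel_draws(123)` from under an `if __name__ == "__main__":` guard. NumPy also provides `np.random.SeedSequence(seed).spawn(n)` for deriving child seeds, which may be preferable to drawing raw integers.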

bob-carpenter commented 1 year ago

> care may need to be taken to ensure reproducibility, correctness, and non-serialization if we wind up implementing real parallelism.

Real parallelism is one of the main motivations for managing your own PRNGs.

> it is easy to work around by generating a new random object on each process which is seeded by a seed from the main RNG

We asked around about this when we were developing Stan. The preferred workaround is to use something like the linear congruential PRNG we use in Stan, which lets you cheaply skip over the next N random variates. For multiple chains (parallel or not), each chain uses the same PRNG, but each subsequent chain after the first advances the PRNG by something like 10^12 draws.
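For comparison, here is a sketch of the same skip-ahead idea in NumPy rather than Stan. It assumes NumPy's `PCG64` bit generator, whose `advance` method moves the state forward as if a given number of raw outputs had been drawn. The `chain_rng` helper and the stride value are illustrative assumptions, not Stan's actual implementation:

```python
import numpy as np

# Assumed spacing between chains, echoing the "something like 10^12" figure.
STRIDE = 10**12


def chain_rng(seed, chain_id, stride=STRIDE):
    # Every chain starts from the same seed...
    bit_gen = np.random.PCG64(seed)
    # ...but chain k skips ahead k * stride raw outputs, cheaply,
    # without generating the values in between.
    bit_gen.advance(chain_id * stride)
    return np.random.Generator(bit_gen)


rngs = [chain_rng(123, k) for k in range(4)]
first_draws = [rng.random() for rng in rngs]
```

One caveat: `advance` counts raw 64-bit outputs, and a single draw from a distribution can consume more than one of them, so the stride should comfortably exceed the number of variates a chain might use.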