Closed edbennett closed 3 months ago
@LupoA, @edbennett, could we have a conversation about this at some point, since it is a release blocker?
Yes; shall we wait until we're all in adjacent time zones?
Fair enough. @LupoA, would you be available next week? Ed and I will both be back from the beginning of next week (@edbennett, please correct me if I am wrong).
Yes, we can meet when you are back. One idea is to use the Friday slot, but I'm open to other choices.
sha256([datapath, prec, tmax, sigma, ...]) -> some number -> seed
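A minimal sketch of the idea above, assuming the analysis parameters are available as a plain dictionary. The parameter values and the choice to serialise via JSON are illustrative assumptions, not necessarily what the pushed code does:

```python
import hashlib
import json


def generate_seed(params):
    """Map a dict of analysis parameters to a reproducible 32-bit seed."""
    # Serialise deterministically: sorted keys, so dict ordering does not matter.
    # default=str is a fallback for values JSON cannot encode (e.g. Path objects).
    payload = json.dumps(params, sort_keys=True, default=str).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    # Reduce the 256-bit digest to the 32-bit range that np.random.seed accepts.
    return int(digest, 16) % 2**32


# Hypothetical parameter values, for illustration only.
params = {"datapath": "data/correlator.txt", "prec": 128, "tmax": 32, "sigma": 0.1}
seed = generate_seed(params)
```

Different parameter sets then get different seeds (up to hash collisions in the 32-bit reduction), while rerunning with identical parameters gives the same seed.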
I made an attempt to solve this issue and pushed it.
I have also added a test showing that changing the parameters gives a different seed, and that resetting them to their original values reproduces the same seed.
As a check, we could make a histogram of the integers picked by the bootstrap and check that they follow a uniform distribution. It is probably overkill, and I am fine with the issue being closed regardless.
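The uniformity check could look roughly like the sketch below. It assumes the bootstrap draws indices with NumPy's `randint`, which may differ from the actual implementation; the function names and sample sizes are hypothetical:

```python
import numpy as np


def bootstrap_index_counts(n_samples, n_boot, seed):
    """Count how often each index is picked over n_boot bootstrap resamplings."""
    rng = np.random.RandomState(seed)
    # Each row holds one bootstrap sample of indices, drawn with replacement.
    indices = rng.randint(0, n_samples, size=(n_boot, n_samples))
    return np.bincount(indices.ravel(), minlength=n_samples)


def chi_square_statistic(counts):
    """Pearson chi-square statistic of the counts against a uniform expectation."""
    expected = counts.sum() / len(counts)
    return float(((counts - expected) ** 2 / expected).sum())


counts = bootstrap_index_counts(n_samples=50, n_boot=2000, seed=12345)
chi2 = chi_square_statistic(counts)
dof = len(counts) - 1
# For uniform draws, chi2 has mean dof and standard deviation sqrt(2 * dof),
# so z measures the deviation from uniformity in units of that spread.
z = (chi2 - dof) / np.sqrt(2 * dof)
```

A value of `z` of order one is consistent with uniform picks; a large positive value would point to some indices being favoured.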
In
```
seed = generate_seed(par)
random.seed(seed)
np.random.seed(random.randint(0, 2 ** (32) - 1))
```
Would it make sense to move the last two lines inside `generate_seed`, which could then be called `initialise_rng` or something like that?
Generating a seed is a separate concern from using it to seed a generator; it arguably makes sense to have a utility function to seed the generator, but how the seed is generated should be separate. (Whether `initialise_rng()` calls `generate_seed()`, or takes the seed as an argument, I have no strong opinion on.)
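One way this separation might look, taking the variant in which the seed is passed in as an argument. This is only a sketch: the name `initialise_rng` comes from the suggestion above, while its return value and the literal seed are illustrative assumptions:

```python
import random

import numpy as np


def initialise_rng(seed):
    """Seed Python's and NumPy's global generators from a single integer seed."""
    random.seed(seed)
    # Derive NumPy's seed from the freshly seeded Python generator, as in the
    # snippet quoted above; np.random.seed needs a value in [0, 2**32 - 1].
    np_seed = random.randint(0, 2**32 - 1)
    np.random.seed(np_seed)
    return np_seed


# The seed would normally come from a separate seed-generation step;
# 2024 is a placeholder value for illustration.
initialise_rng(2024)
first = (random.random(), float(np.random.rand()))

initialise_rng(2024)
second = (random.random(), float(np.random.rand()))
```

Because the function only consumes a seed, how that seed is produced can change without touching the seeding logic.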
`random.seed(1994)` is definitely reproducible, but will introduce correlations between otherwise independent analyses, and if applied to parallel code may introduce spurious statistics. This should be adjusted to still be reproducible, but also be more ergodic.