Open rsokl opened 3 years ago
Isn't the whole point of these new interfaces that users explicitly pass the generator object around?
If so, we only need to register the global PRNG that the generators are seeded off, and everything will work from there.
> Isn't the whole point of these new interfaces that users explicitly pass the generator object around?
Yep, that is correct!
> we only need to register the global PRNG that the generators are seeded off
My understanding is that the generator objects are not seeded off of a global generator, and that they can only be seeded independently; I think being able to use a global PRNG would defeat the purpose of numpy's redesign. The reason why the new system expects folks to pass around generator objects is that those generator objects can be used/seeded without concern that, in some other portion of the code, the generator object is silently getting re-seeded.
So what do we need to do then? I was thinking of monkeypatching `np.random.default_rng()` to use a known seed when passed `None`, instead of (or by) controlling the PRNG that seed would otherwise be drawn from. If the user passes an explicitly-seeded PRNG, it should be pretty obvious what's happening when or if we raise `Flaky`.
When making this post, my thoughts were that we would identify the appropriate substitutes for `seed`, `get_state`, and `set_state` in terms of the new bit-generator/generator system, and provide a shim to make it trivial for users to register their new sources of RNG. I still think that this is a good path forward, although I'll be interested to hear if folks from the NumPy mailing list have other ideas.
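For concreteness, the rough equivalents in the new system look like this (my reading of the NumPy API, not an official mapping): seeding happens at construction, and state is a plain dict exposed through the bit generator.

```python
import numpy as np

rng = np.random.default_rng(42)    # "seed": supplied at construction time
state = rng.bit_generator.state    # "get_state": a plain dict of PRNG state
first = rng.uniform()
rng.bit_generator.state = state    # "set_state": assign the dict back
assert rng.uniform() == first      # the draw is replayed exactly
```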
I am hoping to eventually find some time to loop back and hit some of the To-Dos that I laid out in my original post. It is just a matter of me scrounging up time to do so.
Oh! We could also make a strategy in `hypothesis.extra.numpy` that hands a user a generator that they can pass to their test code/other strategies, and that we manage for them (this would still involve our figuring out the `seed`/`get_state`/`set_state` substitutes)! This is probably an even more convenient and obvious (and easy to document) solution for users.
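A sketch of what such a strategy could look like, assuming it is essentially `st.builds` over seeded generators (the name `rngs` and the seed bounds are placeholders — no such strategy exists yet):

```python
import numpy as np
import hypothesis.strategies as st
from hypothesis import given

def rngs():
    # Hypothetical strategy: draws a seed, hands back a seeded Generator.
    return st.builds(
        np.random.default_rng,
        st.integers(min_value=0, max_value=2**32 - 1),
    )

@given(rngs())
def test_receives_a_generator(rng):
    assert isinstance(rng, np.random.Generator)
    rng.uniform()  # behaves like any other Generator
```

Because the seed is drawn by Hypothesis, failures shrink and replay like any other example, which is exactly the determinism property we're after.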
Based on a quick conversation, we plan to:

- Add `npst.rngs()` (todo: better name), which will basically be `st.builds(np.random.default_rng, st.integers())` with a nicer repr - much like `st.randoms(use_true_random=True)`.
- Monkeypatch `.default_rng()` in order to use a constant seed instead of a random seed, much like we set the state for global `Random` instances (or use a drawn seed with `st.random_module()`, etc.).

People should use the former, but it's important that we give a nice user experience even without best-practices.

Are there any plans to address this issue?
Hypothesis is an all-volunteer project, and so far people have been volunteering on other issues instead.
If you're interested in helping out, I'm very happy to support that through advice, code review, and so on 😊
I would love to contribute but I don't know the internal workings of hypothesis. I was looking at #3510, is that a good starting point?
Yep, that's a great place to start!
I think this should be a pretty self-contained change - it'd be perfectly feasible to implement this strategy downstream; we want to provide it in `hypothesis.extra.numpy`
to make users' lives easier rather than because it needs internals 🙂
This is going to be a somewhat sprawling issue. All of the topics here involve Hypothesis' approaches to making random code deterministic. I will happily close this and turn it into a collection of modular issues/PRs, but first I want to lay everything out and get @Zac-HD's input.
Weakrefs
(Addressed in #3135 )
We should only make weak references to the generators that we manage (as well as other "register" functions that Hypothesis provides)
NumPy
NumPy has moved away from its old global random state (e.g. `np.random.seed`, `np.random.uniform`, etc.) in favor of a new RNG system that uses a combination of bit generators and generators. This API is very different from those of global-state RNG systems. Presently, it is not clear how a user should have Hypothesis make their `numpy.random` code deterministic.

To me, the bare minimum would involve identifying the appropriate substitutes for `seed`, `get_state`, and `set_state` in terms of the new bit-generator/generator system, and providing a shim to make it trivial for users to register this new source of RNG.

A much more ambitious goal is to still, magically, handle all of this for the user. The only thing that comes to mind is to have NumPy register the creation of new generators, and we then tap into that registry to manage those generators. I would not be surprised if NumPy (understandably) does not want to do this.
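One possible shape for such a shim, assuming we wrap a new-style `Generator` in an object exposing the `seed`/`getstate`/`setstate` interface that `hypothesis.register_random` duck-types against (the class name and default seed are illustrative assumptions):

```python
import numpy as np

class NumpyGeneratorShim:
    """Adapts a numpy Generator to the seed/getstate/setstate interface."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def seed(self, value):
        # Re-seed in place by copying the state of a freshly seeded generator,
        # so existing references to self.rng stay valid.
        self.rng.bit_generator.state = np.random.default_rng(value).bit_generator.state

    def getstate(self):
        return self.rng.bit_generator.state  # a plain dict, cheap to copy

    def setstate(self, state):
        self.rng.bit_generator.state = state
```

An instance could then be passed to `hypothesis.register_random(...)`, with test code drawing from `shim.rng` instead of a module-level generator.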
Some near-term To-Dos:

- Identify the appropriate substitutes for `seed`, `get_state`, and `set_state` for users to leverage

Useful reference material
PyTorch
See if PyTorch is willing to add a plugin so that Hypothesis will manage their global generator like this (but with `register_random` instead of `register_type_strategy`).

Additionally, torch also supplies a `Generator`. I recall reading that PyTorch was planning to redesign things like `DataLoader`s to accept generators, which is similar to the new best practices for NumPy's RNG. Thus, any solution we cook up for the NumPy case should be designed to be future-compatible here as well.
Edit: I just realized that PyTorch actually uses Hypothesis for some of its tests. As far as I can tell, they do not use `register_random` in their test suite.