PrincetonUniversity / PsyNeuLink

A block modeling system for cognitive neuroscience
https://psyneulink.org
Apache License 2.0
86 stars 31 forks source link

Intermittent failures in `test_parameter_optimization_ddm[LLVM-optuna_cmaes_sampler]` #2874

Open jvesely opened 9 months ago

jvesely commented 9 months ago

The test fails by selecting the wrong result ~3/100 times.

PRNG determinism in PECOptimizationFunction is done (in a hacky way) by replacing optuna's PRNG instance: psyneulink/core/components/functions/nonstateful/fitfunctions.py:802

                # We need to hook into Optuna's random number generator here so that we can allow PsyNeuLink's RNS to
                # determine the seed for Optuna's RNG. Pretty hacky unfortunately.
                opt_func._rng = np.random.RandomState(self.owner.initial_seed)

which is both insufficient and incorrect.

It is insufficient because it only works for instances of optuna.RandomSampler. However, this test uses CmaEsSampler. Moreover only 1 dimension of sampling is used and optuna switches to an embedded RandomSampler, which uses a different PRNG. CmaEsSampler is also using PRNG in a different member;

cmaes instantiates its PRNG in opt_func._cma_rng and the independent sampler PRNG is in opt_func._independent_sampler._rng.

The hacky assignment also causes issues when updating to optuna 3.4.0 and 3.5.0 since the numpy PRNG instance was moved to _rng.rng leading to the following error:

self._rng.rng.uniform(trans.bounds[:, 0], trans.bounds[:, 1])
E       AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'rng'

The immediate solution to the intermittency is to instantiate optuna samplers with a fixed seed.

The hacky assignment of PRNG should also be removed. If using the same seed as parent OCM (self.owner.initial_seed) is required, the PECOptimizationFunction needs to instance new optuna samplers using the correct seed.

davidt0x commented 9 months ago

Hey @jvesely,

Yeah, I meant to revisit, thanks for catching it. I think I have worked out a better way with this method. The issue is that I want to allow users to pass in instantiated optuna samplers but the RandomState is created when the sampler is created. However, I didn't notice that they have a reseed_rng() method, so I can change the seed and call reseed_rng() to update the actual random state. I think this is cleaner approach. Though, now that I am thinking, if the user passed a seed to the sampler when instantiating it we should probably respect that seed over the PNL seed I guess. I have the proposed change here: https://github.com/PrincetonUniversity/PsyNeuLink/tree/fix_optuna_rng. What do you think?

Also, not sure this will fix the intermittent failure issue. Do you think that is what is causing this? I can't figure that one out for the life of me. Mainly because I can't reproduce locally at all.

davidt0x commented 9 months ago

Oh, I just saw your closed PR where you say fixing the seed issue did fix the intermittent failures. Can you test the above approach where the RNG is reseeded when using the PNL seed? Are you just running the test 100 times?

jvesely commented 9 months ago

I don't see how calling reseed_rng helps here. The implementation is not reading the seed from anywhere [0,1]. The only way I found to set a custom seed is in the sampler constructor.

Yeah:

for i in `seq 100`; do echo $i; pytest -k cmaes tests/composition/test_parameterestimationcomposition.py -n0; done | tee pec.out

showed 3 failures in initial testing.

[0] https://github.com/optuna/optuna/blob/master/optuna/samplers/_random.py#L43 [1] https://github.com/optuna/optuna/blob/master/optuna/samplers/_cmaes.py#L363

davidt0x commented 9 months ago

Oh your right, I figured it cached the seed. Well, maybe we will just have to settle for something like a warning message when seed is passed to PEC with an Optuna sampler that states that the seed needs to be specified on the sampler as well to get deterministic behavior. Anyway, thanks for figuring this all out!

jvesely commented 9 months ago

Oh your right, I figured it cached the seed. Well, maybe we will just have to settle for something like a warning message when seed is passed to PEC with an Optuna sampler that states that the seed needs to be specified on the sampler as well to get deterministic behavior. Anyway, thanks for figuring this all out!

to make things more interesting CmaEs sampler reseeds the PRNG regularly, so it looks like we have very little control over the random sequences unless we want to restrict allowed samplers to those that can be hacked around.

I'll leave this issue open to do something about the

opt_func._rng = np.random.RandomState(self.owner.initial_seed)

line. I think it can just be removed since it only works for RandomSampler, and even for that it breaks in optuna 3.4.0/3.5.0.