Fixed inconsistent RNG behavior. Also found that perturbations in ES were being applied everywhere instead of staying local to the policy being perturbed.
The second issue is resolved by deep copying the policy architecture. It would be nice to replace this at some point, but the added time is small: on my laptop a deep copy of the architecture takes about 1e-6 seconds.
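A rough sketch of the pattern, not the actual code in this change: `Policy`, `perturbed_copy`, and `sigma` are hypothetical names, and the policy is assumed to be a plain object holding a mutable list of weights. Each candidate gets its own deep copy, so in-place noise cannot leak into the base policy, and the noise comes from an explicitly seeded `random.Random` instance so runs are repeatable.

```python
import copy
import random


class Policy:
    """Hypothetical stand-in for the policy architecture."""

    def __init__(self, weights):
        self.weights = list(weights)


def perturbed_copy(policy, rng, sigma=0.1):
    """Return a perturbed deep copy of `policy`, leaving `policy` untouched.

    `rng` is a dedicated random.Random instance, so the noise does not
    depend on the global random state.
    """
    candidate = copy.deepcopy(policy)  # isolate the candidate's weights
    for i in range(len(candidate.weights)):
        # In-place update: without the deep copy this would modify the
        # weights shared with the base policy.
        candidate.weights[i] += sigma * rng.gauss(0.0, 1.0)
    return candidate


base = Policy([0.0, 1.0, 2.0])
rng = random.Random(1234)  # assumed seed, for illustration only
population = [perturbed_copy(base, rng) for _ in range(4)]
```

Re-creating `random.Random` with the same seed reproduces the same population, and `base.weights` is unchanged after the loop.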