Open 1-Bart-1 opened 1 month ago
Hello, I think you are missing an important alternative, which is also recommended: evaluating each candidate for multiple episodes, to average out noise due to env stochasticity.
Also, even if you seed each env the same the first time, they will end up in different states after each trial because of the different candidates (and you also don't want to optimize for a specific seed of your env).
See issues in RL Zoo:
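The multi-episode alternative mentioned above can be sketched as follows (a minimal illustration, not SB3 API; `env` and `policy` are stand-ins, and `n_eval_episodes` is an illustrative parameter):

```python
import numpy as np

def evaluate_candidate(env, policy, n_eval_episodes: int = 5) -> float:
    """Average the episodic return over several episodes, so that a single
    lucky or unlucky reset does not decide a candidate's fitness."""
    returns = []
    for _ in range(n_eval_episodes):
        obs = env.reset()
        done, episode_return = False, 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            episode_return += reward
        returns.append(episode_return)
    return float(np.mean(returns))
```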
🚀 Feature
When training with ARS in combination with AsyncEval, multiple environments are run at the same time. When seeding these environments, all environments get a different seed. There should be an option to seed all the environments with the same seed at the start of each call to ARS.evaluate_candidates(). https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/5c81398ef854dde4eeaed51e0715c5de18a9d344/sb3_contrib/ars/ars.py#L165 https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/5c81398ef854dde4eeaed51e0715c5de18a9d344/sb3_contrib/common/vec_env/async_eval.py#L154
Motivation
Some environments generate random values in their reset function, for instance external factors that are random. When running evaluate_candidates, these random values can affect the returned rewards, so that good parameter sets get bad rewards and bad parameter sets get good rewards. This slows down training. To mitigate this, while still generating different random values for external factors across rounds, all environments in AsyncEval should be seeded with the same random number at the start of evaluate_candidates, or this should at least be an option.
Pitch
Add the following lines to the start of ARS.evaluate_candidates:
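A minimal sketch of what such lines could do (the helper name and RNG handling are illustrative, not the actual patch; `async_eval` stands for the AsyncEval instance passed to evaluate_candidates):

```python
import numpy as np

def seed_envs_for_round(async_eval, rng: np.random.Generator) -> int:
    """Draw one seed and broadcast it to every evaluation env, so that each
    candidate in this round faces the same randomness on reset."""
    shared_seed = int(rng.integers(0, 2**32 - 1))
    if async_eval is not None:
        async_eval.seed(shared_seed)
    return shared_seed
```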
Add the following lines to AsyncEval.seed:
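One way such a change could look (a sketch, not the actual patch; `remotes` matches the attribute name used by AsyncEval for the worker pipes):

```python
class SameSeedAsyncEval:
    """Sketch of an AsyncEval.seed that broadcasts one identical seed."""

    def __init__(self, remotes):
        self.remotes = remotes

    def seed(self, seed):
        # Same seed for every worker: no per-worker offset, and no recv()
        # afterwards once the workers stop replying to "seed".
        for remote in self.remotes:
            remote.send(("seed", seed))
```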
And change the worker so that it doesn't return values after seed:
Alternatives
None.
Additional context
At least in my specific environment, this method leads to substantial improvements in training.
Checklist