YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

The first selfplay worker uses the same seed for all parallel environments #27

Open rPortelas opened 2 years ago

rPortelas commented 2 years ago

I might have found an unexpected behavior in how parallel training environments are being seeded.

I am referring to this line: https://github.com/YeWR/EfficientZero/blob/c533ebf5481be624d896c19f499ed4b2f7d7440d/core/selfplay_worker.py#L112

Because the rank of the first selfplay worker is 0, parallel environments are being initialized with the same seed, which might reduce training data diversity.
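To make the failure mode concrete, here is a minimal sketch (the `make_env` helper is hypothetical, standing in for `self.config.new_game`): when every parallel environment in the first worker is seeded with the worker's rank, all of them start from identical random states.

```python
import numpy as np

# Hypothetical stand-in for an Atari env: a seeded RNG driving stochastic behavior.
def make_env(seed):
    return np.random.RandomState(seed)

rank = 0      # first self-play worker
env_nums = 4

# If every env in a worker is seeded with the worker's rank (0 here),
# all parallel envs are initialized identically:
envs = [make_env(rank) for _ in range(env_nums)]
first_draws = [env.randint(10**9) for env in envs]

# All envs produce the same stream -> no diversity in the collected data.
assert len(set(first_draws)) == 1
```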

We could go for a simple fix like replacing self.rank with (self.rank + 1); however, this is still problematic once multiple workers are involved, as there would be seed overlap between them anyway.

A good option might be to sample a seed for each parallel environment using numpy (which is seeded before launching data workers). For instance:

envs = [self.config.new_game(np.random.randint(10**9)) for _ in range(env_nums)]

jamesliu commented 2 years ago

Ditto, but drawing seeds from the global numpy RNG with randint may make runs irreproducible.
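A quick sketch of why this is fragile (an illustration, not the actual worker code): draws from the global numpy RNG are only reproducible if every process fixes the global seed and nothing else consumes the stream first; any extra draw elsewhere shifts all subsequent env seeds.

```python
import numpy as np

# Same global seed, same call order -> same env seeds.
np.random.seed(123)
a = [np.random.randint(10**9) for _ in range(4)]
np.random.seed(123)
b = [np.random.randint(10**9) for _ in range(4)]
assert a == b

# But if any other component consumes the global RNG first
# (hypothetical extra draw), the whole seed stream shifts:
np.random.seed(123)
np.random.randint(2)
c = [np.random.randint(10**9) for _ in range(4)]
assert a != c
```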

rPortelas commented 2 years ago

Hmm, right. Thanks for the input.

Then we could use a dedicated random state created from the original seed:

rnd_state = np.random.RandomState(self.config.seed + self.rank)
envs = [self.config.new_game(rnd_state.randint(10**9)) for _ in range(env_nums)]
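Wrapping that idea in a small helper (the `env_seeds` function is hypothetical, just to make the property checkable): a dedicated RandomState per worker yields seeds that are reproducible across runs, distinct within a worker, and different across worker ranks.

```python
import numpy as np

def env_seeds(base_seed, rank, env_nums):
    # Dedicated per-worker random state, independent of the global numpy RNG.
    rnd_state = np.random.RandomState(base_seed + rank)
    return [rnd_state.randint(10**9) for _ in range(env_nums)]

seeds_run1 = env_seeds(base_seed=0, rank=0, env_nums=4)
seeds_run2 = env_seeds(base_seed=0, rank=0, env_nums=4)

assert seeds_run1 == seeds_run2              # reproducible across runs
assert len(set(seeds_run1)) == len(seeds_run1)  # distinct seeds within a worker
assert env_seeds(0, 1, 4) != seeds_run1      # different stream per worker rank
```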