rPortelas opened this issue 2 years ago
Ditto, but using randint may make runs irreproducible.
Hmm right right. Thanks for the input.
Then we could use a dedicated random state created from the original seed:
rnd_state = np.random.RandomState(self.config.seed + self.rank)
envs = [self.config.new_game(rnd_state.randint(10**9)) for _ in range(env_nums)]
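As a quick self-contained check (a sketch with a hypothetical helper name, not code from the repository), this is how a dedicated random state keeps the derived env seeds reproducible for a fixed base seed while still differing across worker ranks:

```python
import numpy as np

def make_env_seeds(base_seed, rank, env_nums):
    # Hypothetical helper: each worker gets its own RandomState offset by its rank,
    # so workers draw different but fully reproducible seed sequences.
    rnd_state = np.random.RandomState(base_seed + rank)
    return [int(rnd_state.randint(10**9)) for _ in range(env_nums)]

# Same base seed and rank -> identical seeds on every run.
assert make_env_seeds(0, rank=1, env_nums=4) == make_env_seeds(0, rank=1, env_nums=4)
# Different ranks -> (almost surely) different seed lists.
assert make_env_seeds(0, rank=1, env_nums=4) != make_env_seeds(0, rank=2, env_nums=4)
```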
I might have found an unexpected behavior in how parallel training environments are being seeded.
I am referring to this line: https://github.com/YeWR/EfficientZero/blob/c533ebf5481be624d896c19f499ed4b2f7d7440d/core/selfplay_worker.py#L112
Because the rank of the first selfplay worker is 0, parallel environments are being initialized with the same seed, which might reduce training data diversity.
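For reference, the seeding there is roughly of the following form (paraphrased for illustration, not the verbatim repository code): the per-env offset involves self.rank, so for the first worker that offset is zero for every environment:

```python
# Paraphrased illustration only, not the exact line from selfplay_worker.py:
# when self.rank == 0, the offset self.rank * i is 0 for every i, so all
# parallel envs in the first worker are created with the same seed.
envs = [self.config.new_game(self.config.seed + self.rank * i) for i in range(env_nums)]
```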
We could go for a simple fix like replacing self.rank with (self.rank + 1). However, this is still problematic when considering multiple workers, as there will be seed overlap between them anyway. A good option might be to sample a seed for each parallel environment using numpy (which is seeded before launching the data workers). For instance:
envs = [self.config.new_game(np.random.randint(10**9)) for _ in range(env_nums)]
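One caveat with drawing from the global numpy state (echoing the reproducibility concern above): the sampled env seeds are only stable across runs if np.random is seeded before the workers start and nothing else consumes the global RNG in between. A minimal standalone illustration (hypothetical, not repository code):

```python
import numpy as np

def sample_env_seeds(env_nums):
    # Draws from the *global* RNG; any unrelated np.random call elsewhere
    # shifts which seeds the environments end up with.
    return [int(np.random.randint(10**9)) for _ in range(env_nums)]

np.random.seed(0)
first = sample_env_seeds(4)

np.random.seed(0)
np.random.rand()           # an extra, unrelated draw from the global RNG...
second = sample_env_seeds(4)

print(first == second)     # False: the env seeds silently changed
```

This is essentially why a dedicated RandomState per worker, as suggested above, is the more robust option.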