Closed. unixpickle closed this issue 5 years ago
The issue is that reset does not env.reset() the seed. I added a for _ in range(50): ... before the env.reset()
...
Indeed, it looks like using seed() actually causes the environment to stick to its current seed, regardless of the argument. The script in this comment prints out something like:
WARNING:gym_unity:New seed 62 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 62
got 1 start states
WARNING:gym_unity:New seed 46 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 46
got 1 start states
WARNING:gym_unity:New seed 31 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 31
got 1 start states
WARNING:gym_unity:New seed 21 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 21
got 1 start states
WARNING:gym_unity:New seed 64 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 64
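import os
import random

from obstacle_tower_env import ObstacleTowerEnv

counter = {}
env = ObstacleTowerEnv('./ObstacleTower/obstacletower.x86_64', worker_id=2)
while True:
    # Request a new random seed, then reset so that it should take effect.
    env.seed(random.randrange(100))
    env.reset()
    # Step with action 0 for 50 frames before sampling the observation.
    for _ in range(50):
        obs, _, _, _ = env.step(0)
    # Record each distinct observation seen after the warm-up steps.
    key = str(obs.flatten().tolist())
    counter[key] = True
    print('got %d start states' % len(counter))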
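A side note on the script above: the str(obs.flatten().tolist()) key gets very large for image observations. Below is a minimal sketch of a more compact fingerprint, assuming observations expose tobytes() the way NumPy arrays do. The names observation_key and count_start_states are hypothetical helpers for illustration, not part of obstacle_tower_env.

```python
import hashlib


def observation_key(obs):
    """Return a short fingerprint for an observation (hypothetical helper).

    Assumes all observations share one shape and dtype, since tobytes()
    only captures the raw buffer contents.
    """
    # NumPy arrays expose tobytes(); fall back to repr() for anything else.
    raw = obs.tobytes() if hasattr(obs, "tobytes") else repr(obs).encode()
    return hashlib.sha1(raw).hexdigest()


def count_start_states(observations):
    """Count how many distinct observations appear in an iterable."""
    return len({observation_key(obs) for obs in observations})
```

In the loop above, this would replace the counter dict: collect observation_key(obs) into a set and print its length.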
Hi @unixpickle and @Sohojoe
Thanks for bringing this to my attention. I will take a look at it, and hopefully have a fix soon.
Hi all,
We have just pushed the v2.1 release, which aims to fix this. Please let us know if you still run into an issue.
Although I set a specific tower-seed when I reset the environment, a random seed (0-99) is assigned instead whenever tower-seed >= 100. It looks like v2.1 still has the same problem.
Hi @SungbinChoi, this is actually intentional. For the duration of the contest we are limiting the random seeds to 100, to prevent participants from discovering and utilizing the review seeds. Once the contest is over we will release a version with an unrestricted seed range.
Hi @awjuliani, I see. I didn't know that the tower-seed value range is limited to 100 in Round 2. Thanks~
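Since an out-of-range tower-seed is silently replaced with a random one, it can help to catch this on the client side before calling the environment. A rough sketch, assuming the allowed range is exactly 0 through 99 as described in this thread. TOWER_SEED_LIMIT and check_tower_seed are hypothetical names, not part of the environment's API.

```python
import random

# Assumption taken from this thread: only seeds 0..99 are honored
# during the contest; larger values get replaced by a random seed.
TOWER_SEED_LIMIT = 100


def check_tower_seed(seed, limit=TOWER_SEED_LIMIT):
    """Return the seed unchanged, or raise if it falls outside [0, limit)."""
    if not 0 <= seed < limit:
        raise ValueError(
            "tower-seed %d is outside the allowed range 0..%d" % (seed, limit - 1)
        )
    return seed


def random_tower_seed(limit=TOWER_SEED_LIMIT):
    """Draw a random seed that stays inside the allowed range."""
    return random.randrange(limit)
```

For example, env.seed(check_tower_seed(my_seed)) would fail loudly instead of running on an unexpected tower.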
The seeds are not consistent across environment instances. This means there's some other source of non-determinism. I'm guessing this means that contestants have access to unlimited seeds...
Demo script:
In older versions of the environment, this script works pretty much as expected.
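The original demo script is not reproduced here. As a separate illustration only, one way to check whether seeds are consistent across environment instances is sketched below. It assumes a Gym-style interface with seed(), reset(), step() and close(), and that action 0 is valid, as in the script posted in this thread. make_env is a hypothetical zero-argument factory supplied by the caller, and all function names are made up for this sketch.

```python
import hashlib


def _fingerprint(obs):
    # NumPy arrays expose tobytes(); fall back to repr() for anything else.
    raw = obs.tobytes() if hasattr(obs, "tobytes") else repr(obs).encode()
    return hashlib.sha1(raw).hexdigest()


def start_state_keys(make_env, seeds, warmup_steps=50):
    """Map each seed to a fingerprint of the observation after warm-up."""
    env = make_env()  # hypothetical factory returning a fresh environment
    try:
        keys = {}
        for seed in seeds:
            env.seed(seed)
            obs = env.reset()
            for _ in range(warmup_steps):
                obs, _, _, _ = env.step(0)
            keys[seed] = _fingerprint(obs)
        return keys
    finally:
        # Close the instance so the next one can start cleanly.
        env.close()


def inconsistent_seeds(make_env, seeds, warmup_steps=50):
    """Return the seeds whose start state differs between two instances."""
    seeds = list(seeds)
    first = start_state_keys(make_env, seeds, warmup_steps)
    second = start_state_keys(make_env, seeds, warmup_steps)
    return [s for s in seeds if first[s] != second[s]]
```

With the environment from this thread, make_env could be a lambda that constructs ObstacleTowerEnv with the same path and worker_id used above. An empty result would suggest the tested seeds are reproducible across instances; a non-empty one points at another source of non-determinism.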