Unity-Technologies / obstacle-tower-env

Obstacle Tower Environment
Apache License 2.0

The seed() function is broken #91

Closed: unixpickle closed this issue 5 years ago

unixpickle commented 5 years ago

The seeds are not consistent across environment instances: the same seed produces different start states in different instances. This means there's some other source of non-determinism. I'm guessing this means that contestants have access to unlimited seeds...

Demo script:

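from obstacle_tower_env import ObstacleTowerEnv

counter = {}
# Start 25 separate environment instances with the same seed and count how
# many distinct observations appear after 50 steps of action 0.
# With deterministic seeding this should keep printing "got 1 start states".
for i in range(0, 25):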
    env = ObstacleTowerEnv('./ObstacleTower/obstacletower.x86_64', worker_id=i)
    env.seed(25)
    env.reset()
    for _ in range(50):
        obs, _, _, _ = env.step(0)
    key = str(obs.flatten().tolist())
    counter[key] = True
    print('got %d start states' % len(counter))
    env.close()

In older versions of the environment, this script works pretty much as expected.
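Building the key with str(obs.flatten().tolist()) works, but for image observations it produces a very long string per episode. A minimal sketch of a cheaper alternative, assuming obs is a NumPy array as in the script above, is to hash the raw bytes together with the shape and dtype. The helper name obs_fingerprint is hypothetical and not part of obstacle_tower_env; the arrays below are dummy data, not real Obstacle Tower frames.

```python
import hashlib

import numpy as np


def obs_fingerprint(obs) -> str:
    """Return a short, stable key for an observation array.

    Shape and dtype are mixed into the hash so that arrays with identical
    bytes but a different layout or type do not collide.
    """
    arr = np.asarray(obs)
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())  # C-order bytes regardless of memory layout
    return h.hexdigest()


# A set replaces the dict of True values used for counting.
seen = set()
seen.add(obs_fingerprint(np.zeros((84, 84, 3), dtype=np.uint8)))
seen.add(obs_fingerprint(np.zeros((84, 84, 3), dtype=np.uint8)))
seen.add(obs_fingerprint(np.ones((84, 84, 3), dtype=np.uint8)))
print('got %d start states' % len(seen))  # got 2 start states
```

In the demo script, key = obs_fingerprint(obs) and counter = set() would be drop-in replacements; the count printed is the same, only the memory per key shrinks.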

Sohojoe commented 5 years ago

The issue is that env.reset() does not reset the seed. I added a for _ in range(50):... loop before the env.reset() ...

[Screen recording, 2019-05-18 at 11:47:34 PM]

unixpickle commented 5 years ago

Indeed, it looks like calling seed() actually causes the environment to stick to its current seed, regardless of the argument. The script below prints out something like this:

WARNING:gym_unity:New seed 62 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 62
got 1 start states
WARNING:gym_unity:New seed 46 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 46
got 1 start states
WARNING:gym_unity:New seed 31 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 31
got 1 start states
WARNING:gym_unity:New seed 21 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 21
got 1 start states
WARNING:gym_unity:New seed 64 will apply on next reset.
INFO:mlagents_envs:Academy reset with parameters: tower-seed -> 64

Script:

import random

from obstacle_tower_env import ObstacleTowerEnv

counter = {}
env = ObstacleTowerEnv('./ObstacleTower/obstacletower.x86_64', worker_id=2)
# Runs until interrupted: pick a random seed for every episode and count how
# many distinct observations appear after 50 steps of action 0.
while True:
    env.seed(random.randrange(100))
    env.reset()
    for _ in range(50):
        obs, _, _, _ = env.step(0)
    key = str(obs.flatten().tolist())
    counter[key] = True
    print('got %d start states' % len(counter))

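Both scripts in this thread follow the same pattern: seed, reset, take a fixed number of steps with a fixed action, and count distinct final observations. A minimal sketch of that check as a reusable function, assuming only a Gym-style interface where seed() takes effect on the next reset() (as the gym_unity warning in the log says) and step() returns the old 4-tuple. count_distinct_states and ToyEnv are hypothetical names; ToyEnv is a stand-in so the check can run without the Unity binary.

```python
import random
from typing import Any, Callable, Iterable


class ToyEnv:
    """Hypothetical stand-in for a Gym-style env whose seed() is deferred."""

    def __init__(self) -> None:
        self._pending_seed = None
        self._rng = random.Random()  # not seeded by the caller yet
        self._state = 0

    def seed(self, seed: int) -> None:
        # Mirrors the gym_unity warning in the log: the seed is stored here
        # and only takes effect on the next reset().
        self._pending_seed = seed

    def reset(self) -> int:
        if self._pending_seed is not None:
            self._rng = random.Random(self._pending_seed)
            self._pending_seed = None
        self._state = self._rng.randrange(10**6)
        return self._state

    def step(self, action: int):
        # Deterministic transition; returns (obs, reward, done, info).
        self._state = (self._state * 31 + action + 1) % 10**6
        return self._state, 0.0, False, {}

    def close(self) -> None:
        pass


def count_distinct_states(
    make_env: Callable[[], Any],
    seeds: Iterable[int],
    n_steps: int = 50,
    action: int = 0,
    key: Callable[[Any], Any] = repr,
) -> int:
    """Seed a fresh env for each entry in seeds, reset, step n_steps times
    with a fixed action, and return how many distinct final observations
    were seen."""
    seen = set()
    for seed in seeds:
        env = make_env()
        try:
            env.seed(seed)
            obs = env.reset()
            for _ in range(n_steps):
                obs, _, _, _ = env.step(action)
            seen.add(key(obs))
        finally:
            env.close()
    return len(seen)


# Same seed in every fresh instance: a deterministic env gives exactly 1.
print(count_distinct_states(ToyEnv, [25] * 25))  # 1
```

For NumPy image observations, pass something like key=lambda o: o.tobytes(), because repr() abbreviates large arrays. With the real environment, make_env would build an ObstacleTowerEnv with a fresh worker_id per call, as in the first script; that part is omitted here because it needs the Unity binary.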
awjuliani commented 5 years ago

Hi @unixpickle and @Sohojoe

Thanks for bringing this to my attention. I will take a look at it, and hopefully have a fix soon.

awjuliani commented 5 years ago

Hi all,

We have just pushed the v2.1 release, which aims to fix this. Please let us know if you still run into an issue.

SungbinChoi commented 5 years ago

Although I specified a particular tower-seed when resetting the environment, a random seed (0 to 99) is assigned instead whenever tower-seed >= 100. It looks like v2.1 still has the same problem.

awjuliani commented 5 years ago

Hi @SungbinChoi this is actually intentional. For the duration of the contest we are limiting the random seeds to 100, to prevent participants from discovering and utilizing the review seeds. Once the contest is over we will release a version with an unrestricted seed range.
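Because contest builds silently replace a tower-seed of 100 or more with a random seed, an out-of-range seed is easy to miss. A minimal sketch of a guard that fails loudly instead, assuming the limit is exactly 100 seeds (0 to 99) as described above. CONTEST_SEED_LIMIT and checked_seed are hypothetical names, not part of obstacle_tower_env. The thread does not say how negative values are handled, so the sketch conservatively rejects them too.

```python
# Assumption: during the contest only tower seeds 0..99 are honored; per the
# maintainer's comment, larger values are replaced by a random seed.
CONTEST_SEED_LIMIT = 100


def checked_seed(seed: int, limit: int = CONTEST_SEED_LIMIT) -> int:
    """Return seed unchanged if it is in [0, limit); otherwise raise, so an
    out-of-range seed fails loudly instead of being silently randomized."""
    if not 0 <= seed < limit:
        raise ValueError(
            f"seed {seed} is outside 0..{limit - 1}; the contest build "
            "would substitute a random seed"
        )
    return seed


print(checked_seed(42))  # 42

try:
    checked_seed(150)
except ValueError as err:
    print(err)  # seed 150 is outside 0..99; the contest build would substitute a random seed
```

Calling env.seed(checked_seed(s)) before env.reset() then either applies the requested seed or stops the run before a randomized episode is recorded.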

SungbinChoi commented 5 years ago

Hi @awjuliani, I see. I didn't know that the tower-seed range is limited to 100 values in Round 2. Thanks~