facebookresearch / minihack

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Apache License 2.0

[BUG] `reward_lose` not correctly set at timeout #103

Open epignatelli opened 2 months ago

epignatelli commented 2 months ago

🐛 Bug

When `max_episode_steps` is set, gym correctly triggers a reset once the step limit is reached, but MiniHack (and perhaps NLE as well) does not set `StepStatus.ABORTED`.

As a consequence, the reward for a timeout is not `reward_lose`.
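
For reference, the status checked in the repro below is NLE's `StepStatus.ABORTED`, whose integer value is -1. A paraphrased sketch of the enum (the authoritative definition lives in NLE; members other than `ABORTED` are listed here as an assumption):

import enum

class StepStatus(enum.IntEnum):
    # Paraphrased from NLE; ABORTED is the timeout status the assert checks
    ABORTED = -1   # episode ended by hitting the step limit
    RUNNING = 0
    DEATH = 1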

To Reproduce

import gym
import minihack  # noqa: F401  (registers the MiniHack environments)

MAX_STEPS = 20

env = gym.make(
    "MiniHack-KeyRoom-S5-v0",
    reward_lose=-1.0,
    max_episode_steps=MAX_STEPS,
)
env.reset()
env.render()

# Repeat an arbitrary action until the TimeLimit wrapper ends the episode
for i in range(MAX_STEPS):
    obs, reward, done, info = env.step(3)
    print(reward, info["end_status"])

# The episode timed out, so end_status should be StepStatus.ABORTED (-1),
# but it is not
assert int(info["end_status"]) == -1

Expected behavior

int(info["end_status"]) should be -1 (StepStatus.ABORTED) when the episode ends by timeout.

Potential reasons

`gym.make` accepts `max_episode_steps` as an argument of its own and uses it to configure the `TimeLimit` wrapper. For this reason, `max_episode_steps` is not included in `kwargs` and is not passed through to the MiniHack constructor, so the underlying environment never learns the step limit and cannot flag the timeout.
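
You can see the mismatch directly (a minimal sketch; `_max_episode_steps` is an internal attribute of both gym's `TimeLimit` wrapper and NLE's base environment, and names/defaults may vary across versions):

import gym
import minihack  # noqa: F401

env = gym.make("MiniHack-KeyRoom-S5-v0", max_episode_steps=20)

# The TimeLimit wrapper created by gym.make knows the limit ...
print(env._max_episode_steps)            # 20

# ... but it was never forwarded to the MiniHack/NLE base environment,
# which keeps its own default, so it never flags ABORTED at step 20
print(env.unwrapped._max_episode_steps)  # not 20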

epignatelli commented 2 months ago

You can work around it temporarily with a wrapper that copies the `TimeLimit` step limit down to the base environment:

class PatchTimeoutWrapper(gym.Wrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        # gym.Wrapper forwards attribute lookups, so self._max_episode_steps
        # resolves to the TimeLimit wrapper's limit; copy it onto the base
        # env so NLE can set StepStatus.ABORTED on timeout
        self.unwrapped._max_episode_steps = self._max_episode_steps

And use it like this:

env = gym.make(
    "MiniHack-KeyRoom-S5-v0",
    reward_lose=-1.0,
    max_episode_steps=MAX_STEPS,
)
env = PatchTimeoutWrapper(env)
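
If the patch propagates the limit as intended, re-running the repro loop should now end with the aborted status (a quick check, same hypothetical setup as the repro above):

env.reset()
for i in range(MAX_STEPS):
    obs, reward, done, info = env.step(3)

# With the limit propagated, the timeout is flagged as ABORTED and the
# final reward should reflect reward_lose
print(reward, info["end_status"])
assert int(info["end_status"]) == -1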