Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning
https://minigrid.farama.org/

[Bug Report] The reward obtained is not as described in the document #332

Closed Seraphli closed 1 year ago

Seraphli commented 1 year ago

Describe the bug: Running manual_control.py to play MiniGrid-DoorKey-5x5-v0 gives a final reward of 0.97, not 1.0.

Mission: use the key to open the door and then get to the goal
pressed pageup
step=1, reward=0.00
pressed up
step=2, reward=0.00
pressed right
step=3, reward=0.00
pressed  
step=4, reward=0.00
pressed up
step=5, reward=0.00
pressed up
step=6, reward=0.00
pressed right
step=7, reward=0.00
pressed up
step=8, reward=0.00
pressed up
step=9, reward=0.97
terminated!

System Info: minigrid==2.1.0

Seraphli commented 1 year ago

I checked the code, and the reward has been changed to 1 - 0.9 * (self.step_count / self.max_steps). The documentation may need to be updated.
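As a rough check, the formula above can be evaluated for the 9-step run in the log. This is only a sketch: the function name `success_reward` is hypothetical, and `max_steps = 250` is an assumption for MiniGrid-DoorKey-5x5-v0 that is not stated in this thread.

```python
def success_reward(step_count: int, max_steps: int) -> float:
    """Reward on success, following the formula quoted in this issue."""
    return 1 - 0.9 * (step_count / max_steps)


# Assumption: max_steps is 250 for MiniGrid-DoorKey-5x5-v0 (not confirmed here).
ASSUMED_MAX_STEPS = 250

# The manual run in the report reached the goal at step 9.
reward = success_reward(step_count=9, max_steps=ASSUMED_MAX_STEPS)
print(f"step=9, reward={reward:.2f}")  # prints "step=9, reward=0.97"
```

Under that assumption the result matches the 0.97 printed by manual_control.py, so the reward only equals 1.0 when the step count is 0.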

pseudo-rnd-thoughts commented 1 year ago

@Seraphli Which documentation needs updating, and where?

Seraphli commented 1 year ago

For example, https://minigrid.farama.org/environments/minigrid/EmptyEnv/

pseudo-rnd-thoughts commented 1 year ago

@Seraphli Would you be able to update the documentation for all of the environments? It looks like all of it is incorrect. This just means modifying the docstring in each of the environment classes so that the ## rewards section contains a correct sentence.

Seraphli commented 1 year ago

Yes, I can make a pull request. How about changing it to this? A reward of '1 - 0.9 * (step_count / max_steps)' is given for success, and '0' for failure.

Seraphli commented 1 year ago

I noticed other issues while updating the documentation.

  1. There is no documentation for GoToObjectEnv. I will add some, similar to GoToDoorEnv.
  2. @maximecb seems to have updated the reward function only in GoToDoorEnv (https://github.com/Farama-Foundation/Minigrid/blob/master/minigrid/envs/gotodoor.py#L145), leaving GoToObjectEnv unchanged. I will update the reward function there as well.
  3. At https://github.com/Farama-Foundation/Minigrid/blob/master/minigrid/envs/gotodoor.py#L140-L141, GoToDoorEnv terminates when the agent presses toggle, but the documentation says toggle is unused. I think this is bad for exploration and should be removed. What do you think?

pseudo-rnd-thoughts commented 1 year ago

  1. That would be great, thanks.

@BolunDai0216 What are your thoughts on 2 and 3? If we make those changes, we just need to bump the version number.