Farama-Foundation / Gymnasium-Robotics

A collection of robotics simulation environments for reinforcement learning
https://robotics.farama.org/
MIT License
485 stars, 79 forks

Updated Dense Reward for Maze tasks #216

Closed: siddarth-c closed this 2 months ago

siddarth-c commented 3 months ago

Description

Updated the dense reward of Maze environments from exp(-distance) to -distance

Fixes #175
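For reference, the two reward shapes can be compared with a minimal NumPy sketch. It assumes only that the dense reward is a function of the Euclidean distance between the achieved goal and the desired goal. The function names `old_dense_reward` and `new_dense_reward` are hypothetical, and this is not the library's actual implementation.

```python
import numpy as np


def old_dense_reward(achieved_goal, desired_goal):
    """Previous dense reward (hypothetical helper): exp(-distance), always in (0, 1]."""
    distance = np.linalg.norm(np.asarray(desired_goal) - np.asarray(achieved_goal), axis=-1)
    return np.exp(-distance)


def new_dense_reward(achieved_goal, desired_goal):
    """Updated dense reward (hypothetical helper): -distance, always <= 0."""
    distance = np.linalg.norm(np.asarray(desired_goal) - np.asarray(achieved_goal), axis=-1)
    return -distance


achieved = np.array([1.0, 0.0])
goal = np.array([4.0, 4.0])  # Euclidean distance is exactly 5.0

old_r = old_dense_reward(achieved, goal)  # exp(-5), about 0.0067
new_r = new_dense_reward(achieved, goal)  # -5.0
```

The old reward is bounded in (0, 1] and flattens out far from the goal, while the new one is unbounded below and keeps the same slope at every distance.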

Type of change

Screenshots

Checklist:

Kallinteris-Andreas commented 3 months ago

Have you validated that the new version works as expected? Also, please compare training performance across the two versions.

Thanks

siddarth-c commented 3 months ago

Sorry for the wait. I've validated the updated reward function. Under the new reward, episodes are on average roughly 2.5 times shorter: the goal is reached in about 50 timesteps, compared to 120 previously. In other words, the agent now heads to the goal instead of wandering around it.

https://github.com/Farama-Foundation/Gymnasium-Robotics/assets/50509572/81f43765-d9ba-4770-9915-0eac297e1601
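A toy calculation illustrates why the old reward can encourage wandering. It is a sketch under stated assumptions: the episode is assumed (hypothetically) to end once the goal is reached, discounting is ignored, and the distance sequences are made up. Under exp(-distance) every step earns a positive reward, so lingering near the goal outscores reaching it; under -distance every step costs something, so the direct path has the higher return.

```python
import math


def episode_return(distances, reward_fn):
    """Undiscounted return for a sequence of per-step distances to the goal."""
    return sum(reward_fn(d) for d in distances)


def old_reward(d):
    return math.exp(-d)  # previous dense reward, always positive


def new_reward(d):
    return -d  # updated dense reward, never positive


# Hypothetical trajectories (distance to the goal at each step).
direct = [3.0, 2.0, 1.0, 0.0]          # goes straight to the goal; episode assumed to end
hover = [3.0, 2.0, 1.0] + [0.5] * 20   # lingers near, but never at, the goal

old_direct, old_hover = episode_return(direct, old_reward), episode_return(hover, old_reward)
new_direct, new_hover = episode_return(direct, new_reward), episode_return(hover, new_reward)
# Old reward: hovering earns more (about 12.7 vs 1.6). New reward: direct wins (-6.0 vs -16.0).
```

If the episode does not terminate at the goal, the comparison changes, so this only illustrates the incentive under the stated assumption.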

siddarth-c commented 3 months ago

Here is a repo with the code and a few plots that help explain the behaviour. AntMaze did not learn within 1e6 timesteps, and I can't afford to run it longer, but the difference in behaviour is clearly visible in PointMaze.

Thanks!

Kallinteris-Andreas commented 3 months ago

Your charts are wrong. For example, episodic_return with the "new reward" takes positive values, which is impossible, since it is a sum of non-positive values.
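This argument can be turned into an automatic sanity check on logged returns. A minimal sketch, assuming the logged `episodic_return` values are raw sums of the -distance reward (no scaling, shifting, or bonus terms); the helper name `check_nonpositive_returns` is hypothetical. Because each step reward is at most 0, any positive return points to a logging or plotting bug.

```python
import numpy as np


def check_nonpositive_returns(episodic_returns, atol=1e-8):
    """Raise ValueError if any episodic return is positive (beyond atol).

    With reward = -distance, every per-step reward is <= 0, so each
    episodic return, being a sum of such rewards, must also be <= 0.
    """
    returns = np.asarray(episodic_returns, dtype=float)
    positive = np.flatnonzero(returns > atol)
    if positive.size > 0:
        raise ValueError(
            f"positive episodic returns at indices {positive.tolist()}: "
            f"{returns[positive].tolist()}"
        )
    return True


# Hypothetical logged values: the second list contains an impossible 2.5.
check_nonpositive_returns([-12.3, -40.1, -7.0])
try:
    check_nonpositive_returns([-3.0, 2.5])
except ValueError as err:
    print(err)
```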

Also, there is no indication of how many runs were tested.
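One way to make such a comparison reproducible is to report the number of independent runs (seeds) together with the spread across them. A minimal sketch; `summarize_runs` is a hypothetical helper and the per-seed values are made up.

```python
import numpy as np


def summarize_runs(values_per_seed):
    """Summarize one metric (e.g. final episodic return) across independent seeds."""
    values = np.asarray(values_per_seed, dtype=float)
    n = values.size
    mean = float(values.mean())
    # Sample standard deviation (ddof=1); undefined for a single run, reported as 0.0.
    std = float(values.std(ddof=1)) if n > 1 else 0.0
    return {"n_runs": n, "mean": mean, "std": std}


# Hypothetical final returns from three seeds of the same configuration.
summary = summarize_runs([-50.0, -60.0, -55.0])
print(summary)  # {'n_runs': 3, 'mean': -55.0, 'std': 5.0}
```

Reporting `n_runs` next to the mean and standard deviation lets a reader judge whether a gap between the two reward versions exceeds seed-to-seed variation.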