Closed siddarth-c closed 2 months ago
Have you validated that the new version works as expected? Also, please compare training performance across the two versions.
Thanks
Sorry for the wait. I've validated the updated reward function. Episodes with the new reward are, on average, 2.5 times shorter, with goals reached in about 50 timesteps compared to the previous 120, meaning the agent heads to the goal directly rather than wandering around it.
Here is a repo with the code and a few plots illustrating the behaviour. AntMaze did not learn within 1e6 timesteps and I can't afford to run longer, but the difference in behaviour is quite evident in PointMaze.
Thanks!
Your charts are wrong: for example, episodic_return with the "new reward" reaches positive values, which is impossible since it is a sum of non-positive values. There is also no indication of how many runs were tested.
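The reviewer's point can be checked in a few lines: under the proposed `-distance` reward every per-step reward is non-positive, so the episodic return, being their sum, can never be positive. A minimal sketch with illustrative distances:

```python
# Under the -distance reward each step yields reward = -distance <= 0,
# so the episodic return (sum of non-positive terms) must be <= 0.
# Illustrative per-step distances to the goal over one short episode:
distances = [1.2, 0.8, 0.3, 0.0]
rewards = [-d for d in distances]   # every reward is <= 0
episodic_return = sum(rewards)      # -2.3
assert episodic_return <= 0         # a positive return is impossible
```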
Description
Updated the dense reward of the Maze environments from `exp(-distance)` to `-distance`
Fixes #175
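A minimal sketch of the two dense reward formulations this PR swaps (function names are illustrative, not the exact Gymnasium-Robotics code):

```python
import numpy as np

def dense_reward_old(achieved_goal, desired_goal):
    # Previous formulation: exp(-distance), bounded in (0, 1].
    # Far from the goal the gradient flattens toward zero.
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return np.exp(-distance)

def dense_reward_new(achieved_goal, desired_goal):
    # Updated formulation: -distance, always <= 0, so the penalty
    # grows linearly with how far the agent is from the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return -distance
```

Both rewards peak at the goal (1.0 vs 0.0); the difference is that `-distance` keeps a constant-magnitude gradient far from the goal, while `exp(-distance)` gives almost no signal there.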
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist:
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)