adaptive-intelligent-robotics / QDax

Accelerated Quality-Diversity
https://qdax.readthedocs.io/en/latest/
MIT License

Update the action space of PointMaze to be [-1, 1]^2 #101

Closed felixchalumeau closed 2 years ago

felixchalumeau commented 2 years ago

Related to #37

All environments have an action space that is a Cartesian product of [-1, 1] intervals, except PointMaze, which uses [-0.1, 0.1]^2. This PR fixes the inconsistency by changing the action space of PointMaze to [-1, 1]^2.

This has a small impact on the optimization process, as illustrated by the following plots.

Optimization with MAP-Elites and the previous action space: map-elites-ptmaze-old-action-space

Optimization with MAP-Elites and the new action space: map-elites-ptmaze-new-action-space

The slight difference in the optimization process arises because, in the previous implementation, the policy produced actions in the range [-1, 1], which the environment then clipped to its own bounds of [-0.1, 0.1]. Any action component with magnitude above 0.1 was therefore saturated. The new implementation standardizes the action space so that the environment's bounds are [-1, 1], and the scaling to [-0.1, 0.1] is done internally within the environment. We would expect results closer to those of the new implementation if the policy outputs were rescaled to [-0.1, 0.1] before being passed to the old environment implementation.
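To make the clipping-versus-scaling difference concrete, here is a minimal pure-Python sketch. It is not the code in `qdax/environments/pointmaze.py`: the function names `old_step_action` and `new_step_action` are hypothetical, and only the bounds (-1, 1 and -0.1, 0.1) come from the description above.

```python
# Illustrative sketch only; names are hypothetical, not QDax's API.
POLICY_LOW, POLICY_HIGH = -1.0, 1.0  # range of the policy outputs
ENV_LOW, ENV_HIGH = -0.1, 0.1        # displacement range used inside PointMaze


def clip(x, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, x))


def old_step_action(action):
    """Old behaviour as described: clip raw actions to [-0.1, 0.1]."""
    return [clip(a, ENV_LOW, ENV_HIGH) for a in action]


def new_step_action(action):
    """New behaviour as described: accept [-1, 1], then rescale internally."""
    scale = ENV_HIGH / POLICY_HIGH
    return [clip(a, POLICY_LOW, POLICY_HIGH) * scale for a in action]


policy_output = [0.5, -0.05]

# Old: the first component saturates at 0.1, the second passes through.
old = old_step_action(policy_output)

# New: both components are shrunk by a factor of 10, nothing saturates.
new = new_step_action(policy_output)

# Rescaling the policy output before the old environment matches the new one.
rescaled_old = old_step_action([a * ENV_HIGH for a in policy_output])
```

With the old behaviour, every policy output with magnitude above 0.1 maps to the same saturated displacement, so the policy effectively acts in a bang-bang regime for most of its output range. With the new behaviour the mapping is linear over the full [-1, 1] range, which is one plausible reading of why the two plots differ slightly.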

codecov-commenter commented 2 years ago

Codecov Report

Merging #101 (d68a5a4) into develop (e58472e) will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop     #101   +/-   ##
========================================
  Coverage    90.72%   90.72%           
========================================
  Files           82       82           
  Lines         4572     4572           
========================================
  Hits          4148     4148           
  Misses         424      424           
| Impacted Files | Coverage Δ |
| --- | --- |
| qdax/environments/pointmaze.py | 95.37% <100.00%> (ø) |
