Closed sergio-hcsoft closed 6 years ago
The reward is reshaped via the relativize function, while the distance (dist) is simply divided by the mean distance. Gains of about +50% on Pacman tests with 200 walkers.
@Guillemdb ☝️
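For readers without the diff at hand, the two transformations described in this PR might look roughly like the sketch below. This is an illustration under stated assumptions, not the code from this PR: the body of `relativize` (standardize, then apply a log to positive values and an exponential to the rest) is an assumed form, and `scale_distance` is a hypothetical helper name for "divide by the mean distance".

```python
import numpy as np


def relativize(x):
    """Reshape values to a positive, order-preserving scale.

    ASSUMPTION: the exact formula used in this PR is not shown in the
    description; this standardize-then-log/exp form is only one plausible
    reading of "reshaped via relativize".
    """
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        # All values equal: nothing to distinguish, return a flat vector.
        return np.ones_like(x)
    z = (x - x.mean()) / std
    out = np.empty_like(z)
    pos = z > 0
    out[pos] = np.log1p(z[pos]) + 1.0  # compress large positive scores
    out[~pos] = np.exp(z[~pos])        # map non-positive scores into (0, 1]
    return out


def scale_distance(dist):
    """Divide distances by their mean (hypothetical helper name)."""
    dist = np.asarray(dist, dtype=float)
    mean = dist.mean()
    if mean == 0:
        # ASSUMPTION: with all-zero distances, return them unchanged
        # rather than dividing by zero.
        return dist.copy()
    return dist / mean


# Usage with made-up values for four walkers.
rewards = np.array([0.0, 1.0, 2.0, 10.0])
dists = np.array([1.0, 2.0, 3.0, 2.0])
print(relativize(rewards))
print(scale_distance(dists))
```

In this sketch the reward goes through a nonlinear squashing, so a single very large reward does not dominate, while the distance keeps its linear shape and is only rescaled to have mean 1.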