facebookresearch / minimax

Efficient baselines for autocurricula in JAX.
Apache License 2.0

PLR Sampling Weights #7

Closed Michael-Beukman closed 2 weeks ago

Michael-Beukman commented 1 month ago

In the _get_replay_dist function in src/minimax/util/rl/plr.py, I think there may be two problems.

  1. The expression 1/jnp.arange(self.buffer_size) divides by zero, because jnp.arange starts at 0.
  2. In score_dist = scores/self.temp, the scores are divided by the temperature instead of being raised to the power of (1/temp), as in the original Prioritized Level Replay paper.
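
To make the two points concrete, here is a minimal standalone sketch in JAX. It does not reproduce the code in plr.py: buffer_size, scores, and temp are made-up illustrative values, and the final normalization by the sum is an assumption about how the weights are turned into a distribution.

```python
import jax.numpy as jnp

# Hypothetical values chosen only for illustration.
buffer_size = 4
scores = jnp.array([1.0, 2.0, 4.0])
temp = 0.5

# Problem 1: jnp.arange(n) yields 0, 1, ..., n-1, so the first element
# is 1/0. JAX does not raise here; it silently produces inf.
bad_rank_weights = 1 / jnp.arange(buffer_size)

# Starting the range at 1 avoids the division by zero.
ok_rank_weights = 1 / jnp.arange(1, buffer_size + 1)

# Problem 2: dividing by the temperature versus raising to 1/temp.
divided = scores / temp
powered = scores ** (1 / temp)

# Assuming the weights are normalized by their sum afterwards, the
# division cancels out and the temperature has no effect at all.
divided_dist = divided / divided.sum()
powered_dist = powered / powered.sum()
```

Under that normalization assumption, divided_dist is identical to scores / scores.sum() for any temp, whereas powered_dist sharpens or flattens the distribution as temp changes.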

minqi commented 1 month ago

Thanks for finding this issue. I'm testing a branch with a fix for (1) and changing (2) to be consistent with the power-based temperature setting. I'll run a sweep to see the impact on performance and share the update.
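
For readers following along, one way a rank-based distribution with a power-based temperature could look is sketched below. This is not the fix from the branch mentioned above: rank_replay_dist is a hypothetical helper name, and the sketch ignores staleness weighting and unfilled buffer slots. It only illustrates ranks starting at 1 and weights of the form (1/rank)^(1/temp).

```python
import jax.numpy as jnp


def rank_replay_dist(scores, temp):
    """Hypothetical helper: rank-prioritized replay distribution.

    The highest score gets rank 1, so 1/rank is always finite.
    """
    n = scores.shape[0]
    # Indices of the scores sorted from highest to lowest.
    order = jnp.argsort(-scores)
    # Scatter ranks 1..n back to each score's original position.
    ranks = jnp.empty_like(order).at[order].set(jnp.arange(1, n + 1))
    # Power-based temperature applied to the rank weights.
    weights = (1.0 / ranks) ** (1.0 / temp)
    return weights / weights.sum()


example_scores = jnp.array([0.1, 0.9, 0.5])
dist_t1 = rank_replay_dist(example_scores, temp=1.0)
dist_t05 = rank_replay_dist(example_scores, temp=0.5)
```

With temp=1.0 the weights are simply 1/rank. Lowering temp concentrates more probability mass on the top-ranked entry, which is the behaviour the temperature is meant to control.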