Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning
https://minigrid.farama.org/

[Question] Is `ActionBonus` wrapper correct ? #402

Open riiswa opened 9 months ago

riiswa commented 9 months ago

`ActionBonus` is a wrapper that adds an exploration bonus to less-visited (state, action) pairs. Looking at the source code, the wrapper makes the transition to the new state s_{t+1} before updating the counter, so the counter being updated is the one for (s_{t+1}, a_t). Shouldn't we update the counter of (s_t, a_t) instead (i.e. perform the step in the environment after the update)?
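To make the two orderings concrete, here is a minimal sketch on a toy environment. It is not the Minigrid implementation: `Corridor`, `run`, and `key_on_next_state` are hypothetical names made up for illustration, and the bonus form `1 / sqrt(count)` is an assumption. It only shows that keying the counter on the state after the step can merge (state, action) pairs that keying on the state before the step keeps apart.

```python
import math
from collections import defaultdict


class Corridor:
    """Hypothetical 1-D corridor: action 1 moves right, action 0 moves left.

    Moves that would leave the corridor are blocked by a wall.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def step(self, action):
        delta = 1 if action == 1 else -1
        # Clamp the position so the agent cannot walk through the walls.
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        return self.pos


def run(actions, key_on_next_state):
    """Replay `actions` and return (counts, bonuses).

    key_on_next_state=True  -> counter keyed on (s_{t+1}, a_t)
    key_on_next_state=False -> counter keyed on (s_t, a_t)
    """
    env = Corridor()
    counts = defaultdict(int)
    bonuses = []
    for a in actions:
        s_t = env.pos           # state before the step
        s_next = env.step(a)    # state after the step
        key = (s_next if key_on_next_state else s_t, a)
        counts[key] += 1
        # Assumed bonus form: decays with the visit count of the key.
        bonuses.append(1 / math.sqrt(counts[key]))
    return dict(counts), bonuses


# Bump the left wall, move right, then move left again.
counts_next, bonuses_next = run([0, 1, 0], key_on_next_state=True)
counts_cur, bonuses_cur = run([0, 1, 0], key_on_next_state=False)
```

With this action sequence, "bump the wall at 0" and "move left from 1" are two different (s_t, a_t) pairs, but both end in state 0 with action 0, so the post-step keying counts them as one entry and the third bonus shrinks.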

I may have misunderstood the purpose of the wrapper, but depending on your response I may submit a PR.

turbotimon commented 2 months ago

It seems correct to me. At least in the current main.

What may be confusing is that it does the +1 before calculating the bonus, but that's needed because the formula requires the count to start from 1. We could instead initialize pre_count=1 and do the +1 after the calculation, but the outcome would be the same.
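The equivalence of the two increment orders can be sketched as follows. This assumes the bonus has the form `1 / sqrt(count)`, which the thread does not quote, and the function names are hypothetical. Note that this only concerns when the +1 happens, not which state is used as the key.

```python
import math


def bonus_increment_first(counts, key):
    """Default the count to 0, add 1, then compute the bonus."""
    pre_count = counts.get(key, 0)
    new_count = pre_count + 1
    counts[key] = new_count
    return 1 / math.sqrt(new_count)


def bonus_increment_last(counts, key):
    """Default the count to 1, compute the bonus, then add 1."""
    pre_count = counts.get(key, 1)
    bonus = 1 / math.sqrt(pre_count)
    counts[key] = pre_count + 1
    return bonus


# Hypothetical visit sequence of (state, action) keys.
keys = ["a", "b", "a", "a", "b"]
counts_first, counts_last = {}, {}
bonuses_first = [bonus_increment_first(counts_first, k) for k in keys]
bonuses_last = [bonus_increment_last(counts_last, k) for k in keys]
```

Both variants return the same bonus on every visit; the only difference is that the second one stores a count that is one higher than the number of visits. Defaulting to 0 without incrementing first would divide by zero on the first visit.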

(Disclaimer: I'm not a maintainer of this repo)