[Question] Is the `ActionBonus` wrapper correct?

Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning


`ActionBonus` is a wrapper that adds an exploration bonus to less-visited (state, action) pairs. Looking at the source code, the wrapper steps the environment to the new state s_{t+1} before updating the counter, so the count it increments is that of (s_{t+1}, a_t). Shouldn't it update the counter of (s_t, a_t) instead (i.e. perform the step in the environment after the update)?
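The difference between the two orderings can be illustrated with a minimal, self-contained sketch. It does not use Minigrid or Gymnasium: `ToyGridEnv`, `CountBonus`, and `run` are hypothetical names, and the state key (agent position, agent direction) and the 1/sqrt(N) bonus are assumptions chosen for illustration, not a copy of the library source. The only thing that changes between the two modes is whether the key is read before or after `env.step`.

```python
import math
from collections import defaultdict


class ToyGridEnv:
    """Hypothetical 1-D stand-in for a grid env: the agent moves along a line."""

    def __init__(self):
        self.agent_pos = 0
        self.agent_dir = 0  # never changes here; kept to mirror a (pos, dir) state key

    def step(self, action):
        # Assumed action encoding: 1 = move right, anything else = move left.
        self.agent_pos += 1 if action == 1 else -1
        return self.agent_pos, 0.0  # (observation, reward); reward is always 0


class CountBonus:
    """Adds an assumed bonus of 1/sqrt(N(key)) to the reward of each step."""

    def __init__(self, env, count_before_step):
        self.env = env
        self.count_before_step = count_before_step
        self.counts = defaultdict(int)

    def _key(self, action):
        return (self.env.agent_pos, self.env.agent_dir, action)

    def step(self, action):
        if self.count_before_step:
            key = self._key(action)  # (s_t, a_t): the pair actually taken
            obs, reward = self.env.step(action)
        else:
            obs, reward = self.env.step(action)
            key = self._key(action)  # (s_{t+1}, a_t): new state paired with old action
        self.counts[key] += 1
        return obs, reward + 1 / math.sqrt(self.counts[key]), key


def run(count_before_step, actions):
    """Return the (position, direction, action) key counted at each step."""
    wrapper = CountBonus(ToyGridEnv(), count_before_step)
    return [wrapper.step(a)[2] for a in actions]


if __name__ == "__main__":
    actions = [1, 0, 1, 0]  # right, left, right, left: positions 0 -> 1 -> 0 -> 1 -> 0
    print(run(True, actions))   # [(0, 0, 1), (1, 0, 0), (0, 0, 1), (1, 0, 0)]
    print(run(False, actions))  # [(1, 0, 1), (0, 0, 0), (1, 0, 1), (0, 0, 0)]
```

With the step performed first, the counter records "move right from position 1", a pair the agent never took (at position 1 it always moved left), while the pairs it did take are never counted.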

I may have misunderstood the purpose of the wrapper, but depending on your response I may submit a PR.

[x] I have checked that there is no similar issue in the repo (required)

[Question] Is the `ActionBonus` wrapper correct? #402