epignatelli / navix

Accelerated minigrid environments with JAX
Apache License 2.0

tasks now take 3 parameters R(s, a, s'), which makes the reward function non-Markovian #48

Closed: epignatelli closed this 3 months ago

epignatelli commented 12 months ago

The current signature of a reward function takes three parameters: the previous state, the action taken in that state, and the following state, i.e. R(s, a, s'). This means that the reward function is not Markovian, which breaks canonical RL assumptions. See https://arxiv.org/abs/2111.00876 or https://arxiv.org/abs/2212.10420 for more.

The reward function should take only two parameters, R(s, a).
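To make the proposal concrete, here is a minimal sketch in plain Python of one standard way to get a two-argument reward from a three-argument one: take the expectation over next states, R(s, a) = Σ_{s'} P(s' | s, a) R(s, a, s'). This is not the navix API. The corridor environment, the slip probability, and every name (`transition`, `reward_sas`, `to_reward_sa`, `SLIP_PROB`) are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict

# Hypothetical toy MDP: a 1-D corridor of cells 0..4, cell 4 is the goal.
State = int
Action = int  # -1 = move left, +1 = move right

GOAL: State = 4
N_CELLS = 5
SLIP_PROB = 0.25  # assumed chance that a move fails and the agent stays put


def transition(s: State, a: Action) -> Dict[State, float]:
    """Return P(s' | s, a) as a dict mapping next state -> probability."""
    target = min(max(s + a, 0), N_CELLS - 1)
    if target == s:  # bumping into a wall: stay put with certainty
        return {s: 1.0}
    return {target: 1.0 - SLIP_PROB, s: SLIP_PROB}


def reward_sas(s: State, a: Action, s_next: State) -> float:
    """Three-argument reward R(s, a, s'): 1.0 when the agent enters the goal."""
    return 1.0 if (s_next == GOAL and s != GOAL) else 0.0


def to_reward_sa(
    r_sas: Callable[[State, Action, State], float],
    p: Callable[[State, Action], Dict[State, float]],
) -> Callable[[State, Action], float]:
    """Build R(s, a) = sum over s' of P(s' | s, a) * R(s, a, s')."""
    def r_sa(s: State, a: Action) -> float:
        return sum(prob * r_sas(s, a, s_next) for s_next, prob in p(s, a).items())
    return r_sa


reward_sa = to_reward_sa(reward_sas, transition)
print(reward_sa(3, +1))  # 0.75: the step into the goal succeeds with prob. 0.75
print(reward_sa(2, +1))  # 0.0: no successor of cell 2 is the goal
```

With deterministic dynamics the sum has a single term, so R(s, a) reduces to R(s, a, T(s, a)) for the transition function T.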