The current signature of a reward function takes three parameters: the previous state, the action taken in that state, and the following state.
This means that the reward function is not Markovian, which breaks canonical RL assumptions.
See https://arxiv.org/abs/2111.00876 or https://arxiv.org/abs/2212.10420 for more.
The reward function should take only two parameters: R(s, a).
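For existing environments written against the three-argument signature, one possible migration path is to collapse R(s, a, s') into R(s, a) by taking the expectation over the next state, R(s, a) = sum over s' of P(s' | s, a) * R(s, a, s'). The snippet below is a minimal sketch only. It assumes a small tabular environment whose transition probabilities are known. The names `to_state_action_reward`, `P`, `RewardSAS` and `RewardSA` are hypothetical and do not come from any existing API.

```python
from typing import Callable, Dict, Hashable, Tuple

State = Hashable
Action = Hashable

# Current three-argument form: reward also depends on the next state s'.
RewardSAS = Callable[[State, Action, State], float]
# Proposed two-argument form: R(s, a).
RewardSA = Callable[[State, Action], float]

# Hypothetical tabular transition model: P[(s, a)] maps next state -> probability.
Transitions = Dict[Tuple[State, Action], Dict[State, float]]


def to_state_action_reward(r_sas: RewardSAS, P: Transitions) -> RewardSA:
    """Return R(s, a) = sum_{s'} P(s' | s, a) * R(s, a, s') (expected reward over s')."""
    def r_sa(s: State, a: Action) -> float:
        return sum(p * r_sas(s, a, s_next) for s_next, p in P[(s, a)].items())
    return r_sa


# Usage: from state 0, action "go" reaches state 1 with probability 0.8.
P: Transitions = {(0, "go"): {1: 0.8, 0: 0.2}}
r_sas: RewardSAS = lambda s, a, s_next: 1.0 if s_next == 1 else 0.0
r_sa = to_state_action_reward(r_sas, P)
print(r_sa(0, "go"))  # 0.8
```

Taking the expectation preserves expected returns, but it needs access to P(s' | s, a). An environment that only samples transitions would have to define R(s, a) directly instead.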