JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License

Action and state go out of sync in CircularArraySARTSATraces #71

Closed: dharux closed this issue 5 months ago

dharux commented 5 months ago

When more traces are pushed into a CircularArraySARTSATraces than its capacity, the state and action traces no longer match. This can be seen with the following example; the usage follows that of the agent in ReinforcementLearningCore.

using ReinforcementLearningTrajectories

eb = EpisodesBuffer(
    CircularArraySARTSATraces(; capacity = 3)
)
push!(eb, (state = 1,))
for i in 1:3
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end

Checking the traces:

julia> for t in eb
           println(t)
       end
(state = 1, next_state = 2, action = 1, next_action = 2, reward = 1.0f0, terminal = false)
(state = 2, next_state = 3, action = 2, next_action = 3, reward = 2.0f0, terminal = false)

So far this is correct: state i is paired with action i. If we push some more:

for i in 4:6
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end
julia> for t in eb
           println(t)
       end
(state = 4, next_state = 5, action = 3, next_action = 4, reward = 4.0f0, terminal = false)
(state = 5, next_state = 6, action = 4, next_action = 5, reward = 5.0f0, terminal = false)
(state = 6, next_state = 7, action = 5, next_action = 6, reward = 6.0f0, terminal = false)

State i is now paired with action i-1, which is incorrect. This is corrected after the episode ends, when a PartialNamedTuple containing just an action is pushed. However, the traces are wrong during the episode, so any algorithm that updates mid-episode (in the PostActStage) will not function correctly.
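The off-by-one can be sketched with two plain ring buffers. This is a hedged reconstruction, not the package's actual internals: the `push_ring!` helper, the `cap` variable, and the assumption that both the state and action traces hold capacity + 1 entries (SARTSA also stores next_state/next_action) are all introduced here for illustration.

```julia
# Sketch only: the capacity + 1 layout of both traces is an assumption.
# With capacity = 3, each buffer is assumed to hold 4 entries.
cap = 4

# Drop the oldest element once the buffer exceeds its capacity.
function push_ring!(buf::Vector{Int}, x::Int, cap::Int)
    push!(buf, x)
    length(buf) > cap && popfirst!(buf)
    return buf
end

states  = Int[]
actions = Int[]

push_ring!(states, 1, cap)          # initial state; its action arrives one push later
for i in 1:6
    push_ring!(states, i + 1, cap)  # next state, observed after acting
    push_ring!(actions, i, cap)     # action i, taken at state i
end

# The action trace always lags one push behind the state trace, so once the
# state trace has wrapped, pairing the two by index is off by one:
pairs = [(states[j], actions[j]) for j in 1:length(actions)]
@show pairs   # [(4, 3), (5, 4), (6, 5), (7, 6)]: state i with action i - 1
```

Pushing one final action, as the PartialNamedTuple does at episode end, lets the action buffer catch up and realigns the indices; but during the episode the misalignment above is what iteration over the buffer returns.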