JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License

Action and state go out of sync in CircularArraySARTSATraces #71

Closed: dharux closed this issue 5 months ago

dharux commented 5 months ago

When more traces are pushed into a CircularArraySARTSATraces than its capacity, the state and action traces no longer match. This can be seen with the following example; the usage follows that of the agent in ReinforcementLearningCore.

using ReinforcementLearningTrajectories

eb = EpisodesBuffer(
    CircularArraySARTSATraces(; capacity = 3)
)
push!(eb, (state = 1,))
for i in 1:3
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end

Checking the traces:

julia> for t in eb
           println(t)
       end
(state = 1, next_state = 2, action = 1, next_action = 2, reward = 1.0f0, terminal = false)
(state = 2, next_state = 3, action = 2, next_action = 3, reward = 2.0f0, terminal = false)

So far this is correct: state i is paired with action i. If we push some more:

for i in 4:6
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end
julia> for t in eb
           println(t)
       end
(state = 4, next_state = 5, action = 3, next_action = 4, reward = 4.0f0, terminal = false)
(state = 5, next_state = 6, action = 4, next_action = 5, reward = 5.0f0, terminal = false)
(state = 6, next_state = 7, action = 5, next_action = 6, reward = 6.0f0, terminal = false)

State i is now paired with action i-1, which is incorrect. This is corrected after the episode ends, when a PartialNamedTuple containing just an action is pushed. However, the traces are wrong during the episode, so any algorithm that updates mid-episode (in the PostActStage) will not function correctly.
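The off-by-one can be sketched with two plain ring buffers. This is a hedged reconstruction, not the package's actual internals: the `push_ring!` helper, the `cap` variable, and the assumption that both the state and action traces hold capacity + 1 entries (SARTSA also stores next_state/next_action) are all introduced here for illustration.

```julia
# Sketch only: the capacity + 1 layout of both traces is an assumption.
# With capacity = 3, each buffer is assumed to hold 4 entries.
cap = 4

# Drop the oldest element once the buffer exceeds its capacity.
function push_ring!(buf::Vector{Int}, x::Int, cap::Int)
    push!(buf, x)
    length(buf) > cap && popfirst!(buf)
    return buf
end

states  = Int[]
actions = Int[]

push_ring!(states, 1, cap)          # initial state; its action arrives one push later
for i in 1:6
    push_ring!(states, i + 1, cap)  # next state, observed after acting
    push_ring!(actions, i, cap)     # action i, taken at state i
end

# The action trace always lags one push behind the state trace, so once the
# state trace has wrapped, pairing the two by index is off by one:
pairs = [(states[j], actions[j]) for j in 1:length(actions)]
@show pairs   # [(4, 3), (5, 4), (6, 5), (7, 6)]: state i with action i - 1
```

Pushing one final action, as the PartialNamedTuple does at episode end, lets the action buffer catch up and realigns the indices; but during the episode the misalignment above is what iteration over the buffer returns.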