When more traces are pushed into a CircularArraySARTSATraces than its capacity, the states and actions in the traces no longer match. The following example demonstrates this. The usage follows that of the agent in ReinforcementLearningCore:
```julia
using ReinforcementLearningTrajectories

eb = EpisodesBuffer(
    CircularArraySARTSATraces(;
        capacity = 3)
)
push!(eb, (state = 1,))
for i = 1:3
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end
```
Checking the traces:
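The stored contents can be inspected by indexing the underlying traces (a sketch; `eb.traces` and indexing by symbol are assumptions about the ReinforcementLearningTrajectories internals and may differ across versions):

```julia
# Hypothetical inspection of the buffered traces; the exact accessors
# may differ between versions of ReinforcementLearningTrajectories.
eb.traces[:state]   # the buffered states
eb.traces[:action]  # the buffered actions
```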
So far it is correct: state i is stored together with action i. If we push some more:
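Continuing the example above, pushing three more transitions overflows the capacity of 3 (a sketch; the indices 4:6 are chosen purely for illustration):

```julia
# Push three more transitions; the circular buffer now wraps around,
# which is where the state/action misalignment appears.
for i = 4:6
    push!(eb, (state = i + 1, action = i, reward = i, terminal = false))
end
```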
The state i is now paired with action i-1, which is incorrect. This is corrected after the episode, when a PartialNamedTuple containing just the action is pushed. During the episode, however, the traces are wrong, so any algorithm that updates during an episode (in the PostActStage) will not function correctly.
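For completeness, the end-of-episode correction referred to above looks roughly like this (a sketch; the action value 7 simply continues the running example):

```julia
# After the terminal transition, only the final action is pushed,
# wrapped in a PartialNamedTuple; this realigns states and actions.
push!(eb, PartialNamedTuple((action = 7,)))
```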