JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License
8 stars 8 forks source link

Fix issues with SARTSATraces #72

Closed dharux closed 5 months ago

dharux commented 5 months ago

This fixes #70 and fixes #71.

Issue #70 : SARTSA Traces cannot be sampled correctly.

The sampleable_inds field of the episodes buffer containing a SARTSATrace (CircularArraySARTSATraces or ElasticArraySARTSATraces) was not correctly keeping track of which indices are sampleable. In a SARTSATraces, information from 3 steps are required to complete a single trace. The initial step of the episode in which only the state is pushed. The next step in which the action and next_state are pushed. The following step in which the next_action is pushed. Now, there is one sampleable index, but two unsampleable ones. Thus, the last two indices in the trace are typically unsampleable during the episode. This change was made to the function Base.push!(eb::EpisodesBuffer, xs::NamedTuple) in episodes.jl.

Issue #71 : Action and state go out of sync in CircularArraySARTSATraces

When more traces are pushed into the Traces than its capacity, the state does not match the appropriate action. To fix this, the capacity of the state trace should be one more than that of the action trace. The capacity of all traces are incremented by 1 so that the Traces can hold capacity amount of full traces.

The tests were also modified to check that it works correctly with the following usage:

  1. Push first state at the start of the episode (PreEpisodeStage in RL.jl)
  2. Push state, action, reward, terminal during the episode (PostActStage)
  3. Push action after the episode ends in a PartialNamedTuple (PostEpisodeStage)

This is the behaviour of the agent from RLCore.jl and all trajectories should work well with that. The CircularArraySARTSATraces are a bit restrictive with regards to usage and cannot be used very differently than the typical usage above. I have not been able to make it work while represented next_action and working with all general usage.

The CircularArrarySARTSATraces do not currently work with CircularPrioritizedTraces as these add additional keys like keys and priorities which are updated outside of the Traces. Some changes need to be made with CircularPrioritizedTraces to make it work with CircularArrarySARTSATraces also. Thus, there are 2 errors during the testing involving CircularPrioritizedTraces.

jeremiahpslewis commented 5 months ago

@dharux Thank you for the PR. I've updated the main branch to eliminate unrelated test failures. Would you able to resolve the remaining test errors (in CircularPrioritizedTraces)?

dharux commented 5 months ago

I could take a shot at it. It might require completely rewriting how tuples are pushed into a CircularPrioritizedTraces and how it is sampled for the case when it contains a CircularArraySARTSATraces.

jeremiahpslewis commented 5 months ago

@HenriDeh any tips or pointers for @dharux

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 79.48718% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 74.23%. Comparing base (b09e2ed) to head (29a6a3e). Report is 21 commits behind head on main.

Files Patch % Lines
src/episodes.jl 70.00% 6 Missing :warning:
src/common/CircularPrioritizedTraces.jl 92.85% 1 Missing :warning:
src/samplers.jl 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #72 +/- ## ========================================== - Coverage 74.65% 74.23% -0.43% ========================================== Files 15 18 +3 Lines 801 850 +49 ========================================== + Hits 598 631 +33 - Misses 203 219 +16 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jeremiahpslewis commented 5 months ago

@dharux Thanks, looks good to me, I’ll let @HenriDeh do a final check.

dharux commented 5 months ago

Needed some final fixes. Everything should be working now!