JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License

Fix priority sampling #57

Closed HenriDeh closed 1 year ago

HenriDeh commented 1 year ago

This PR fixes the unlikely but not impossible sampling of invalid steps with prioritized replay buffers. Instead of relying on the internal sum_tree for sampling, I moved to the FrequencyWeights of StatsBase. SumTree is still used to maintain and update the priorities, but for sampling it was much simpler to use StatsBase (which is also a mature package in the Julia ecosystem).

As far as I know, sampling with FrequencyWeights weights the elements identically to the current sum-tree sampling: each index is drawn with probability proportional to its priority. I have not benchmarked the efficiency, but I did not observe a slowdown in the experiments.
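To illustrate the equivalence claim, here is a minimal sketch of priority-proportional sampling via StatsBase (the `priorities` vector and batch size are illustrative, not the PR's actual code):

```julia
using StatsBase, Random

# Illustrative priorities for 6 stored transitions; index 3 is an
# "invalid" step and carries zero priority.
priorities = [0.5, 2.0, 0.0, 1.0, 4.0, 0.25]

# Draw a batch of indices with probability proportional to priority:
# index i is selected with probability priorities[i] / sum(priorities),
# the same distribution a sum tree produces.
rng = MersenneTwister(42)
inds = sample(rng, 1:length(priorities), FrequencyWeights(priorities), 32)

# A zero-priority index is never drawn, which is what rules out
# sampling invalid steps.
@assert 3 ∉ inds
```

Sampling with replacement (the default) matches the usual prioritized-replay behavior, where the same high-priority transition can appear more than once in a batch.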

This PR relies on #56 so it must be merged afterward.

codecov[bot] commented 1 year ago

Codecov Report

Merging #57 (ae42284) into main (c89ed6f) will increase coverage by 0.96%. The diff coverage is 97.91%.

@@            Coverage Diff             @@
##             main      #57      +/-   ##
==========================================
+ Coverage   73.54%   74.50%   +0.96%     
==========================================
  Files          15       15              
  Lines         756      765       +9     
==========================================
+ Hits          556      570      +14     
+ Misses        200      195       -5     
Files Changed            Coverage Δ
src/traces.jl            79.44% <0.00%> (-0.45%) ↓
src/common/sum_tree.jl   81.60% <100.00%> (ø)
src/samplers.jl          84.54% <100.00%> (+5.13%) ↑

... and 1 file with indirect coverage changes
