JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License
8 stars 8 forks source link

Sumtree sampling #60

Closed CasBex closed 1 year ago

CasBex commented 1 year ago

For all the lofty talk about numerical rounding errors in #59, they are unavoidable even with the improved method. This fix simply checks whether the sampled priority happens to be zero, and if so it walks backwards over the leafs until it finds a nonzero priority node. If the backwards walk has not found anything, it performs a forward walk instead.

This has been tested against the JuliaRL_PrioritizedDQN_CartPole experiment in ReinforcementLearningExperiments.jl with 30 different seeds.

jeremiahpslewis commented 1 year ago

Looks good! Can you add a test to this?

CasBex commented 1 year ago

Tests have been added. Feel free to change the tolerances/number of iterations... in case they take too long though. The first test checks that priority zero is never sampled; the second test checks that the pdf of samples is what we would expect. The latter however requires many samples so I've added some multithreading to speed it up. Both tests are run with 100 different seeds for the rng.

CasBex commented 1 year ago

Sorry for the confusion with the tests. Should be good to go now

codecov[bot] commented 1 year ago

Codecov Report

Merging #60 (3f3e99f) into main (85de617) will increase coverage by 0.32%. The diff coverage is 53.33%.

@@            Coverage Diff             @@
##             main      #60      +/-   ##
==========================================
+ Coverage   73.21%   73.54%   +0.32%     
==========================================
  Files          15       15              
  Lines         743      756      +13     
==========================================
+ Hits          544      556      +12     
- Misses        199      200       +1     
Files Changed Coverage Δ
src/common/sum_tree.jl 81.60% <53.33%> (+1.87%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

HenriDeh commented 1 year ago

@CasBex I'll let you merge in case you want to make a last minute change.

CasBex commented 1 year ago

I don't have permissions to merge @HenriDeh. Could you merge?