Closed HenriDeh closed 2 years ago
In #7 I mention that this may pose a problem for NStepSampler. With this implementation, NStepSampler will not work properly. I have two propositions to deal with that:

1. An `FTSeries` wrapper to multiply rewards by (1-gamma^n)/(1-gamma) before fitting. The problem with this solution is that if a Trajectory has multiple samplers (remember we discussed the metasampler), say an `NStepBatchSampler` and a `BatchSampler`, it is now the latter that will be wrong. So this is not my favourite option.
2. Change `NStepBatchSampler`. To me, `NStepBatchSampler` should sample N consecutive experiences of all the traces it is asked for, and that's it. Doing the discounted sum of rewards should be done by the algorithm, for example in `q_targets`.

Indeed, `NStepBatchSampler` should be changed in the next version.
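To make proposition 2 concrete, here is a minimal sketch of the discounted sum done on the algorithm side. The name `nstep_return` is illustrative, not part of the package; it also shows where the (1-gamma^n)/(1-gamma) factor of proposition 1 comes from when the reward is constant.

```julia
# Hypothetical helper, not package API: discounted sum of n consecutive
# rewards, as an algorithm's q_targets could compute it from the raw
# experiences returned by a plain NStepBatchSampler.
function nstep_return(rewards::AbstractVector, γ::Real)
    # r_1 + γ*r_2 + … + γ^(n-1)*r_n, folded from the right
    foldr((r, acc) -> r + γ * acc, rewards; init = zero(γ))
end

# With a constant reward r, this equals r * (1 - γ^n) / (1 - γ),
# which is exactly the factor proposition 1 would multiply by.
nstep_return(fill(1.0, 3), 0.9)   # 1 + 0.9 + 0.81 = 2.71
```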
I think I reached something close to a final API. Before merging, I still need to check how this fares with ElasticArrays and Episodes. But I need to learn how they work first.
Merging #12 (9e26766) into main (9095619) will decrease coverage by 1.10%. The diff coverage is 60.49%.

```diff
@@            Coverage Diff             @@
##             main      #12      +/-  ##
=========================================
- Coverage   68.36%   67.25%   -1.11%
=========================================
  Files           9       10       +1
  Lines         373      452      +79
=========================================
+ Hits          255      304      +49
- Misses        118      148      +30
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/rendering.jl | 0.00% <0.00%> (ø) | |
| src/traces.jl | 85.23% <0.00%> (ø) | |
| src/normalization.jl | 61.53% <61.53%> (ø) | |
| src/samplers.jl | 86.66% <100.00%> (ø) | |
| src/trajectory.jl | 74.07% <0.00%> (+1.85%) | :arrow_up: |
Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 9095619...9e26766.
Great!
I think after merging this, we'd better register this package first, so that RLCore and RLZoo can be updated to use it.
(Kind of busy recently, but should have plenty of time to work on it in the following several weeks)
It's not an essential part of the package anyways. If you want to use Trajectories sooner it's fine. What I'm more concerned about is for the algorithms that need the meta sampler.
So I eventually went with a `NormalizedTraces` (with an "s") wrapper. The reason is that otherwise we cannot use it with preconstructed traces such as `CircularSARTTraces`. I suspect this would have been problematic with `Episodes` too.
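For readers following along, here is a rough sketch (not the actual `NormalizedTraces` code; names and details are assumptions) of the kind of online normalizer such a wrapper can keep per trace: statistics are updated when a value is pushed and applied when a batch is sampled.

```julia
# Hypothetical online normalizer using Welford's algorithm for a
# running mean and variance. Illustrative only, not the package API.
mutable struct RunningNorm
    n::Int
    mean::Float64
    m2::Float64   # sum of squared deviations from the running mean
end
RunningNorm() = RunningNorm(0, 0.0, 0.0)

function update!(rn::RunningNorm, x::Real)
    rn.n += 1
    δ = x - rn.mean
    rn.mean += δ / rn.n
    rn.m2 += δ * (x - rn.mean)
    rn
end

# Standardize a sampled value with the statistics seen so far.
normalize(rn::RunningNorm, x::Real) =
    (x - rn.mean) / sqrt(rn.m2 / max(rn.n - 1, 1) + 1e-8)

rn = RunningNorm()
foreach(x -> update!(rn, x), [1.0, 2.0, 3.0])
normalize(rn, 2.0)   # ≈ 0, since 2.0 is the mean of the pushed values
```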
I removed `fetch` to settle with a simple overload of `sample(::BatchSampler)`. Note that if we add a multistep batch sampler, we will have to implement a specific method for normalization.
One question that we might want to address: currently, the two names of a `MultiplexTrace` share the same normalizer. This means that pushing a `next_state` will update the `state` normalizer, and sampled `next_state`s will be normalized (as we want).
We must keep that in mind to avoid estimation errors. Currently in RL.jl, pushing a `next_state` is not a thing, so it should be ok.
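To make the shared-normalizer point concrete, here is a toy illustration (hypothetical names, not package code): when both trace names map to the same normalizer object, a push through either name updates the statistics seen by both.

```julia
# Stand-in for a normalizer: only counts observations.
mutable struct Counter
    n::Int
end

shared = Counter(0)
# :state and :next_state alias the same object, like the two names
# of a MultiplexTrace sharing one normalizer.
normalizers = Dict(:state => shared, :next_state => shared)

normalizers[:next_state].n += 1   # "push" a next_state
normalizers[:state].n             # 1 — the :state normalizer saw it too
```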
The other failure case is pushing dummy states; this should be avoided when using normalization, since it is already a problem for non-episodic environments (as discussed in #621).
This may be ready to merge. Though I have not tested if it works well with Episodes.
Great!
We can merge this first and then address some other corner cases with Episodes later.
Hello,
I made the Normalizer that I described in #7. It is fairly straightforward to use, but I think documentation will be welcome once this package is integrated into RL.jl. I'm opening an issue as a reminder to do that in due time. Tell me what you think.
Closes #7