Closed HenriDeh closed 2 years ago
In #7 I mention that this may pose a problem for NStepSampler. With this implementation, NStepSampler will not work properly. I have two propositions to deal with that:

1. An `FTSeries` wrapper to multiply rewards by (1-gamma^n)/(1-gamma) before fitting. The problem with this solution is that if a Trajectory has multiple samplers (remember we discussed the metasampler), say an `NStepBatchSampler` and a `BatchSampler`, it is now the latter that will be wrong. So this is not my favourite option.
2. Change `NStepBatchSampler`. To me, `NStepBatchSampler` should sample N consecutive experiences of all the traces it is asked for, and that's it. Doing the discounted sum of rewards should be done by the algorithm, for example in `q_targets`.

Indeed, `NStepBatchSampler` should be changed in the next version.
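To make proposition 2 concrete, here is a minimal sketch of the discounted sum done on the algorithm side. The name `nstep_return` is illustrative, not part of the package; it also shows where the (1-gamma^n)/(1-gamma) factor of proposition 1 comes from when the reward is constant.

```julia
# Hypothetical helper, not package API: discounted sum of n consecutive
# rewards, as an algorithm's q_targets could compute it from the raw
# experiences returned by a plain NStepBatchSampler.
function nstep_return(rewards::AbstractVector, γ::Real)
    # r_1 + γ*r_2 + … + γ^(n-1)*r_n, folded from the right
    foldr((r, acc) -> r + γ * acc, rewards; init = zero(γ))
end

# With a constant reward r, this equals r * (1 - γ^n) / (1 - γ),
# which is exactly the factor proposition 1 would multiply by.
nstep_return(fill(1.0, 3), 0.9)   # 1 + 0.9 + 0.81 = 2.71
```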
I think I reached something close to a final API. Before merging, I still need to check how this fares with ElasticArrays and Episodes. But I need to learn how they work first.
Merging #12 (9e26766) into main (9095619) will decrease coverage by 1.10%. The diff coverage is 60.49%.

```diff
@@            Coverage Diff             @@
##             main      #12      +/-  ##
=========================================
- Coverage   68.36%   67.25%   -1.11%
=========================================
  Files           9       10       +1
  Lines         373      452      +79
=========================================
+ Hits          255      304      +49
- Misses        118      148      +30
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/rendering.jl | 0.00% <0.00%> (ø) | |
| src/traces.jl | 85.23% <0.00%> (ø) | |
| src/normalization.jl | 61.53% <61.53%> (ø) | |
| src/samplers.jl | 86.66% <100.00%> (ø) | |
| src/trajectory.jl | 74.07% <0.00%> (+1.85%) | :arrow_up: |
Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 9095619...9e26766.
Great!
I think after merging this, we'd better register this package first, so that RLCore and RLZoo can be updated to use it.
(Kind of busy recently, but should have plenty of time to work on it in the following several weeks)
It's not an essential part of the package anyways. If you want to use Trajectories sooner it's fine. What I'm more concerned about is for the algorithms that need the meta sampler.
So I eventually went with a `NormalizedTraces` (with an "s") wrapper. The reason is that otherwise we cannot use it with preconstructed traces such as `CircularSARTTraces`. I suspect this would have been problematic with `Episodes` too.
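For readers following along, here is a rough sketch (not the actual `NormalizedTraces` code; names and details are assumptions) of the kind of online normalizer such a wrapper can keep per trace: statistics are updated when a value is pushed and applied when a batch is sampled.

```julia
# Hypothetical online normalizer using Welford's algorithm for a
# running mean and variance. Illustrative only, not the package API.
mutable struct RunningNorm
    n::Int
    mean::Float64
    m2::Float64   # sum of squared deviations from the running mean
end
RunningNorm() = RunningNorm(0, 0.0, 0.0)

function update!(rn::RunningNorm, x::Real)
    rn.n += 1
    δ = x - rn.mean
    rn.mean += δ / rn.n
    rn.m2 += δ * (x - rn.mean)
    rn
end

# Standardize a sampled value with the statistics seen so far.
normalize(rn::RunningNorm, x::Real) =
    (x - rn.mean) / sqrt(rn.m2 / max(rn.n - 1, 1) + 1e-8)

rn = RunningNorm()
foreach(x -> update!(rn, x), [1.0, 2.0, 3.0])
normalize(rn, 2.0)   # ≈ 0, since 2.0 is the mean of the pushed values
```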
I removed `fetch` to settle with a simple overload of `sample(::BatchSampler)`. Note that if we add a multistep batch sampler, we will have to implement a specific method for normalization.
One question that we might want to address: currently, the two names of a `MultiplexTrace` share the same normalizer. This means that pushing a `next_state` will update the `state` normalizer, and sampled `next_state`s will be normalized (as we want).
We must keep that in mind to avoid estimation errors. Currently in RL.jl, pushing a `next_state` is not a thing, so it should be ok.
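To make the shared-normalizer point concrete, here is a toy illustration (hypothetical names, not package code): when both trace names map to the same normalizer object, a push through either name updates the statistics seen by both.

```julia
# Stand-in for a normalizer: only counts observations.
mutable struct Counter
    n::Int
end

shared = Counter(0)
# :state and :next_state alias the same object, like the two names
# of a MultiplexTrace sharing one normalizer.
normalizers = Dict(:state => shared, :next_state => shared)

normalizers[:next_state].n += 1   # "push" a next_state
normalizers[:state].n             # 1 — the :state normalizer saw it too
```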
The other failure case is pushing dummy states; this should be avoided when using normalization, since it is already a problem for non-episodic environments (as discussed in #621).
This may be ready to merge. Though I have not tested if it works well with Episodes.
Great!
We can merge this first and then address some other corner cases with Episodes later.
Hello,
I made the Normalizer that I described in #7. It is fairly straightforward to use, but I think documentation will be welcome once this package is integrated into RL.jl. I'm opening an issue as a reminder to do that in due time. Tell me what you think.
Closes #7