JuliaReinforcementLearning / ReinforcementLearningTrajectories.jl

A generalized experience replay buffer for reinforcement learning
MIT License

How do we deal with metasampler ? #13

Closed HenriDeh closed 2 years ago

HenriDeh commented 2 years ago

Hi,

So I tried implementing the MetaSampler. Here's how I thought we could do it:

struct MetaSampler <: AbstractSampler
    samplers::Dict{Symbol, AbstractSampler}
end

"""
    MetaSampler(samplers::Pair{Symbol, <:AbstractSampler}...)

MetaSampler is a collection of Symbol => Sampler pairs. This is intended to be used by algorithm designers when they need different ways to sample a trajectory at various points in their implementation.
"""
MetaSampler(samplers::Pair{Symbol, <: AbstractSampler}...) = MetaSampler(Dict(samplers...))

Base.getindex(m::MetaSampler, idx) = m.samplers[idx]

The idea is that, in the algorithm implementation, one could call sample(trajectory.sampler[:policy], traces) to switch between the samplers held by the MetaSampler. However, I have two questions:

  1. I don't see sample in the exported API in the readme. Does that mean that the design is to only use iterate? sample is called by take!, and take! is called by iterate. If we were to use sample directly, this would bypass the controller logic, which is not desirable for async use. Perhaps we could implement take!(trajectory, :policy) (that is, the second argument specifies a sampler), but it feels weird to me to use take! as a sampling API.
  2. Anyway, how can we integrate MetaSampler into the iterate API? I don't think iterate can accept a Symbol option.
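
For illustration, the Dict-based proposal above can be exercised end to end with toy samplers. This is only a minimal sketch: FirstN, LastN, and the plain-vector traces are hypothetical stand-ins, not types from the package, and the abstract type is redeclared here so the snippet runs on its own in a fresh session.

```julia
# Stand-in for the package's abstract type, redeclared so this runs standalone.
abstract type AbstractSampler end

# Hypothetical sampler: returns the first `n` elements of the traces.
struct FirstN <: AbstractSampler
    n::Int
end

# Hypothetical sampler: returns the last `n` elements of the traces.
struct LastN <: AbstractSampler
    n::Int
end

sample(s::FirstN, traces::AbstractVector) = traces[1:s.n]
sample(s::LastN, traces::AbstractVector) = traces[end-s.n+1:end]

struct MetaSampler <: AbstractSampler
    samplers::Dict{Symbol, AbstractSampler}
end

# The Dict type parameters are spelled out so the field type is matched exactly.
MetaSampler(samplers::Pair{Symbol, <:AbstractSampler}...) =
    MetaSampler(Dict{Symbol, AbstractSampler}(samplers...))

Base.getindex(m::MetaSampler, idx) = m.samplers[idx]

# Usage: pick a sampler by name at each point of the algorithm.
traces = collect(1:10)
m = MetaSampler(:policy => FirstN(2), :critic => LastN(3))
policy_batch = sample(m[:policy], traces)   # [1, 2]
critic_batch = sample(m[:critic], traces)   # [8, 9, 10]
```

Note that calling sample this way goes around take! and iterate, which is exactly the concern raised in question 1.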
findmyway commented 2 years ago
  1. I don't see sample in the exported API in the readme. Does that mean that the design is to only use iterate? sample is called by take!, and take! is called by iterate. If we were to use sample directly, this would bypass the controller logic, which is not desirable for async use. Perhaps we could implement take!(trajectory, :policy) (that is, the second argument specifies a sampler), but it feels weird to me to use take! as a sampling API.

Yep, this is by design. From an end user's perspective, samples should always be taken from the trajectory with iterate. Only developers need to implement the sample method.


PS: we may add a special implementation for the case where the controller is set to nothing. In that case, take! falls back to sample.
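
A rough sketch of that fallback, using dispatch on the controller type. Everything here is assumed for illustration: ToySampler, CountdownController, and ToyTrajectory are hypothetical stand-ins, and the package's real trajectory and controller types may look quite different.

```julia
# Toy stand-ins; none of these are the package's real types.
struct ToySampler
    n::Int
end
sample(s::ToySampler, traces::AbstractVector) = traces[1:s.n]

# Hypothetical controller that allows sampling only while `remaining > 0`.
mutable struct CountdownController
    remaining::Int
end

struct ToyTrajectory{C}
    traces::Vector{Int}
    sampler::ToySampler
    controller::C
end

# With a controller: consult it first and return `nothing` once it is exhausted.
function Base.take!(t::ToyTrajectory{CountdownController})
    t.controller.remaining > 0 || return nothing
    t.controller.remaining -= 1
    return sample(t.sampler, t.traces)
end

# Without a controller: take! simply falls back to sample.
Base.take!(t::ToyTrajectory{Nothing}) = sample(t.sampler, t.traces)

free = ToyTrajectory(collect(1:5), ToySampler(2), nothing)
gated = ToyTrajectory(collect(1:5), ToySampler(2), CountdownController(1))

free_batch = take!(free)     # always samples
first_take = take!(gated)    # allowed once by the controller
second_take = take!(gated)   # controller exhausted
```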

  2. Anyway, how can we integrate MetaSampler into the iterate API? I don't think iterate can accept a Symbol option.

(ASSUMPTION) As long as the same controller applies to all the inner samplers, this should be simple to implement as follows:

struct MetaSampler <: AbstractSampler
    samplers::NamedTuple
end

sample(m::MetaSampler, t::Traces) = map(s -> sample(s, t), m.samplers)

So when iterating over a trajectory with such a meta sampler, each sample is a NamedTuple containing one sample from each inner sampler.
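
As a quick standalone check of that behavior, here is a sketch with toy samplers. FirstN, LastN, and the plain vector standing in for Traces are hypothetical; the snippet relies only on the fact that map over a NamedTuple returns a NamedTuple with the same keys.

```julia
# Stand-in for the package's abstract type, redeclared so this runs standalone.
abstract type AbstractSampler end

# Hypothetical inner samplers, for illustration only.
struct FirstN <: AbstractSampler
    n::Int
end
struct LastN <: AbstractSampler
    n::Int
end

sample(s::FirstN, t::AbstractVector) = t[1:s.n]
sample(s::LastN, t::AbstractVector) = t[end-s.n+1:end]

struct MetaSampler <: AbstractSampler
    samplers::NamedTuple
end

# Every inner sampler is applied to the same traces; keys are preserved.
sample(m::MetaSampler, t::AbstractVector) = map(s -> sample(s, t), m.samplers)

m = MetaSampler((policy = FirstN(2), critic = LastN(3)))
batch = sample(m, collect(1:10))

batch.policy   # [1, 2]
batch.critic   # [8, 9, 10]
```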

If the above ASSUMPTION doesn't hold, then what we need is a meta trajectory instead of a meta sampler.

Let me know if you are still unsure how to implement it.

HenriDeh commented 2 years ago

I see. With the MetaSampler solution we have to accept that every inner sampler draws a sample on each iteration, so we will sometimes sample for no reason when some samplers must be used more often than others. This is the case in PPG, where there are three inner optimization loops inside a "phase" outer loop.
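
To make the overhead concrete, here is a small sketch that counts draws. The schedule is made up for illustration (it is not PPG's actual loop structure), and CountingSampler and run_phase are hypothetical names.

```julia
# Stand-in for the package's abstract type, redeclared so this runs standalone.
abstract type AbstractSampler end

# Hypothetical sampler that records how many times it has been invoked.
mutable struct CountingSampler <: AbstractSampler
    calls::Int
end

function sample(s::CountingSampler, t::AbstractVector)
    s.calls += 1
    return first(t)
end

struct MetaSampler <: AbstractSampler
    samplers::NamedTuple
end
sample(m::MetaSampler, t::AbstractVector) = map(s -> sample(s, t), m.samplers)

# Made-up schedule: the policy batch is consumed on every step,
# the auxiliary batch only on the last step of the phase.
function run_phase(m::MetaSampler, traces; steps = 8)
    used_auxiliary = 0
    for step in 1:steps
        batch = sample(m, traces)   # draws from every inner sampler
        if step == steps
            used_auxiliary += 1     # batch.auxiliary would be consumed here
        end
    end
    return used_auxiliary
end

m = MetaSampler((policy = CountingSampler(0), auxiliary = CountingSampler(0)))
used = run_phase(m, collect(1:4))
drawn = m.samplers.auxiliary.calls
wasted = drawn - used   # auxiliary draws that were never consumed
```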

So perhaps a MetaTrajectory is needed, one that simulates several trajectories sharing the same data. Its iterate would be the equivalent of the outer loop, but I still need to think about how the inner loops would work.