johannes-fischer opened this issue 1 year ago
I don't remember the details, but they are designed to change as the total number of calls (`k`) increases, i.e. to decay. I think they are used in things like tabular TD learning. (Since they are `Policy`s they should probably also have the `action(p, s)` function, though it's not immediately obvious how to do that for them.)
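The decay idea can be sketched without any POMDPs.jl machinery; the `1/k` schedule below is just an illustrative choice, not necessarily the schedule the package uses:

```julia
# The exploration probability is a function of the total call count k,
# so exploration shrinks as learning progresses.
decaying_eps(k) = 1 / k     # illustrative harmonic decay

# Early in learning the agent explores almost always...
decaying_eps(1)     # 1.0
# ...and much less after many calls.
decaying_eps(100)   # 0.01
```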
I'm definitely open to changing the design.
I think they would need to store `k` and the policy. They could have an `update!` function for `k` and the policy. The policy field could be `P` where `P<:Union{Nothing,Policy}` is a type parameter (`nothing` to use the current `action` interface).
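A minimal sketch of what that design could look like. All names here are illustrative, not part of POMDPTools, and a stand-in `Policy` type is defined so the snippet is self-contained:

```julia
using Random

# Stand-in for POMDPs.Policy so the sketch runs on its own.
abstract type Policy end

# A trivial stand-in on-policy: looks up the best action for a state.
struct GreedyLookup <: Policy
    best::Dict{Int,Symbol}
end
action(p::GreedyLookup, s) = p.best[s]

# Hypothetical redesign: the exploration policy stores k and
# (optionally) the on-policy, with P<:Union{Nothing,Policy} so that
# `nothing` can select the current four-argument interface instead.
mutable struct StatefulEpsGreedy{P<:Union{Nothing,Policy}} <: Policy
    eps::Function            # decay schedule, e.g. k -> 1/k
    k::Int                   # stored call count
    on_policy::P
    acts::Vector{Symbol}     # action space to sample from when exploring
    rng::AbstractRNG
end

# update! advances the stored call count between simulator calls.
function update!(p::StatefulEpsGreedy, k)
    p.k = k
    return p
end

# With a stored on-policy, the documented `action(p, s)` interface
# works, so simulators could drive this policy directly.
function action(p::StatefulEpsGreedy{<:Policy}, s)
    rand(p.rng) < p.eps(p.k) ? rand(p.rng, p.acts) : action(p.on_policy, s)
end
```

A real implementation would presumably also keep the existing four-argument method for backward compatibility when `on_policy` is `nothing`.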
The exploration policies (https://github.com/JuliaPOMDP/POMDPs.jl/blob/master/lib/POMDPTools/src/Policies/exploration_policies.jl) do not meet the `action` interface described in the documentation, `action(::Policy, x)`, and cannot be used with the simulators directly. Instead they have the interface `action(p::EpsGreedyPolicy, on_policy::Policy, k, s)`. I was wondering if there is a reason for this?
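To make the mismatch concrete, here is a toy illustration with stand-in types (none of this is POMDPTools code): a generic simulator that only knows the documented two-argument `action` cannot drive a policy that implements only the four-argument form.

```julia
# A policy meeting the documented interface: action(policy, s).
struct FixedPolicy
    a::Symbol
end
action(p::FixedPolicy, s) = p.a

# A policy mimicking the exploration policies' current signature,
# which forces the caller to thread through on_policy and k.
struct EpsGreedyLike
    a_explore::Symbol
end
action(p::EpsGreedyLike, on_policy, k, s) =
    k % 2 == 0 ? p.a_explore : action(on_policy, s)

# A generic simulator step only knows the two-argument form:
simulate_step(policy, s) = action(policy, s)

simulate_step(FixedPolicy(:up), 1)        # works
# simulate_step(EpsGreedyLike(:down), 1)  # MethodError: no action(p, s)
```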