JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/

`action` interface of exploration policies #497

Open johannes-fischer opened 1 year ago


The exploration policies (https://github.com/JuliaPOMDP/POMDPs.jl/blob/master/lib/POMDPTools/src/Policies/exploration_policies.jl) do not conform to the `action` interface described in the documentation, `action(::Policy, x)`, and therefore cannot be used with the simulators directly. Instead they implement `action(p::EpsGreedyPolicy, on_policy::Policy, k, s)`.
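For concreteness, a minimal sketch of the mismatch (assuming an MDP `mdp`, a state `s`, and some on-policy `on_policy`; the constructor call follows the POMDPTools docstring):

```julia
using POMDPs
using POMDPTools

expl = EpsGreedyPolicy(mdp, 0.1)    # eps = 0.1 is arbitrary

a = action(expl, on_policy, 42, s)  # current interface: works (k = 42)
a = action(expl, s)                 # documented Policy interface: MethodError
```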

I was wondering if there is a reason for this design.

zsunberg commented 1 year ago

I don't remember the details, but they are designed to change as the total number of calls (`k`) increases, i.e. to decay. I think they are used in things like tabular TD learning.
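For example, the decay can be expressed by passing a schedule as a function of `k`; a rough sketch of how that would be used in a tabular Q-learning-style loop (`mdp`, `greedy_policy`, `n_steps`, and the TD update itself are assumed or elided):

```julia
using POMDPs
using POMDPTools

eps_schedule(k) = max(0.05, 1.0 - k / 10_000)  # hypothetical linear decay
expl = EpsGreedyPolicy(mdp, eps_schedule)

s = rand(initialstate(mdp))
for k in 1:n_steps
    a = action(expl, greedy_policy, k, s)  # epsilon shrinks as k grows
    # ... TD update on (s, a, r, sp), then s = sp ...
end
```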

(Since they are `Policy`s, they should probably also implement the `action(p, s)` function, though it's not immediately obvious how to do that for them.)

I'm definitely open to changing the design.

johannes-fischer commented 1 year ago

I think they would need to store `k` and the on-policy. They could have an `update!` function for `k` and the policy. The policy field could be of type `P`, where `P<:Union{Nothing,Policy}` is a type parameter (`nothing` to keep the current `action` interface).
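A minimal sketch of what that could look like (all names here are hypothetical, not an existing POMDPTools API):

```julia
using POMDPs
using Random

# Hypothetical type illustrating the proposal: store the call counter `k`
# and (optionally) the on-policy so the documented `action(p, s)` works.
mutable struct StatefulEpsGreedy{P<:Union{Nothing,Policy}} <: Policy
    eps::Function      # schedule: k -> epsilon
    on_policy::P       # `nothing` => fall back to the current 4-arg interface
    k::Int             # total number of calls so far
    actions::Vector    # action space of the problem
    rng::AbstractRNG
end

# Advance (or overwrite) the stored counter, e.g. once per simulation step.
update!(p::StatefulEpsGreedy; k=p.k + 1) = (p.k = k; p)

# Documented interface, available whenever an on-policy is stored:
function POMDPs.action(p::StatefulEpsGreedy{<:Policy}, s)
    update!(p)
    if rand(p.rng) < p.eps(p.k)
        return rand(p.rng, p.actions)  # explore
    else
        return action(p.on_policy, s)  # exploit
    end
end
```

With something like this, `action(p, s)` would work with the standard simulators whenever an on-policy is stored, while `P = Nothing` could keep the current four-argument behavior for solvers that manage `k` themselves.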