Closed NeroBlackstone closed 1 year ago
Another solution, which I think is better, is to add a keyword argument to `POMDPs.action` for `EpsGreedyPolicy`:
function POMDPs.action(p::MyEpsGreedyPolicy, on_policy::Policy, k, s; getValidActions::Function = _ -> p.actions)
    va = getValidActions(s) # defaults to the full stored action space
    if rand(p.rng) < p.eps(k)
        return rand(p.rng, va)
    else
        return action(on_policy, s)
    end
end
Then call it in the solver like this:
function validActions(s::State)::Vector{Action}
    ...
end

exploration_policy = MyEpsGreedyPolicy(mdp, 0.9)
a = action(exploration_policy, on_policy, 0, s; getValidActions = validActions)
That way you don't need to construct another EpsGreedyPolicy in the solver.
What do you think? Is this one better? I'm willing to contribute.
P.S.: `action(on_policy, s)` may still return an action that is not included in what `validActions` returns, but I think that has nothing to do with the ExplorationPolicy; it must be handled in `on_policy`. So I did not change that code.
I think both of these solutions are acceptable; solver programming will have more flexibility. Maybe we need to add both of them.
Hi @NeroBlackstone Thanks for your contribution and sorry for the delay in responding!
(PO)MDPs themselves can have state-dependent actions spaces, see https://juliapomdp.github.io/POMDPs.jl/stable/def_pomdp/#state-dep-action . So the information about which actions can be taken from which state should be stored within the problem.
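For readers following along, here is a standalone sketch of that pattern (plain Julia with a made-up `LineWalk` problem; in real code you would add a method to `POMDPs.actions` for your problem type instead of defining a local `actions` function):

```julia
# Standalone sketch of a state-dependent action space (hypothetical
# LineWalk problem; mimics the POMDPs.jl `actions(m, s)` pattern).
struct LineWalk end

# Full action space: step left (-1) or right (+1).
actions(::LineWalk) = [-1, 1]

# State-dependent action space: from state 1 you cannot step left.
actions(::LineWalk, s) = s == 1 ? [1] : [-1, 1]
```

A solver can then call `actions(m, s)` to get only the actions that are legal in the current state.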
So, I think it would be best to just do the following: modify `EpsGreedyPolicy` so that it stores the problem in a field `m`, and modify `action` so that it calls the `actions` function with the state argument.
function POMDPs.action(p::MyEpsGreedyPolicy, on_policy::Policy, k, s)
    va = actions(p.m, s) # <----- This is the important part
    if rand(p.rng) < p.eps(k)
        return rand(p.rng, va)
    else
        return action(on_policy, s)
    end
end
Does that seem like a good solution?
Hi! @zsunberg , Thanks for your review, and sorry for my delayed response. I did not notice that POMDPs.jl already supports state-dep-action, your solution is really great and inspiring.
If we add the field `m`, we may want to remove the field `actions`, since the actions can be obtained from the (PO)MDP.
What about this:
struct MyEpsGreedyPolicy{T<:Function,R<:AbstractRNG} <: ExplorationPolicy
    eps::T
    rng::R
    m::Union{MDP,POMDP}
end

function MyEpsGreedyPolicy(problem::Union{MDP,POMDP}, eps::Function;
                           rng::AbstractRNG=Random.GLOBAL_RNG)
    return MyEpsGreedyPolicy(eps, rng, problem)
end

function MyEpsGreedyPolicy(problem::Union{MDP,POMDP}, eps::Real;
                           rng::AbstractRNG=Random.GLOBAL_RNG)
    return MyEpsGreedyPolicy(x -> eps, rng, problem)
end
function POMDPs.action(p::MyEpsGreedyPolicy, on_policy::Policy, k, s)
    if rand(p.rng) < p.eps(k)
        return rand(p.rng, actions(p.m, s))
    else
        return action(on_policy, s)
    end
end
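As a sanity check of the selection logic, here is a standalone mimic (plain Julia; the names `eps_greedy` and `eps_schedule` are made up, `valid` stands in for `actions(p.m, s)`, and `greedy` stands in for `action(on_policy, s)`):

```julia
using Random

# Standalone mimic of the epsilon-greedy branch above (not POMDPs.jl code).
function eps_greedy(rng, eps_k, valid, greedy)
    return rand(rng) < eps_k ? rand(rng, valid) : greedy
end

# Example decaying schedule for eps(k): explore early, exploit later.
eps_schedule(k) = max(0.05, 0.9 * 0.97^k)

rng = MersenneTwister(42)
eps_greedy(rng, eps_schedule(0), [:up, :right], :right)    # usually explores
eps_greedy(rng, eps_schedule(500), [:up, :right], :right)  # almost always :right
```

Because `valid` is computed per state, the random branch can only ever pick a legal action, which is exactly the point of the change.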
I have updated the runnable grid-world demo, and MyEpsGreedyPolicy works well; it does not break existing solvers at all.
I still think adding the whole environment model to EpsGreedyPolicy is a little "heavy", but I don't have a better idea.
Do you think it's an acceptable solution?
Thanks for your code review again :)
Another related issue I want to raise here is a documentation improvement, in the section "State- or belief-dependent action spaces". The current example is:
actions = function (s)
    if s == 1
        return [1,2,3]
    else
        return [2,3]
    end
end
I think this version of the example would be better:
actions = function (s = nothing)
    if s == 1
        return [2,3] # <--- return state-dependent actions here
    else
        return [1,2,3] # <--- return the full action space here
    end
end
I added `= nothing` because, with the original definition, calling `actions(mdp)` (with no state) throws an error: we can't get the full action space from that function.
Great! The only change I would suggest is the following:
struct MyEpsGreedyPolicy{T<:Function,R<:AbstractRNG,M<:Union{MDP,POMDP}} <: ExplorationPolicy
    eps::T
    rng::R
    m::M
end
This will result in much faster code. (if you don't understand why, feel free to ask)
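A standalone sketch of why (plain Julia with hypothetical `Grid`/`pick` names, not POMDPs.jl code): with `m::Union{MDP,POMDP}` the compiler only knows the field's abstract type, so every `actions(p.m, s)` call is dispatched at runtime; with the type parameter `M`, each `MyEpsGreedyPolicy{T,R,M}` instance stores a concretely typed problem, so those calls can be resolved statically and inlined.

```julia
# Hypothetical minimal illustration (not POMDPs.jl types).
abstract type AbstractProblem end
struct Grid <: AbstractProblem end
valid_actions(::Grid, s) = [:up, :right]

struct SlowPolicy               # field typed only by the abstract type:
    m::AbstractProblem          # p.m's concrete type is unknown at compile time
end

struct FastPolicy{M<:AbstractProblem}
    m::M                        # FastPolicy{Grid} stores a concrete Grid
end

# Both return the same answer; FastPolicy just lets the compiler
# specialize this function for the concrete problem type.
pick(p, s) = first(valid_actions(p.m, s))
```

`isconcretetype(fieldtype(FastPolicy{Grid}, :m))` is `true` while `isconcretetype(fieldtype(SlowPolicy, :m))` is `false`, which is what enables the specialization.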
I like your change to the documentation - can you put it in a separate PR?
I have opened a PR for this code change.

> struct MyEpsGreedyPolicy{T<:Function,R<:AbstractRNG,M<:Union{MDP,POMDP}} <: ExplorationPolicy eps::T rng::R m::M end
>
> This will result in much faster code. (if you don't understand why, feel free to ask)

You are so kind, could you tell me why this change will make the code faster? I'm curious.

> I like your change to the documentation - can you put it in a separate PR?

Okay, I have changed the documentation in this PR.
This work is based on the discussion above. I have finished it; this gist is a runnable EpsGreedyPolicy demo, including a grid-world MDP. When the state is at (1,1) (the bottom-left corner), EpsGreedyPolicy receives [RIGHT, UP] as the action set, so the agent will not hit the wall.
I added two functions to EpsGreedyPolicy to let it select actions based on the current state.
If you think this is the right way, I want to open a pull request.
Thanks for your code review.