Closed: zsunberg closed this issue 4 years ago
Hi @zsunberg, thanks for the suggestion! I am not sure if it is possible to use `reward(m, s, a, sp, o)` in this case. Looking at the original paper for the algorithm (http://www.fore.robot.cc/papers/Pineau03a.pdf), the backup equation is:
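As far as I can tell, the intermediate α-vector sets in the backup are built roughly like this (my own transcription of the paper's equations, so the notation may differ slightly):

$$\Gamma^{a,*} \ni \alpha^{a,*}(s) = R(s, a)$$

$$\Gamma^{a,o} \ni \alpha_i^{a,o}(s) = \gamma \sum_{s' \in S} T(s' \mid s, a)\, O(o \mid s', a)\, \alpha_i(s'), \quad \forall \alpha_i \in V$$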
So it looks like the reward function cannot depend on the next state or the observation, since only R(s, a) appears. Am I overlooking something? I am not all that familiar with POMDP solving methods and basically just tried to stick to the paper.
For the observation function, I have now switched to the more general version.
In the latest version of POMDPModelTools, there is a way to automatically cache the results of taking expectations of `reward(m, s, a, sp, o)`: https://juliapomdp.github.io/POMDPModelTools.jl/dev/model_transformations/#State-Action-Reward-Model
```julia
# somewhere at the beginning
r = StateActionReward(pomdp)

# when you need to use R(s, a) in the backups
r(s, a)
```
This will allow the solver to work with models that define `reward(m, s, a, sp, o)`.
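For reference, the quantity being cached is the expectation of the reward over next states and observations, roughly like the sketch below (not the actual POMDPModelTools implementation; it assumes discrete, explicitly-defined transition and observation distributions):

```julia
using POMDPs
using POMDPModelTools # for weighted_iterator

# Sketch of the expected immediate reward that StateActionReward caches:
# R(s, a) = Σ_sp T(sp | s, a) Σ_o O(o | sp, a) reward(m, s, a, sp, o)
function mean_reward(m::POMDP, s, a)
    r = 0.0
    for (sp, ptrans) in weighted_iterator(transition(m, s, a))
        for (o, pobs) in weighted_iterator(observation(m, s, a, sp))
            r += ptrans * pobs * reward(m, s, a, sp, o)
        end
    end
    return r
end
```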
Great, thanks! Fixed it.
Hi @dominikstrb, one potential improvement: it is better to use the versions of the functions with more arguments, e.g. `reward(m, s, a, sp, o)` or `reward(m, s, a, sp)` and `observation(m, s, a, sp)` instead of `reward(m, s, a)` and `observation(m, a, sp)`. This will allow compatibility with more problems (see the sketch below).
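To illustrate why calling the longer signatures is safer for the solver: as far as I know, POMDPs.jl provides default fallbacks that forward the longer signatures to the shorter ones, so a solver that calls the long forms still works with problems that only define the short forms. A minimal hypothetical problem (the `MinimalPOMDP` type and its reward values are made up for illustration):

```julia
using POMDPs
using POMDPModelTools # for Deterministic

# a hypothetical problem that only defines the short signatures
struct MinimalPOMDP <: POMDP{Int, Int, Int} end

POMDPs.reward(m::MinimalPOMDP, s, a) = float(s + a)
POMDPs.observation(m::MinimalPOMDP, a, sp) = Deterministic(sp)

m = MinimalPOMDP()

# a solver that calls the longer signatures still works, assuming the
# default fallbacks reward(m, s, a, sp, o) -> reward(m, s, a, sp) -> reward(m, s, a)
# and observation(m, s, a, sp) -> observation(m, a, sp)
reward(m, 1, 2, 3, 4)   # 3.0
observation(m, 1, 2, 3) # Deterministic(3)
```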