gautams3 closed 4 years ago
You are correct that there is a difference; however, I would argue that it is inconsequential. In the algorithm presented in the paper, that first rollout is just thrown away, correct?
P.S. This code was written in the early days of Julia, and I think it could be made much cleaner.
I looked at the algorithm again. You're right, the very first rollout is not used for any value update.
Ref: BasicPOMCP/solver.jl::simulate()
If you start at the root of a POMCP tree, you cannot jump directly to a rollout. You have to do selection based on the upper confidence bound (UCB) at least once before calling a rollout. The POMCP paper doesn't require that, as per Algorithm 1.
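To make the difference concrete, here is a minimal Python sketch of a simulate step in the style of Algorithm 1. It is not the BasicPOMCP code: the toy model `step`, the deterministic `rollout`, the `Tree` class, and the fixed `MAX_DEPTH` cutoff (used in place of the paper's discount-based termination) are all hypothetical names and simplifications made up for illustration. The "eager" variant models the behavior described in this issue by expanding the root before the first simulation.

```python
import math

GAMMA = 0.9
MAX_DEPTH = 3          # assumption: fixed horizon instead of gamma^depth < epsilon
ACTIONS = (0, 1)

def step(s, a):
    """Hypothetical toy generative model: (next_state, observation, reward)."""
    return s, 0, float(a)

def rollout(s, depth):
    """Deterministic rollout that always takes action 1 until MAX_DEPTH."""
    if depth >= MAX_DEPTH:
        return 0.0
    s2, _, r = step(s, 1)
    return r + GAMMA * rollout(s2, depth + 1)

class Tree:
    def __init__(self):
        self.n = {}        # visit counts, keyed by history tuple
        self.v = {}        # value estimates for action nodes
        self.rollouts = 0  # number of rollouts started

    def expand(self, h):
        """Add history node h and one child per action, all with zero counts."""
        self.n[h] = 0
        for a in ACTIONS:
            self.n[h + (a,)] = 0
            self.v[h + (a,)] = 0.0

def simulate(tree, s, h, depth, c=1.0):
    if depth >= MAX_DEPTH:
        return 0.0
    if h not in tree.n:
        # New node: expand it and return a rollout. No counts or values change.
        tree.expand(h)
        tree.rollouts += 1
        return rollout(s, depth)

    def ucb(a):
        na = tree.n[h + (a,)]
        if na == 0:
            return math.inf  # untried actions are selected first
        return tree.v[h + (a,)] + c * math.sqrt(math.log(tree.n[h]) / na)

    a = max(ACTIONS, key=ucb)
    s2, o, r = step(s, a)
    ret = r + GAMMA * simulate(tree, s2, h + (a, o), depth + 1, c)
    tree.n[h] += 1
    tree.n[h + (a,)] += 1
    tree.v[h + (a,)] += (ret - tree.v[h + (a,)]) / tree.n[h + (a,)]
    return ret

# Paper-style: the root is created lazily, so the first call is a pure rollout.
lazy = Tree()
lazy_return = simulate(lazy, 0, (), 0)

# Eager variant: the root exists up front, so the first call selects by UCB.
eager = Tree()
eager.expand(())
eager_return = simulate(eager, 0, (), 0)

print("lazy root visits:", lazy.n[()], "eager root visits:", eager.n[()])
```

In the lazy variant the first call only expands the root; its return value never reaches a value estimate, so it is effectively thrown away. In the eager variant the first call already updates one action node at the root. After that first call the two variants follow the same code path, which is why the difference costs the paper's version at most one simulation.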