Open jmuchovej opened 2 months ago
@jmuchovej Yes, I agree we should add this! But, we should make it a solver constructor argument rather than a keyword argument for solve
. I suggest the name root_belief
. @WhiffleFish what do you think?
@zsunberg How would you see the solver constructor argument being handled? As far as I understand, solvers are problem-independent. So, it would have to be something like a function, the default for which would be something like root_belief = pomdp -> initialstate(pomdp)
, or?
Yeah, that seems like a good way to handle it.
Hmm... is "problem", in SARSOP's case, defined as the reward function only or the (reward function, initial belief) pair?
SARSOPSolver
is actually belief-dependent (since it defaults to the initialstate
).MyPOMDP <: POMDP
struct, then use it in POMDPs.initialstate(::MyPOMDP)
, etc.)solve
, much like you do with solve(::SARSOPSolver, ::POMDP, ...)
If the aim is to define an initial belief using the POMDPs interface – I definitely agree this would be ideal! However, it strikes me as potentially odd that either:
MyPOMDP
, or...FWIW: I'm not tied to the solve
approach I outlined! I just figured it was most consistent with the (reward function, initial belief) pair as the "problem".
When I read "solvers are problem-independent"[^1], that strikes me as: "instantiate the solver once and reuse to your heart's content". So, if I had 200 different initial beliefs and 200 different MyPOMDP
definitions, I could use the same SARSOPSolver
instance to compute all 40K policies.
[^1]: This too was my understanding of any Solver
!
In POMPDs.jl, the initial belief is part of the problem definition via initialstate
. SARSOP uses a tree to decide what parts of the belief space to explore and guarantees that the policy is within epsilon of optimal if started at the belief at the root of this tree. This root belief MAY be the initial belief of the POMDP, but it could be something else. @bkraske has recently talked to me about solving a problem he was working on at other beliefs.
@WhiffleFish 's solution should accommodate all use cases elegantly. root_belief
can be a function of the POMDP. So, by default, at the beginning of solve
, when SARSOP is introduced to the POMDP, it will call root_belief(pomdp)
and get the output of initialstate
. Therefore, the same instance of SARSOPSolver
can be used for many different POMDPs. But if the user wants to solve with a different root belief, they can override it without needing to mess with the POMDP definition.
Thanks for clarifying! 🙂 So, I see what you mean, but I don't think root_belief
should go in the solver. Adding root_belief
to the solver would make SARSOPSolver
problem dependent. As it currently is, SARSOPSolver
is problem independent. Like I understood and @WhiffleFish also understood – a Solver
should be problem independent, no?
If this is the case: perhaps you meant that root_belief
should be stored in SARSOPTree
? If so, then I totally agree! (This makes sense as otherwise it's unclear where the solver started from.)
root_belief
should be part of the solver...
e.g., in my case, I have 200 beliefs all of which are affiliated with some Random.seed!. root_belief(pomdp) would require that seed (or similar) is part of the POMDP definition – but there's no need for the seed otherwise in the POMDP.
I think you will be able to do this:
for _ in 1:200
current_belief = gen_belief()
solver = SARSOPSolver(root_belief = current_belief)
solve(solver, pomdp)
end
Or, if you wanted to avoid reconstructing the solver, you could "capture" the belief variable like this:
current_belief = nothing
solver = SARSOPSolver(root_belief = pomdp -> current_belief)
for _ in 1:200
current_belief = gen_belief()
solve(solver, pomdp)
end
Does that make more sense? (make sure you understand what the ->
is doing)
Yep! I know about this – this is what I was referring to with "non-local variables" (option 1 in my "making root_belief
work in all cases").
The concept of root_belief
is not a property of the problem---it is a detail of the solver. After all, some solvers do not involve the concept of a root belief and do not involve any kind of tree search at all. Of course, depending on the solver, you may want to give it some hints on how to better solve a particular problem instance (e.g., a different root distribution, different parameters that affect branching, different number of iterations).
@mykelk I see. Thanks for elaborating.
@zsunberg @WhiffleFish I can work on this over the coming week, will update #21 once I've done so.
Do we want tests for this? I suspect so, but I suspect things like the Tiger and Baby POMDPs will be trivial – I was thinking of using RockSample and possible LaserTag? I'm not too familiar with the LaserTag POMDP though.
Thanks, @jmuchovej ! Yes, we should have tests for it. I think Baby and Tiger tests will actually be sufficient. As long as the tests exercise all of the lines of code in the project, it is not that important how large the problems are.
Currently,
_vectorized_initialstate
takes thepomdp
andordered_states
then uses theinitialstate(::POMDP)
to initializeb0
; however, SARSOP is able to begin from any point in the belief-space, so it makes sense to allow users to provide an initial belief (perhaps in sparse form?) from which to commence SARSOP.I'd be happy to write/submit a PR for this, but wanted to open an issue to see if it was something y'all would be willing to accept. 🙂
Changes that I think that would be necessary to support this. I could probably tackle this in the next week or two, if an acceptable PR.
```diff # solver.jl + function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp)) + tree = SARSOPTree(solver, pomdp; b0) - function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP) - tree = SARSOPTree(Solver, pomdp) # the rest of the code ... return pol, (; ...) end + function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp)) = - function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP) = fist(solve_info(solver, pomdp; b0)) # tree.jl + function SARSOPTree(solver, pomdp::POMDP; b0=initialstate(pomdp)) - function SARSOPTree(solver, pomdp::POMDP) + sparse_pomdp = ModifiedSparseTabular(pomdp, b0) - sparse_pomdp = ModifiedSparseTabular(pomdp) # the rest of the codebase ... return insert_root!(...) end # sparse_tabular.jl + function ModifiedSparseTabular(pomdp::POMDP, b0) - function ModifiedSparseTabular(pomdp::POMDP) S = ordered_states(pomdp) # the rest of the codebase ... + b0 = _vectorized_initialstate(pomdp, S, b0) - b0 = _vectorized_initialstate(pomdp, S) return ModifiedSparseTabular(T, R, O, terminal, b0, discount(pomdp)) end + function _vectorized_initialstate(pomdp, S, b0) + function _vectorized_initialstate(pomdp, S) b0_vec = Vector{Float64}(undef, length(S)) @inbounds for i ∈ eachindex(S, b0_vec) b0_vec[i] = pdf(b0, S[i]) end return sparse(b0_vec) end ``` So long as `b0` is guaranteed to be compatible with `pdf(b0, S[i])`, these are all the changes necessary. Otherwise, there would need to be some handling to ensure that `b0` is either compatible with vectorizing or allow folks to specify how to achieve the equivalent of `pdf(b0, S[i])`. --- Perhaps if there were functions in `POMDPTools` that support this kinda interface, that could be interesting but is way out of scope for this issue/PR. (e.g. going from marginal beliefs to relevant `SparseCat` and the like.)