Support for custom initial belief (e.g., solving many POMDPs with varying beliefs)

jmuchovej commented 2 months ago

Currently, _vectorized_initialstate takes the pomdp and ordered_states then uses the initialstate(::POMDP) to initialize b0; however, SARSOP is able to begin from any point in the belief-space, so it makes sense to allow users to provide an initial belief (perhaps in sparse form?) from which to commence SARSOP.

I'd be happy to write/submit a PR for this, but wanted to open an issue to see if it was something y'all would be willing to accept. 🙂

Changes that I think that would be necessary to support this. I could probably tackle this in the next week or two, if an acceptable PR.

```diff # solver.jl + function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp)) + tree = SARSOPTree(solver, pomdp; b0) - function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP) - tree = SARSOPTree(Solver, pomdp) # the rest of the code ... return pol, (; ...) end + function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp)) = - function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP) = fist(solve_info(solver, pomdp; b0)) # tree.jl + function SARSOPTree(solver, pomdp::POMDP; b0=initialstate(pomdp)) - function SARSOPTree(solver, pomdp::POMDP) + sparse_pomdp = ModifiedSparseTabular(pomdp, b0) - sparse_pomdp = ModifiedSparseTabular(pomdp) # the rest of the codebase ... return insert_root!(...) end # sparse_tabular.jl + function ModifiedSparseTabular(pomdp::POMDP, b0) - function ModifiedSparseTabular(pomdp::POMDP) S = ordered_states(pomdp) # the rest of the codebase ... + b0 = _vectorized_initialstate(pomdp, S, b0) - b0 = _vectorized_initialstate(pomdp, S) return ModifiedSparseTabular(T, R, O, terminal, b0, discount(pomdp)) end + function _vectorized_initialstate(pomdp, S, b0) + function _vectorized_initialstate(pomdp, S) b0_vec = Vector{Float64}(undef, length(S)) @inbounds for i ∈ eachindex(S, b0_vec) b0_vec[i] = pdf(b0, S[i]) end return sparse(b0_vec) end ``` So long as `b0` is guaranteed to be compatible with `pdf(b0, S[i])`, these are all the changes necessary. Otherwise, there would need to be some handling to ensure that `b0` is either compatible with vectorizing or allow folks to specify how to achieve the equivalent of `pdf(b0, S[i])`. --- Perhaps if there were functions in `POMDPTools` that support this kinda interface, that could be interesting but is way out of scope for this issue/PR. (e.g. going from marginal beliefs to relevant `SparseCat` and the like.)

zsunberg commented 2 months ago

@jmuchovej Yes, I agree we should add this! But, we should make it a solver constructor argument rather than a keyword argument for solve. I suggest the name root_belief. @WhiffleFish what do you think?

WhiffleFish commented 2 months ago

@zsunberg How would you see the solver constructor argument being handled? As far as I understand, solvers are problem-independent. So, it would have to be something like a function, the default for which would be something like root_belief = pomdp -> initialstate(pomdp), or?

zsunberg commented 2 months ago

Yeah, that seems like a good way to handle it.

jmuchovej commented 2 months ago

Hmm... is "problem", in SARSOP's case, defined as the reward function only or the (reward function, initial belief) pair?

If only the reward function (as it is now), then you could argue that SARSOPSolver is actually belief-dependent (since it defaults to the initialstate).
If the pair, then I see two clear routes forward:
1. Encode the initial belief into the incoming POMDP's struct (have a non-uniform belief? encode it in the MyPOMDP <: POMDP struct, then use it in POMDPs.initialstate(::MyPOMDP), etc.)
2. Consume the initial belief in solve, much like you do with solve(::SARSOPSolver, ::POMDP, ...)

If the aim is to define an initial belief using the POMDPs interface – I definitely agree this would be ideal! However, it strikes me as potentially odd that either:

The initial belief must be part of MyPOMDP, or...
The initial belief must be non-reproducibly sampled (i.e., it'll either rely on global variables, or worse require reinstantiation anytime the initial belief changes).

FWIW: I'm not tied to the solve approach I outlined! I just figured it was most consistent with the (reward function, initial belief) pair as the "problem".

When I read "solvers are problem-independent"[^1], that strikes me as: "instantiate the solver once and reuse to your heart's content". So, if I had 200 different initial beliefs and 200 different MyPOMDP definitions, I could use the same SARSOPSolver instance to compute all 40K policies.

[^1]: This too was my understanding of any Solver!

zsunberg commented 2 months ago

In POMPDs.jl, the initial belief is part of the problem definition via initialstate. SARSOP uses a tree to decide what parts of the belief space to explore and guarantees that the policy is within epsilon of optimal if started at the belief at the root of this tree. This root belief MAY be the initial belief of the POMDP, but it could be something else. @bkraske has recently talked to me about solving a problem he was working on at other beliefs.

@WhiffleFish 's solution should accommodate all use cases elegantly. root_belief can be a function of the POMDP. So, by default, at the beginning of solve, when SARSOP is introduced to the POMDP, it will call root_belief(pomdp) and get the output of initialstate. Therefore, the same instance of SARSOPSolver can be used for many different POMDPs. But if the user wants to solve with a different root belief, they can override it without needing to mess with the POMDP definition.

jmuchovej commented 2 months ago

Thanks for clarifying! 🙂 So, I see what you mean, but I don't think root_belief should go in the solver. Adding root_belief to the solver would make SARSOPSolver problem dependent. As it currently is, SARSOPSolver is problem independent. Like I understood and @WhiffleFish also understood – a Solver should be problem independent, no?

If this is the case: perhaps you meant that root_belief should be stored in SARSOPTree? If so, then I totally agree! (This makes sense as otherwise it's unclear where the solver started from.)

If you actually meant that root_belief should be part of the solver...

I agree that `root_belief(pomdp)` could accommodate **many** use cases. When I say "problem dependent", I mean that I couldn't use the same `SARSOPSolver` instance across 15 different problems, unless they had exactly the same initial beliefs (or beliefs were unreproducibly sampled) – even though I can change the rest of the problem (by creating another instance of `MyPOMDP`) otherwise. The only way `root_belief(pomdp)` could accommodate all use cases is if: 1. Users refer to non-local variables. 2. `MyPOMDP` contains all relevant features for sampling/specifying alternate root beliefs. e.g., in my case, I have 200 beliefs all of which are affiliated with some `Random.seed!`. `root_belief(pomdp)` would require that `seed` (or similar) is part of the POMDP definition – but there's no need for the seed otherwise in the POMDP. I'm not sure if (1) is a deal-breaker – personally I think it is, as referencing non-local variables make it hard to reason about code execution. Another possibility (in my case) is to `Random.seed!` right before `SARSOPSolver`, but at that point – I'm using a new `SARSOPSolver` for every belief-reward pair. For `root_belief(pomdp)` to truly accommodate all cases, anything relevant to sampling a different `root_belief` without using non-local variables demands that the relevant information is stored in the `POMDP`. If someone wants to use an existing POMDP, say `RockSamplePOMDP`, they would have to engage in (1) or full reimplement it to support (2). _Perhaps there's something I'm overlooking. If so, I'd be happy to hear about it 'cause right now integrating `root_belief` into `SARSOPSolver` wouldn't actually resolve what prompted me to open this issue. 😅_ Essentially, I was looking for a way to manually enforce the SARSOP begin from a particular root belief, versus something like starting from the uniform belief or an arbitrarily sampled one.

zsunberg commented 2 months ago

e.g., in my case, I have 200 beliefs all of which are affiliated with some Random.seed!. root_belief(pomdp) would require that seed (or similar) is part of the POMDP definition – but there's no need for the seed otherwise in the POMDP.

I think you will be able to do this:

for _ in 1:200
    current_belief = gen_belief()
    solver = SARSOPSolver(root_belief = current_belief)
    solve(solver, pomdp)
end

Or, if you wanted to avoid reconstructing the solver, you could "capture" the belief variable like this:

current_belief = nothing
solver = SARSOPSolver(root_belief = pomdp -> current_belief)
for _ in 1:200
    current_belief = gen_belief()
    solve(solver, pomdp)
end

Does that make more sense? (make sure you understand what the -> is doing)

jmuchovej commented 2 months ago

Yep! I know about this – this is what I was referring to with "non-local variables" (option 1 in my "making root_belief work in all cases").

mykelk commented 2 months ago

The concept of root_belief is not a property of the problem---it is a detail of the solver. After all, some solvers do not involve the concept of a root belief and do not involve any kind of tree search at all. Of course, depending on the solver, you may want to give it some hints on how to better solve a particular problem instance (e.g., a different root distribution, different parameters that affect branching, different number of iterations).

jmuchovej commented 2 months ago

@mykelk I see. Thanks for elaborating.

@zsunberg @WhiffleFish I can work on this over the coming week, will update #21 once I've done so.

Do we want tests for this? I suspect so, but I suspect things like the Tiger and Baby POMDPs will be trivial – I was thinking of using RockSample and possible LaserTag? I'm not too familiar with the LaserTag POMDP though.

zsunberg commented 2 months ago

Thanks, @jmuchovej ! Yes, we should have tests for it. I think Baby and Tiger tests will actually be sufficient. As long as the tests exercise all of the lines of code in the project, it is not that important how large the problems are.

JuliaPOMDP / NativeSARSOP.jl

Support for custom initial belief (e.g., solving many POMDPs with varying beliefs) #20