JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/

Should there be an actions(::POMDP, ::Belief, ::AbstractSpace) method? #45

Closed · zsunberg closed this 8 years ago

zsunberg commented 8 years ago

Tree-based solvers like POMCP need to evaluate all of the actions available at a sample belief state (or a subset of them, in the sparse or DPW case). Right now we only have the actions(::POMDP, ::State, ::AbstractSpace) method.

By default, the method would return the full action space, so that it only needs to be implemented in cases where only certain actions are available from certain beliefs.

```julia
@pomdp_func actions(pomdp::POMDP, b::Belief, aspace::AbstractSpace=actions(pomdp)) = aspace
```
ebalaban commented 8 years ago

Maybe I'm missing something here, but wouldn't the set of actions available from a belief state be the union of the action sets available at each state in the belief's support? Are we going to redefine that? For example, suppose the actions available in s1 are a1 and a2, and the actions available in s2 are a1, a3, and a4. Our belief state of interest is, say, b1 = (0.05, 0.95). Are you proposing that the action set available in b1 could potentially be something different from {a1, a2, a3, a4}?
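For a discrete belief, that union could be computed along these lines. This is an illustrative sketch with hypothetical names (state_actions, belief_actions), not the POMDPs.jl API:

```julia
# Per-state action sets from the example above (toy representation).
state_actions = Dict(:s1 => [:a1, :a2], :s2 => [:a1, :a3, :a4])

# b1 puts probability 0.05 on s1 and 0.95 on s2.
b1 = Dict(:s1 => 0.05, :s2 => 0.95)

# Actions available from a belief: the union of per-state action sets
# over the states the belief assigns positive mass to. Sorted only to
# make the result deterministic.
belief_actions(b, state_actions) =
    sort(reduce(union, (state_actions[s] for (s, p) in b if p > 0)))

belief_actions(b1, state_actions)  # [:a1, :a2, :a3, :a4]
```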

zsunberg commented 8 years ago

@ebalaban, I can't think of a case where the action set would differ from the union you described, but in some cases it seems to me that it would be difficult, inefficient, or impossible to implement a solver using only the current actions(::POMDP, ::State, ::AbstractSpace) method. For example, in a problem I am working on now, the Orienteering Problem with Correlated Stochastic Payoffs, the action space depends on the state, but the state space is uncountable, so it cannot be iterated through to construct the action space as you described.

I think the easiest way to address all cases is to introduce the new method.
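A sketch of how the proposed default plus a problem-specific override could look. The types and names here (MiniPOMDP, SiteBelief, belief_actions) are stand-ins for illustration, not the real POMDPs.jl abstractions:

```julia
abstract type MiniPOMDP end
abstract type MiniBelief end

# Full action space of the problem (stand-in for actions(pomdp)).
full_actions(p::MiniPOMDP) = error("full_actions not implemented for $(typeof(p))")

# Default: any belief admits the full action space, so a problem only
# overrides this when the available actions genuinely depend on the belief.
belief_actions(p::MiniPOMDP, b::MiniBelief) = full_actions(p)

# Hypothetical orienteering-style problem: sites already visited (which
# the belief tracks with certainty) cannot be chosen again.
struct Orienteering <: MiniPOMDP
    nsites::Int
end
struct SiteBelief <: MiniBelief
    visited::Set{Int}
end

full_actions(p::Orienteering) = collect(1:p.nsites)
belief_actions(p::Orienteering, b::SiteBelief) =
    [a for a in full_actions(p) if !(a in b.visited)]

belief_actions(Orienteering(4), SiteBelief(Set([2, 3])))  # [1, 4]
```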

etotheipluspi commented 8 years ago

I'm ok with this, but we need to make it clear which solver uses which action function. @ebalaban would DESPOT use something like this as well?

ebalaban commented 8 years ago

Ok, that orienteering problem is a good example. I am currently using a particle-based belief with DESPOT, so it's not as much of an issue there yet. The only thing that bothers me a bit, though, is that it's not really the belief itself that defines the action subset, but rather the state subset over which the belief is defined, so two different beliefs with the same support would map to the same action subset. We probably want to avoid introducing the notion of a state subset, though, so I guess using the belief for this purpose is fine.

zsunberg commented 8 years ago

I'm going to go ahead and put this in. It can be reopened if it turns out something isn't satisfactory.