JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/

Representing POMDP Internal State #72

Closed rejuvyesh closed 8 years ago

rejuvyesh commented 8 years ago

@zsunberg suggested that I open a new issue to discuss PolicyState.

If you survey the literature on methods for solving POMDPs, one unifying view is to see a POMDP policy as a mapping from the observation history $h_{1:t}$ to an action. But since we can only have finite memory, different authors have come up with different approaches to compress this history. This compressed history can be intuitively thought of as the agent's internal state, which can be:

  1. The actual state
  2. A finite window of past observations
  3. A belief (a probability distribution over states)
  4. An RNN hidden state
  5. A finite state machine

So to keep the interface general enough, defining a PolicyState type seems imperative to me.

Then we can possibly define:

abstract Belief <: PolicyState

But @zsunberg reminded that currently

abstract Belief <: AbstractDistribution

Since Julia does not support multiple inheritance for abstract types, we need to decide which one to accept.
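For concreteness, here is a sketch (in the 0.x `abstract` syntax used elsewhere in this thread) of the two mutually exclusive options being discussed:

```julia
# Option A: a belief is a kind of internal policy state
abstract PolicyState
abstract Belief <: PolicyState          # ...but then Belief is not an AbstractDistribution

# Option B: a belief is a kind of distribution (the current interface)
abstract AbstractDistribution
abstract Belief <: AbstractDistribution # ...but then Belief is not a PolicyState

# Julia allows exactly one supertype per type, so something like
#   abstract Belief <: PolicyState, AbstractDistribution
# is a syntax error -- one of the two relationships has to be dropped.
```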

zsunberg commented 8 years ago

For continuity, our previous discussion about this was in #34

zsunberg commented 8 years ago

Another possible name for this that @rejuvyesh suggested could be AgentState. I think I like PolicyState slightly better though.

If we do this right, I think it could help significantly in understanding how solvers work. I think I'm currently leaning towards abstract Belief <: PolicyState, because I think POMDPs.jl's forte is going to be its ability to support complex solvers that work with generative models and don't use belief as their state. It's easy to explain to someone that a belief is a distribution; it's harder to explain that these advanced solvers and policies work using something that is not a belief, but is similar.

mykelk commented 8 years ago

I think I remember discussing this a while ago in my office. I thought that Belief could be any of those things that @rejuvyesh mentioned. I think it is an error that Belief is a type of AbstractDistribution. After all, a Belief (as an AbstractDistribution) doesn't make sense in the context of MCVI.

zsunberg commented 8 years ago

Yes, @mykelk , the conclusion that we came to last time was that we should just call this thing Belief. But, I think this made it rather difficult for @rejuvyesh to understand our interface as someone new to it, and that will probably happen again, so we should consider ways to make it more clear. Having been around the interface for so long, it is difficult for me to tell what is confusing and what is not.

If belief is not a subtype of AbstractDistribution, I suppose that it would prompt people to think "if Belief is not always a distribution, what is it?". Perhaps proper documentation of Belief as it is now will alleviate the problem.

rejuvyesh commented 8 years ago

Yes. I think calling it Belief is slightly confusing, because the term "belief" has a specific meaning in the POMDP literature, and using it to denote the general concept of the agent's internal state is counter-intuitive.

zsunberg commented 8 years ago

It would just be so nice if we could do

abstract PolicyState
abstract AbstractDistribution{T}
abstract Belief{S} <: PolicyState, AbstractDistribution{S}

mykelk commented 8 years ago

Belief is just a statistic for representing a distribution over the current state---and histories, states in a finite state controller, particles, histograms, etc. can be these statistics. Although, in theory, you can sample from or evaluate the density of the distribution specified by these statistics, you don't necessarily need that capability to solve POMDPs or represent their policies. Does that make sense?

rejuvyesh commented 8 years ago

@mykelk I think I understand what you are trying to convey. But I still feel Belief is the wrong term for the idea.

mykelk commented 8 years ago

If there is consensus for PolicyState, let's go with that.

zsunberg commented 8 years ago

wait

zsunberg commented 8 years ago

I am doing some reading to see how "belief" is used in the literature, and my initial findings are that it is actually used to denote something which may or may not be a probability distribution.

zsunberg commented 8 years ago

I apologize for flip-flopping on this. I am doing that because I really actually don't know what the best solution is.

By the way, while it is important to make a decision on this as soon as possible, this does not prevent us from pushing the parametric interface to master (which will also come with a better documentation framework)

zsunberg commented 8 years ago

An interesting discussion of the term "belief state" can be found in section 3.2 of Kaelbling's 1998 "Planning and acting in partially observable stochastic domains". One sentence from that section reads "Our choice for belief states will be probability distributions over states of the world." indicating that they view a probability distribution as only one possible choice of "belief state".

Perhaps abstract BeliefState would better communicate that the type should include things that may not be probability distributions.

zsunberg commented 8 years ago

Moreover, in Mykel's dmu book (which this software will often be used with), "belief state" is introduced on page 115, and the first example given is win-loss counts for a multi-armed bandit, which is not a probability distribution.

zsunberg commented 8 years ago

The MCVI paper (Bai, H., Hsu, D., Lee, W., & Ngo, V. (2011). Monte Carlo value iteration for continuous-state POMDPs. Algorithmic Foundations of Robotics IX, 175–191. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-17452-0_11) does seem to use the term "belief" to denote a probability distribution over states, but it does say "It encodes the belief implicitly in the controller state based on the robot’s initial belief b and the sequence of observations received."

I think BeliefState is a good type name for something that "implicitly encodes the belief".

zsunberg commented 8 years ago

David Silver defines a "belief state" as a probability distribution. He says "The belief state is the probability distribution over states given history h" in the POMCP paper. (Silver, D., & Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. In Advances in neural information processing systems (pp. 2164–2172). Retrieved from http://discovery.ucl.ac.uk/1347369/)

So that disagrees with naming our policy state type BeliefState.

zsunberg commented 8 years ago

Another possible name could be abstract InformationState

cho3 commented 8 years ago

To add more data points, I've seen InternalMemory and InternalState (via the reinforcement learning state of the art).

rejuvyesh commented 8 years ago

Naming things is one of the 3 hardest problems in CS :cry:

zsunberg commented 8 years ago

From the DESPOT paper (Somani, A., Ye, N., Hsu, D., & Lee, W. (2013). DESPOT : Online POMDP Planning with Regularization. Advances in Neural Information Processing Systems, 1–9. Retrieved from http://papers.nips.cc/paper/5189-despot-online-pomdp-planning-with-regularization): "beliefs, which are probability distributions over the states" "The agent maintains a belief, often represented as a probability distribution over S."

I definitely don't think we should just use Belief, that term is too closely associated with a probability distribution.

goretkin commented 8 years ago

Sorry to jump in the middle. Hopefully this isn't just noise.

Suppose there's a POMDP with a beautifully intricate observation model. The state transitions are not controllable: whatever action you take (let's say it's an enumerable action space with one action), the transition dynamics are the same.

Whatever the reward structure is, you can implement an optimal controller for this POMDP that basically keeps no state. If "belief" means "what you need to keep track of in order to act optimally", then for this POMDP the belief is "nothing". I can still run a state estimator that keeps track of a distribution over states, so I feel that there is still a belief that isn't just "nothing"; it's just that my controller/policy for the POMDP doesn't need it. So the state of this finite state controller (which, again, is the trivial one-state-always-do-the-same-thing controller) is "sufficient" for me to know the distribution over optimal actions. But it is not sufficient for me to know the probability that I receive some observation on the next time step. For that I need the distribution over states.

In the case that the controller is decoupled into "estimator" and "action selector given state distribution", then there's no question as to what the belief is, right? But for other control architectures (like a FSM), I think there are two different distributions/models/?? that a quantity (Belief, InternalState, etc) can be a sufficient statistic with respect to: the action distribution and the observation distribution.

zsunberg commented 8 years ago

From the SARSOP paper (Kurniawati, H., Hsu, D., & Lee, W. (2008). SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. Robotics: Science and Systems. Retrieved from https://www1.comp.nus.edu.sg/~leews/publications/rss08.pdf): "A belief is a probability distribution over all possible robot states, and the set of all beliefs form the belief space."

We definitely shouldn't use just Belief to denote all the things that the policy can base its decision on.

zsunberg commented 8 years ago

@goretkin, thanks for the comment. I think that in some cases it is indeed useful to think of this object as "everything needed to make a good decision" rather than "everything needed to encode belief"

mykelk commented 8 years ago

Thanks guys for looking into this. @zsunberg, do you like BeliefState?

zsunberg commented 8 years ago

I am ok with BeliefState, but it seems that more consensus is actually building around PolicyState.

I still see two issues with this:

  1. I have not encountered "policy state" anywhere in the literature (perhaps this is because no-one else has tried to create a software framework that accommodates such a wide variety of solution approaches, so we have to coin our own terms).
  2. What do we do with Belief? If people look at a POMDP package and don't find belief, they will be confused. We could keep abstract Belief <: PolicyState, but I have found that having extra types floating around is usually a source of pain. Furthermore, Belief should really be an AbstractDistribution, and it is impossible to have it be a subtype of both. A possibly better approach would be to break it out into another package that could contain a suite of beliefs and updaters (to start we could just put it in POMDPToolbox). In the other package, we could have something like
type BeliefState{S} <: PolicyState
    dist::AbstractDistribution{S}
end

rand(..., b::BeliefState, ...) = rand(..., b.dist, ...)
pdf(..., b::BeliefState, ...) = pdf(..., b.dist, ...)
# etc

zsunberg commented 8 years ago

At the meeting today we took a quick poll between BeliefState, PolicyState, and AgentState. AgentState got 3 votes; PolicyState got two; BeliefState got none.

I talked to Ed afterwards, and he said that he would be ok with BeliefState, but was not ready to change it to one of the other two.

There are really valid arguments on both sides - I guess we'll probably just have to take a final vote on this early next week when Mykel's back. I don't think it actually slows anyone down to delay until then.

mykelk commented 8 years ago

I vote against AgentState. The concept of agent doesn't show up anywhere in the code, but Policy does. So, I would be in much more favor of PolicyState.

zsunberg commented 8 years ago

btw, right now I'm working on a branch that has Belief not a subtype of AbstractDistribution, and it is kind of a pain in practice

mykelk commented 8 years ago

@zsunberg can you say a few words about this pain so that we know what sorts of things to look out for in the design of this API?

zsunberg commented 8 years ago

For example in the crying baby problem, I had to define both BoolDistribution <: AbstractDistribution{Bool} and ExactBabyBelief <: Belief which are exactly the same thing. In POMCP, I am having to use Union{Belief, AbstractDistribution} in some places

zsunberg commented 8 years ago

What if we pull another parametric trick, i.e. abstract Policy{T} and abstract Updater{T}, where T is the policy state or belief? Just throwing ideas out there.
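A minimal sketch of how that parametric trick might look (hypothetical names, using the 0.x syntax from the rest of this thread; ParticleBelief and friends are made up for illustration):

```julia
abstract Updater{B}   # B is the belief / policy-state type this updater produces
abstract Policy{B}    # B is the belief / policy-state type this policy consumes

# Example pairing: a particle-filter-style belief
type ParticleBelief{S}
    particles::Vector{S}
end

type ParticleUpdater{S} <: Updater{ParticleBelief{S}}
end

type ParticlePolicy{S} <: Policy{ParticleBelief{S}}
end

# An FSM-based solver could instead use an integer controller node:
#   type FSMPolicy <: Policy{Int} ... end
# Note that B never has to subtype Belief or AbstractDistribution, which
# sidesteps the multiple-inheritance problem entirely.
```

The design point is that the belief type becomes a free parameter that the solver, policy, and updater agree on, rather than a node in the type hierarchy.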

mykelk commented 8 years ago

I'm okay with this if the others are.

zsunberg commented 8 years ago

Notes about this from meeting with Matthijs:

zsunberg commented 8 years ago

btw, InformationState does appear to have a meaning in game theory (e.g. http://onlinelibrary.wiley.com/doi/10.1111/1467-8586.00134/pdf). It seems to be the set of all states that the decision making agent considers to be possible.

ebalaban commented 8 years ago

InformationState is something that occurred to me as well as I was driving back yesterday, but I thought it was still too vague and ambiguous. How about something like GeneralizedBelief to indicate that we are not talking about the most common understanding of belief? Then Belief (a distribution) or TreeNode or History could be subtypes of it. I am also ok with just sticking with Belief (probably still my first choice) or going with BeliefState, as I mentioned to @zsunberg yesterday.

mykelk commented 8 years ago

@zsunberg Out of curiosity, what was the objection with PolicyState?

Also, "He didn't think belief was a good name because he would be confused when he found a belief that was not a probability transition"---do you mean probability distribution?

I don't know if we should be hung up on explicit representation of probability distributions. A belief is just a piece of information that is sufficient for the policy to select its action (e.g., counts of wins and losses in a bandit problem), just like how a state in an MDP is just a piece of information that is sufficient for predicting the next state and reward. Maybe we can have Belief and BeliefDistribution. Although I'm not completely against BeliefState, it is nice having Belief since it is a single word (like most of our other core concepts) and we don't have two kinds of states---belief states and environment states. I kind of like just dealing with actions, states, observations, beliefs, and policies. We can provide nice crisp definitions of these five core concepts and just say something like:

  1. A belief is updated based on the action and observation.
  2. An action is generated (potentially according to some distribution) from a policy based on the belief.

That's it! This definition captures heuristic methods, finite state machines, history based methods, parametric distribution methods, non-parametric particle based methods, predictive state representation methods, etc.

ebalaban commented 8 years ago

I agree with @mykelk.

mykelk commented 8 years ago

Not all beliefs will be distributions, but the ones that are could look like type MyBelief <: Belief, AbstractDistribution{MyState}. Solvers that need the belief to be a distribution, would just use it as a distribution---and it would fail out if it isn't provided (just like other parts of the API).
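Since that double subtyping isn't legal Julia, the "fail out if it isn't provided" behavior would in practice come from dispatch: a solver just calls pdf or rand on the belief, and a MethodError results if the belief type doesn't implement them. A hypothetical sketch (all type and function names made up for illustration, in the thread's 0.x syntax):

```julia
abstract Belief

type HistogramBelief <: Belief   # acts like a distribution over 2 states
    p::Vector{Float64}
end
pdf(b::HistogramBelief, s::Int) = b.p[s]

type FSMBelief <: Belief         # finite-state-controller node; no density defined
    node::Int
end

# A solver that needs densities simply calls pdf:
total_mass(b) = sum([pdf(b, s) for s in 1:2])

total_mass(HistogramBelief([0.4, 0.6]))  # works, since pdf is defined
# total_mass(FSMBelief(1))               # MethodError: no pdf method for FSMBelief
```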

ebalaban commented 8 years ago

If anything, I think this is our chance to expand the "conventional" understanding of what a belief is and get the idea that other representations are just as valid a little more out there.

tawheeler commented 8 years ago

We discussed that there are essentially two routes we can take:

  1. use Belief to mean a probability distribution and come up with an abstract term to capture the information necessary to choose the next state and action
  2. use Belief to mean the information necessary to choose the next state and action, and then use some term to refer to a Belief in the form of a distribution

(1) would probably require defining a new concept; (2) would not, and we wouldn't even have that hard of a time explaining it in the docs. It matches DMU as well. If I understand correctly, the con is that most of the literature refers to beliefs in the form of distributions.

zsunberg commented 8 years ago

@mykelk , yes, I meant "distribution".

If we could do what you suggested above with multiple inheritance, this would be so easy to resolve! Unfortunately Julia doesn't permit multiple inheritance :(

Matthijs' response to PolicyState was "well, it's not really the policy state...". I interpreted this to mean something like the following: in the expression a = pi_s(b), the policy state would be s, i.e. some set of parameters that define the function pi, not the argument to the function. Does that make sense?

As Tim said, the problem with belief is that, in the literature, "belief" nearly universally refers to a probability distribution.(*) Furthermore, the two people from outside the lab that I have discussed this with (Jayesh and Matthijs), have both said that they didn't think belief was the right term. If we use Belief, as far as I can tell, we will not be adhering to the widely accepted definition. We have to decide if that is a good thing to do.

(*)@mykelk, when you said "A belief is just a piece of information that is sufficient for the policy to select its action" did you mean that the POMDP research community defines belief in this way? I am totally open to evidence that they do.

zsunberg commented 8 years ago

We either need to gather more information about what the community expects (e.g. Matthijs mentioned that one of his colleagues might have a strong opinion), or just take a vote on it somehow.

^^^ IF ONLY WE KNEW THE PROBABILITY DISTRIBUTIONS OF THIS, IT WOULD BE SUCH A GOOD POMDP

zsunberg commented 8 years ago

I don't think InformationState is necessarily too vague - it is any representation of all of the information the agent has collected by receiving observations. Unlike "belief", "information" does not imply that the agent has synthesized the data from the observations into levels of confidence for each state.

mykelk commented 8 years ago

Oh, yeah. Multiple inheritance was Julia issue 5. Apparently traits help with this, but I haven't dug into how that works in Julia yet.

mykelk commented 8 years ago

Using belief to mean the information sufficient to select an action is not inconsistent with the literature. We know that a distribution over states is sufficient to select the optimal action in a POMDP. For small, discrete problems, we can represent this distribution exactly using an array of probabilities. Hence, the idea of belief vectors often shows up in the POMDP literature. However, for continuous state problems, we often have to be content with inexact representations using memories (e.g., see McCallum's work), or particles, or mixtures of Gaussians, or some finite state representation. These can all be used to represent the agent's belief about the world.

BTW, belief shows up in other architectures with similar meanings. For example, in the Belief-Desire-Intention architecture, "Beliefs represent the informational state of the agent, in other words its beliefs about the world (including itself and other agents)."

I think it is fine to use belief in this general context, even though the general meaning isn't often provided in POMDP papers. People first understand multiplication in the way they learned it in elementary school, but they later understand multiplication as a more general operator in a group theoretic context that satisfies some basic properties.

zsunberg commented 8 years ago

Ok, I think, unless we are going to reach out to anyone else in the community, the arguments for both approaches have been pretty fully articulated. The question of whether it is better to coin a new term or expand the definition of "belief" is a judgment call. We'll just have to pick one and see how it goes. What's the best way to decide?

We'll also eventually have to deal with the separate issue of belief distributions not being able to inherit from both this and AbstractDistribution. Using Policy{B} and Updater{B} is possibly a good way to deal with this, but we would still have to decide what to call B in the docs.

zsunberg commented 8 years ago

fixed with 4dbed51