Closed rejuvyesh closed 8 years ago
For continuity, our previous discussion about this was in #34
Another possible name for this that @rejuvyesh suggested could be AgentState. I think I like PolicyState slightly better though.
If we do this right, I think it could help significantly in understanding how solvers work. I think I'm currently leaning towards abstract Belief <: PolicyState, because I think POMDPs.jl's forte is going to be its ability to support complex solvers that work with generative models and don't use belief as their state. It's easy to explain to someone that a belief is a distribution; it's harder to explain that these advanced solvers and policies work using something that is not a belief, but is similar.
I think I remember discussing this a while ago in my office. I thought that Belief could be any of those things that @rejuvyesh mentioned. I think it is an error that Belief is a type of AbstractDistribution. After all, a Belief (as an AbstractDistribution) doesn't make sense in the context of MCVI.
Yes, @mykelk, the conclusion that we came to last time was that we should just call this thing Belief. But, I think this made it rather difficult for @rejuvyesh to understand our interface as someone new to it, and that will probably happen again, so we should consider ways to make it more clear. Having been around the interface for so long, it is difficult for me to tell what is confusing and what is not.
If belief is not a subtype of AbstractDistribution, I suppose that it would prompt people to think "if Belief is not always a distribution, what is it?". Perhaps proper documentation of Belief as it is now will alleviate the problem.
Yes. I think calling it Belief is slightly confusing because the term Belief has a specific meaning in POMDPs, and using it to denote the general term representing the internal state of the agent is counter-intuitive.
It would just be so nice if we could do

```julia
abstract PolicyState
abstract AbstractDistribution{T}
abstract Belief{S} <: PolicyState, AbstractDistribution{S}
```
Belief is just a statistic for representing a distribution over the current state---and histories, states in a finite state controller, particles, histograms, etc. can be these statistics. Although, in theory, you can sample from or evaluate the density of the distribution specified by these statistics, you don't necessarily need that capability to solve POMDPs or represent their policies. Does that make sense?
@mykelk I think I understand what you are trying to convey. But I still feel Belief is the wrong term for the idea.
If there is consensus for PolicyState, let's go with that.
wait
I am doing some reading to see how "belief" is used in the literature, and my initial findings are that it is actually used to denote something which may or may not be a probability distribution.
I apologize for flip-flopping on this. I am doing that because I really actually don't know what the best solution is.
By the way, while it is important to make a decision on this as soon as possible, this does not prevent us from pushing the parametric interface to master (which will also come with a better documentation framework)
An interesting discussion of the term "belief state" can be found in section 3.2 of Kaelbling's 1998 "Planning and acting in partially observable stochastic domains". One sentence from that section reads "Our choice for belief states will be probability distributions over states of the world." indicating that they view a probability distribution as only one possible choice of "belief state".
Perhaps abstract BeliefState would better communicate that the type should include things that may not be probability distributions.
Moreover, in Mykel's dmu book (which this software will often be used with), "belief state" is introduced on page 115, and the first example given is win-loss counts for a multi-armed bandit, which is not a probability distribution.
The MCVI paper (Bai, H., Hsu, D., Lee, W., & Ngo, V. (2011). Monte Carlo value iteration for continuous-state POMDPs. Algorithmic Foundations of Robotics IX, 175–191. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-17452-0_11) does seem to use the term "belief" to denote a probability distribution over states, but it does say "It encodes the belief implicitly in the controller state based on the robot’s initial belief b and the sequence of observations received."
I think BeliefState is a good type name for something that "implicitly encodes the belief".
David Silver defines a "belief state" as a probability distribution. He says "The belief state is the probability distribution over states given history h" in the POMCP paper. (Silver, D., & Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. In Advances in neural information processing systems (pp. 2164–2172). Retrieved from http://discovery.ucl.ac.uk/1347369/)
So that disagrees with naming our policy state type BeliefState.
Another possible name could be abstract InformationState
To add more data points, I've seen InternalMemory and InternalState (via reinforcement learning state of the art).
Naming things is one of the 3 hardest problems in CS :cry:
From the DESPOT paper (Somani, A., Ye, N., Hsu, D., & Lee, W. (2013). DESPOT : Online POMDP Planning with Regularization. Advances in Neural Information Processing Systems, 1–9. Retrieved from http://papers.nips.cc/paper/5189-despot-online-pomdp-planning-with-regularization): "beliefs, which are probability distributions over the states" "The agent maintains a belief, often represented as a probability distribution over S."
I definitely don't think we should just use Belief; that term is too closely associated with a probability distribution.
Sorry to jump in the middle. Hopefully this isn't just noise.
Suppose there's a POMDP with beautifully intricate observation model. The state transitions are not controllable. Whatever the action you take (let's say it's an enumerable action space with one action), the transition dynamics are the same.
Whatever the reward structure is, you can implement an optimal controller for this POMDP that basically keeps no state. If "belief" means "what you need to keep track of in order to act optimally", then for this POMDP the belief is "nothing". I can still run a state estimator that keeps track of a distribution over states, and so I feel that there is still a belief that isn't just "nothing". It's just that my controller/policy for the POMDP doesn't need to. So the state of this finite state controller (which again, is the trivial one-state-always-do-the-same-thing controller) is "sufficient" for me to know the distribution over optimal actions. But it is not sufficient for me to know the probability that I receive some observation on the next time step. For that I need the distribution over states.
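As a toy sketch of that argument (all names here are hypothetical, not part of POMDPs.jl): the optimal controller for such a problem keeps no internal state, while a separate estimator can still track a distribution over states:

```julia
# Hypothetical degenerate POMDP: one action, uncontrollable dynamics.
# The optimal controller needs no internal state at all...
struct StatelessController end
action(::StatelessController) = :the_only_action

# ...but an estimator can still maintain a distribution over states,
# which is needed to predict the next observation, not to act.
struct Estimator
    p::Vector{Float64}  # probabilities over discrete states
end

# Toy Bayes update: weight by an observation likelihood and renormalize.
function update(e::Estimator, likelihood::Vector{Float64})
    w = e.p .* likelihood
    return Estimator(w ./ sum(w))
end

e = update(Estimator([0.5, 0.5]), [0.8, 0.2])  # belief becomes [0.8, 0.2]
a = action(StatelessController())              # same action regardless of e
```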
In the case that the controller is decoupled into "estimator" and "action selector given state distribution", then there's no question as to what the belief is, right? But for other control architectures (like a FSM), I think there are two different distributions/models that a quantity (Belief, InternalState, etc.) can be a sufficient statistic with respect to: the action distribution and the observation distribution.
From the SARSOP paper (Kurniawati, H., Hsu, D., & Lee, W. (2008). SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. Robotics: Science and Systems. Retrieved from https://www1.comp.nus.edu.sg/~leews/publications/rss08.pdf): "A belief is a probability distribution over all possible robot states, and the set of all beliefs form the belief space."
We definitely shouldn't use just Belief to denote all things that the policy can make its decision based on.
@goretkin, thanks for the comment. I think that in some cases it is indeed useful to think of this object as "everything needed to make a good decision" rather than "everything needed to encode belief"
Thanks guys for looking into this. @zsunberg, do you like BeliefState?
I am ok with BeliefState, but it seems that more consensus is actually building around PolicyState.
I still see two issues with this. First, what happens to Belief? If people look at a POMDP package and don't find belief, they will be confused. Second, we could keep abstract Belief <: PolicyState, but I have found that having extra types floating around is usually a source of pain. Furthermore, Belief should really be an AbstractDistribution, and it is impossible to have it be a subtype of both.
A possibly better approach would be to break it out into another package that could contain a suite of beliefs and updaters (to start we could just put it in POMDPToolbox).
In the other package, we could have something like

```julia
type BeliefState{S} <: PolicyState
    dist::AbstractDistribution{S}
end

rand(..., b::BeliefState, ...) = rand(..., b.dist, ...)
pdf(..., b::BeliefState, ...) = pdf(..., b.dist, ...)
# etc.
```
At the meeting today we took a quick poll between BeliefState, PolicyState, and AgentState. AgentState got 3 votes; PolicyState got two; BeliefState got none.
I talked to Ed afterwards, and he said that he would be ok with BeliefState, but was not ready to change it to one of the other two.
There are really valid arguments on both sides - I guess we'll probably just have to take a final vote on this early next week when Mykel's back. I don't think it actually slows anyone down to delay until then.
I vote against AgentState. The concept of agent doesn't show up anywhere in the code, but Policy does. So, I would be much more in favor of PolicyState.
btw, right now I'm working on a branch that has Belief not a subtype of AbstractDistribution, and it is kind of a pain in practice.
@zsunberg can you say a few words about this pain so that we know what sorts of things to look out for in the design of this API?
For example, in the crying baby problem, I had to define both BoolDistribution <: AbstractDistribution{Bool} and ExactBabyBelief <: Belief, which are exactly the same thing. In POMCP, I am having to use Union{Belief, AbstractDistribution} in some places.
What if we pull another parametric trick, i.e.

```julia
abstract Policy{T}
abstract Updater{T}
```

where T is the policy state or belief? Just throwing ideas out there.
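As a rough illustration of that idea (written in current `abstract type` syntax; the ParticleBelief and GreedyPolicy names are made up for this sketch, not part of the actual interface):

```julia
# Parametrize Policy and Updater on the belief/policy-state type B,
# so each solver declares which representation it expects.
abstract type Policy{B} end
abstract type Updater{B} end

# A hypothetical particle-based belief representation
struct ParticleBelief{S}
    particles::Vector{S}
end

# A hypothetical policy declared to operate on ParticleBelief{Int}
struct GreedyPolicy <: Policy{ParticleBelief{Int}} end

# Dispatch ties the action rule to the belief type the policy declares
action(::GreedyPolicy, b::ParticleBelief{Int}) = maximum(b.particles)

action(GreedyPolicy(), ParticleBelief([1, 3, 2]))  # returns 3
```

This sidesteps the multiple-inheritance problem: the belief type B never has to subtype anything special, it just has to be what the policy and updater were parametrized on.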
I'm okay with this if the others are.
Notes about this from meeting with Matthijs: he suggested InformationState, which I actually like quite a bit (actually, I realized that I have used it as a synonym for belief state in a paper before coming to Stanford). Btw, InformationState does appear to have a meaning in game theory (e.g. http://onlinelibrary.wiley.com/doi/10.1111/1467-8586.00134/pdf). It seems to be the set of all states that the decision-making agent considers to be possible.
InformationState is something that occurred to me as well as I was driving back yesterday, but I thought it was still too vague and ambiguous. How about something like GeneralizedBelief to indicate that we are not talking about the most common understanding of belief? Then Belief (a distribution) or TreeNode or History could be subtypes of it. I am also ok with just sticking with Belief (probably still my first choice) or going with BeliefState, as I mentioned to @zsunberg yesterday.
@zsunberg Out of curiosity, what was the objection with PolicyState?
Also, "He didn't think belief was a good name because he would be confused when he found a belief that was not a probability transition"---do you mean probability distribution?
I don't know if we should be hung up about explicit representation of probability distributions. A belief is just a piece of information that is sufficient for the policy to select its action (e.g., counts of wins and losses in a bandit problem). Just like how a state in an MDP is just a piece of information that is sufficient for predicting the next state and reward. Maybe we can have Belief and BeliefDistribution. Although I'm not completely against BeliefState, it is nice having Belief since it is a single word (like most of our other core concepts) and we don't have two kinds of states---belief states and environment states. I kind of like just dealing with actions, states, observations, beliefs, and policies. We can provide nice crisp definitions of these five core concepts and just say something like:
I agree with @mykelk.
Not all beliefs will be distributions, but the ones that are could look like type MyBelief <: Belief, AbstractDistribution{MyState}. Solvers that need the belief to be a distribution would just use it as a distribution---and it would fail if that capability isn't provided (just like other parts of the API).
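Since Julia only allows single inheritance, one way to read that suggestion is as duck typing: a belief that really is a distribution defines pdf, and a solver that needs a distribution simply calls it. A minimal sketch with hypothetical DiscreteBelief and HistoryBelief types (written in current `struct` syntax):

```julia
abstract type Belief end

# A belief that IS a distribution: it provides a pdf method.
struct DiscreteBelief <: Belief
    p::Vector{Float64}
end
pdf(b::DiscreteBelief, s::Int) = b.p[s]

# A belief that is NOT a distribution: no pdf method defined.
struct HistoryBelief <: Belief
    observations::Vector{Int}
end

pdf(DiscreteBelief([0.25, 0.75]), 2)  # works, returns 0.75
# pdf(HistoryBelief([1, 2]), 2)       # would throw a MethodError,
#                                     # "failing out" as described above
```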
If anything, I think this is our chance to expand the "conventional" understanding of what a belief is and get the idea that other representations are just as valid a little more out there.
We discussed that there are essentially two routes we can take: (1) coin a new term, or (2) keep Belief with a broadened meaning. (1) would probably require defining a new concept; (2) would not, and we wouldn't even have that hard of a time explaining it in the docs. It matches DMU as well. If I understand correctly, the con is that most of the literature refers to beliefs in the form of distributions.
@mykelk , yes, I meant "distribution".
If we could do what you suggested above with multiple inheritance, this would be so easy to resolve! Unfortunately Julia doesn't permit multiple inheritance :(
Matthijs' response to PolicyState was "well, it's not really the policy state...". I interpreted this to mean something like the following: in the expression a = pi_s(b), the policy state would be s, i.e. some set of parameters that define the function pi, not the argument to the function. Does that make sense?
As Tim said, the problem with belief is that, in the literature, "belief" nearly universally refers to a probability distribution.(*) Furthermore, the two people from outside the lab that I have discussed this with (Jayesh and Matthijs) have both said that they didn't think belief was the right term. If we use Belief, as far as I can tell, we will not be adhering to the widely accepted definition. We have to decide if that is a good thing to do.
(*)@mykelk, when you said "A belief is just a piece of information that is sufficient for the policy to select its action" did you mean that the POMDP research community defines belief in this way? I am totally open to evidence that they do.
We either need to gather more information about what the community expects (e.g. Matthijs mentioned that one of his colleagues might have a strong opinion), or just take a vote on it somehow.
^^^ IF ONLY WE KNEW THE PROBABILITY DISTRIBUTIONS OF THIS, IT WOULD BE SUCH A GOOD POMDP
I don't think InformationState is necessarily too vague - it is any representation of all of the information the agent has collected by receiving observations. Unlike "belief", "information" does not imply that the agent has synthesized the data from the observations into levels of confidence for each state.
Oh, yeah. Multiple inheritance was Julia issue 5. Apparently traits helps with this, but I haven't dug into how that works in Julia yet.
Using belief to mean the information sufficient to select an action is not inconsistent with the literature. We know that a distribution over states is sufficient to select the optimal action in a POMDP. For small, discrete problems, we can represent this distribution exactly using an array of probabilities. Hence, the idea of belief vectors often shows up in the POMDP literature. However, for continuous state problems, we often have to be content with inexact representations using memories (e.g., see McCallum's work), or particles, or mixtures of Gaussians, or some finite state representation. These can all be used to represent the agent's belief about the world.
BTW, belief shows up in other architectures with similar meanings. For example, in the Belief-Desire-Intention architecture, "Beliefs represent the informational state of the agent, in other words its beliefs about the world (including itself and other agents)."
I think it is fine to use belief in this general context, even though the general meaning isn't often provided in POMDP papers. People first understand multiplication in the way they learned it in elementary school, but they later understand multiplication as a more general operator in a group theoretic context that satisfies some basic properties.
Ok, I think, unless we are going to reach out to anyone else in the community, the arguments for both approaches have been pretty fully articulated. The question of whether it is better to coin a new term or expand the definition of "belief" is a judgment call. We'll just have to pick one and see how it goes. What's the best way to decide?
We'll also eventually have to deal with the separate issue of belief distributions not being able to inherit from both this and AbstractDistribution. Using Policy{B} and Updater{B} is possibly a good way to deal with this, but we would still have to decide what to call B in the docs.
fixed with 4dbed51
@zsunberg suggested I open a new issue to discuss PolicyState.

If you survey the literature about different methods for solving POMDPs, one unifying view is to see a POMDP policy as a mapping from the observation history $h_{1:t}$ to an action. But since we can only have finite memory, different authors have come up with different approaches to compress this history. This can be intuitively represented as an agent's internal state, which can be:

- State
- Belief

So to keep the interface general enough, defining a PolicyState type seems imperative to me. Then we can possibly define:

But @zsunberg reminded that currently

Since Julia does not support multiple inheritance for abstract types, we need to decide on which one to accept.