DavidJaz / Systems

Double Categories of Open Dynamical Systems

Open Markov Systems and MDPs #1


jpfairbanks commented 4 years ago

Hi Jaz, how do your open Markov systems relate to things like Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs)?

From wikipedia:

A discrete-time POMDP models the relationship between an agent and its environment. Formally, a POMDP is a 7-tuple $(S, A, T, R, \Omega, O, \gamma)$, where

- $S$ is a set of states,
- $A$ is a set of actions,
- $T$ is a set of conditional transition probabilities between states,
- $R : S \times A \to \mathbb{R}$ is the reward function,
- $\Omega$ is a set of observations,
- $O$ is a set of conditional observation probabilities, and
- $\gamma \in [0, 1]$ is the discount factor.
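
For concreteness, the 7-tuple could be packaged as a record over finite state, action, and observation sets. Here is a minimal Haskell sketch; the `Dist` type and all field names are placeholders I made up, not anything from your repo:

```haskell
-- A minimal sketch of the POMDP 7-tuple; Dist and all field names are hypothetical.
newtype Dist a = Dist [(a, Double)]   -- outcomes paired with probabilities

data POMDP s a o = POMDP
  { states       :: [s]
  , actions      :: [a]
  , transition   :: s -> a -> Dist s   -- T(s' | s, a)
  , reward       :: s -> a -> Double   -- R : S x A -> R
  , observations :: [o]
  , observe      :: s -> a -> Dist o   -- O(o | s', a)
  , discount     :: Double             -- gamma in [0, 1]
  }
```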

It seems like POMDPs are an established framework for modeling agents that interact with an environment. Can they be explained with this CT machinery?

DavidJaz commented 4 years ago

Great question!

My notion of $D(\mathbb{R} \times -)$ open dynamical system (where $D(\mathbb{R} \times -)$ is the monad that takes a set $X$ to the probability distributions on $\mathbb{R} \times X$, with $\mathbb{R}$ the real numbers) is very similar to the notion of Markov Decision Process from Wikipedia. Namely, it consists of a set $S$ of states; for each state $s$, a set $A_s$ of actions available in that state; and, for each state $s$ and each action $a$ valid in $s$, a distribution on pairs of rewards and next states, that is, $p((r, s') \mid s, a)$. From this joint distribution we can marginalize to a distribution on next states and a distribution on rewards, which is closer to the definition given on Wikipedia.
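
Here is a minimal Haskell sketch of what I mean, assuming the writer structure on $\mathbb{R}$ is additive (rewards accumulate by summing); all names are hypothetical:

```haskell
-- A sketch of the monad D(R x -): finite distributions over pairs of a
-- reward and a next value. Rewards combine additively, probabilities multiply.
newtype DR x = DR { runDR :: [((Double, x), Double)] }  -- ((reward, value), probability)

returnDR :: x -> DR x
returnDR x = DR [((0, x), 1)]   -- zero reward, probability one

bindDR :: DR x -> (x -> DR y) -> DR y
bindDR (DR xs) k = DR
  [ ((r + r', y), p * q)        -- rewards add, probabilities multiply
  | ((r, x), p) <- xs
  , ((r', y), q) <- runDR (k x)
  ]

-- An MDP in this style: valid actions per state, and a kernel
-- p((r, s') | s, a) valued in D(R x S).
data MDP s a = MDP
  { valid :: s -> [a]        -- A_s, the actions available in state s
  , step  :: s -> a -> DR s  -- p((r, s') | s, a)
  }

-- Marginalizing recovers the Wikipedia-style data: a distribution on
-- next states, and the expected one-step reward.
nextStates :: DR s -> [(s, Double)]
nextStates (DR xs) = [ (s', p) | ((_, s'), p) <- xs ]

expectedReward :: DR s -> Double
expectedReward (DR xs) = sum [ r * p | ((r, _), p) <- xs ]
```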

A POMDP is slightly more complicated, since it adds an observation set and observation probabilities. The observation probabilities can perhaps be understood via a $D(\mathbb{R} \times -)$-lens, but I'm not sure how well the formalism fits this case.
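
For reference, the general shape of a system in this style is roughly the following sketch (deterministic readout, monadic update); the mismatch between that deterministic readout and the stochastic observation kernel $O(o \mid s, a)$ is exactly where I'm unsure the formalism fits:

```haskell
-- A rough sketch of the shape of an open system: a readout of the state
-- and an update valued in a monad m. Names are hypothetical, and whether
-- this is the right home for a POMDP's observation kernel is the open
-- question above.
data OpenSystem m s i o = OpenSystem
  { readout  :: s -> o          -- expose the observable part of the state
  , dynamics :: s -> i -> m s   -- update the state, valued in the monad m
  }
```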

jpfairbanks commented 4 years ago

It would be very interesting to build an MDP solver that exploits the hierarchical structure of large-scale MDPs. For example, you could have multiple agents making decisions and wire them together via an operad. Then, to create an optimal policy, you would use the operad structure to assemble an optimal policy for the big system from optimal policies for the little systems. If this led to a faster algorithm for MDP solving, that would be a significant achievement for ACT.
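
As a toy illustration, reusing the `DR` and `MDP` sketches above, even the simplest wiring pattern, parallel composition with summed rewards and independent transitions, looks like this (a real operadic wiring would also let one subsystem's output feed another's input):

```haskell
-- Toy parallel composition of two MDPs: states and actions are pairs,
-- rewards add, and transitions are independent. Illustrative only.
tensor :: MDP s a -> MDP t b -> MDP (s, t) (a, b)
tensor m n = MDP
  { valid = \(s, t) -> [ (a, b) | a <- valid m s, b <- valid n t ]
  , step  = \(s, t) (a, b) ->
      DR [ ((r1 + r2, (s', t')), p * q)
         | ((r1, s'), p) <- runDR (step m s a)
         , ((r2, t'), q) <- runDR (step n t b)
         ]
  }
```

The hoped-for speedup would come from solving each factor separately and assembling the policies along the wiring pattern, rather than solving the much larger composite system directly.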