Open jpfairbanks opened 4 years ago
Great question!
My notion of D(R x -) open dynamical system (where D(R x -) is the monad that takes a set X to the probability distributions on R x X, R being the real numbers) is very similar to the notion of Markov Decision Process (from Wikipedia). Namely, it consists of a set S of states, for each state s a set A_s of actions available in that state, and, for each state s and action a valid in s, a distribution on pairs of rewards and next states, that is, p((r, s') | s, a). From this we can marginalize to a distribution on next states and a distribution on rewards, which is closer to the definition given in Wikipedia.
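To make the marginalization concrete, here is a minimal sketch of a toy MDP stored in the D(R x -) style above: each (state, action) pair maps to a joint distribution over (reward, next state). All state names, actions, and probabilities are invented for illustration.

```python
from collections import defaultdict

# Toy MDP: for each (state, action), a joint distribution p((r, s') | s, a)
# over (reward, next_state) pairs. All data here is illustrative.
joint = {
    ("s0", "go"):   {(1.0, "s1"): 0.8, (0.0, "s0"): 0.2},
    ("s0", "stay"): {(0.0, "s0"): 1.0},
    ("s1", "go"):   {(2.0, "s0"): 0.5, (1.0, "s1"): 0.5},
}

def marginal_next_state(s, a):
    """Marginalize p((r, s') | s, a) over rewards to get p(s' | s, a)."""
    dist = defaultdict(float)
    for (r, s_next), p in joint[(s, a)].items():
        dist[s_next] += p
    return dict(dist)

def marginal_reward(s, a):
    """Marginalize p((r, s') | s, a) over next states to get p(r | s, a)."""
    dist = defaultdict(float)
    for (r, s_next), p in joint[(s, a)].items():
        dist[r] += p
    return dict(dist)
```

The two marginals recover exactly the transition model and reward model that the standard Wikipedia-style definition uses.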
A POMDP is slightly more complicated since there is the addition of an observation set and observation probabilities. The observation probabilities can be understood as a D(R x -)-lens, but I'm not sure how well the formalism fits this case.
It would be very interesting to build an MDP solver based on exploiting the hierarchical structure of large scale MDPs. For example you could have multiple agents making decisions and wire them together via an operad. Then in order to create an optimal policy, you would use the operad structure to assemble optimal policies for the big system from optimal policies for the little systems. If this led to a faster algorithm for MDP solving, that would be a significant achievement for ACT.
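As a very rough sketch of the compositional idea: in the special case where a large MDP factors as a product of independent sub-MDPs (the simplest possible "wiring"), an optimal policy for the whole system is just the tuple of optimal sub-policies. Everything below is a toy assumption, not a real operad-based solver; the "solver" is a one-step greedy choice over expected rewards, standing in for a genuine MDP solver.

```python
import itertools

def solve_greedy(mdp):
    """Toy stand-in for an MDP solver: pick the action with the highest
    one-step expected reward in each state. mdp: {state: {action: reward}}."""
    return {s: max(acts, key=acts.get) for s, acts in mdp.items()}

def product_policy(policies):
    """Assemble a policy on the product state space from component policies,
    mirroring the 'assemble big policies from little ones' idea above."""
    state_spaces = [list(p.keys()) for p in policies]
    return {states: tuple(p[s] for p, s in zip(policies, states))
            for states in itertools.product(*state_spaces)}

# Two invented sub-MDPs and their composed policy.
sub1 = {"a": {"L": 1.0, "R": 0.0}, "b": {"L": 0.0, "R": 2.0}}
sub2 = {"x": {"U": 0.5, "D": 1.5}}
pol = product_policy([solve_greedy(sub1), solve_greedy(sub2)])
```

The interesting (and open) question is exactly the one raised above: whether this kind of assembly still works, and still saves computation, when the sub-systems are wired together non-trivially so that their rewards and transitions interact.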
Hi Jaz, how do your open Markov systems relate to things like Markov Decision Processes and Partially Observed Markov Decision Processes?
From Wikipedia:
It seems like POMDPs are an established framework for modeling agents that interact with an environment. Can they be explained with this CT machinery?