Markov Decision Process

A Markov Decision Process is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ where
- $\mathcal{S}$ is a set of states,
- $\mathcal{A}$ is a set of actions (more generally, $\mathcal{A}_s$ is the set of actions permitted from $s\in\mathcal{S}$),
- $\mathcal{P}_a(s, s')$ is the probability, when $a$ is permitted from $s$, that taking action $a$ in state $s$ leads to $s'$,
- $\mathcal{R}_a(s, s')$ is the (expected) immediate reward received when taking action $a$ in state $s$ leads to $s'$.
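A minimal sketch of this tuple as data, under hypothetical type and field names (not from any particular library):

```haskell
-- Minimal sketch of an MDP as a record of its four components.
-- The names here are illustrative assumptions, not an established API.
data MDP s a = MDP
  { states     :: [s]                    -- the state set S
  , actions    :: s -> [a]               -- A_s: actions permitted from s
  , transition :: a -> s -> s -> Double  -- P_a(s, s'): transition probability
  , reward     :: a -> s -> s -> Double  -- R_a(s, s'): expected immediate reward
  }
```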
We don't quite need the full generality of an MDP, because our transition probabilities are deterministic; i.e., $\mathcal{P}_a(s, \cdot) = \delta_{s'}$ for some $s'\in\mathcal{S}$.
Moreover, instead of including a reward function, we are optimizing an objective over a DFA.
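Since the dynamics are deterministic and the reward is replaced by a DFA objective, the part of the MDP we keep collapses to a plain step function. A sketch under the same hypothetical naming as above:

```haskell
-- Deterministic specialization: each permitted (s, a) leads to exactly one s'.
-- There is no reward field; the objective is expressed over a DFA instead.
data DetProcess s a = DetProcess
  { permitted :: s -> [a]     -- A_s: actions permitted from s
  , step      :: s -> a -> s  -- the unique successor state
  }

-- Recovering the probabilistic form: P_a(s, .) is the Dirac measure at step s a.
dirac :: Eq s => DetProcess s a -> a -> s -> s -> Double
dirac p a s s' = if step p s a == s' then 1 else 0
```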
Deterministic Finite Automaton

A Deterministic Finite Automaton consists of
- a finite set of states $\mathcal{Q}$,
- a finite input alphabet $\Sigma$,
- a transition function $\delta: \mathcal{Q}\times\Sigma\to\mathcal{Q}$,
- an initial state $q_0\in\mathcal{Q}$, and
- a set of accepting states $F\subseteq\mathcal{Q}$.

This is closer to our formulation. Note that:
- a DFA has a specific initial state;
- we may consider a DFA for each MCTS;
- this formalism does not explicitly leave room for action sets to be restricted in any way, but it can be patched by replacing $(\mathcal{Q}, \Sigma, \delta: \mathcal{Q}\times\Sigma\to\mathcal{Q}^\ast)$ (here, $\mathcal{Q}^\ast$ is $\mathcal{Q}$ with a dud state added) with $(\mathcal{Q}^\ast, \Sigma, \delta^\ast: \mathcal{Q}^\ast\times\Sigma\to\mathcal{Q}^\ast)$, the dud state made absorbing and serving as a final object in the corresponding monoidal category (see the sketch below).
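A sketch of the dud-state patch, representing $\mathcal{Q}^\ast$ as `Maybe q` with `Nothing` playing the dud state; the names are hypothetical:

```haskell
-- A DFA whose transitions may be forbidden: delta : Q x Sigma -> Q*,
-- with Nothing standing for the dud state.
data DFA q sigma = DFA
  { start  :: q
  , accept :: q -> Bool
  , delta  :: q -> sigma -> Maybe q
  }

-- The patched, total transition delta* : Q* x Sigma -> Q*.
-- The dud state is absorbing: once reached, every symbol keeps us there.
deltaStar :: DFA q sigma -> Maybe q -> sigma -> Maybe q
deltaStar m mq x = mq >>= \q -> delta m q x

-- Running a word from the initial state; Nothing means some step was forbidden.
runDFA :: DFA q sigma -> [sigma] -> Maybe q
runDFA m = foldl (deltaStar m) (Just (start m))
```

The bind in `deltaStar` is the same canonical monadic extension invoked for semiautomata below.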
Across each MCTS, the transition dynamics are shared by a common semiautomaton.
Semiautomaton
A Semiautomaton has
- a set of states $\mathcal{Q}$,
- an input alphabet $\Sigma$, and
- a transition function $T:\mathcal{Q}\times\Sigma\to\mathcal{Q}$.

That is, a semiautomaton is a DFA without a distinguished initial state or accepting states.
As before, we may accommodate partial transition functions $T:\mathcal{Q}\times \Sigma\to\mathcal{Q}$ with the canonical monadic extension.
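To make the sharing concrete, here is a sketch (hypothetical names, same `Maybe` encoding as the DFA sketch above): the semiautomaton carries the possibly-partial transition function, and each MCTS layers its own initial state and acceptance condition on top of the same dynamics.

```haskell
-- A semiautomaton: just a (possibly partial) transition T : Q x Sigma -> Q,
-- with undefined cases encoded by Nothing. The state set Q and alphabet Sigma
-- are the type parameters.
newtype Semiautomaton q sigma = Semiautomaton
  { trans :: q -> sigma -> Maybe q
  }

-- The canonical monadic extension of T to Q* = Maybe Q (Nothing is absorbing).
transStar :: Semiautomaton q sigma -> Maybe q -> sigma -> Maybe q
transStar sa mq x = mq >>= \q -> trans sa q x

-- A DFA for one MCTS: the shared dynamics plus an initial state and
-- accepting states. Several of these can point at one Semiautomaton value.
data SearchDFA q sigma = SearchDFA
  { dynamics :: Semiautomaton q sigma
  , initial  :: q
  , final    :: q -> Bool
  }
```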