Terminology should use the language of `Deterministic Finite Automata` and `Semiautomata`

Markov Decision Process

A Markov Decision Process is a $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ where

$\mathcal{S}$ consists of states,
$\mathcal{A}$ consists of actions,
- more generally, $\mathcal{A}_s$ is a set of actions permitted from $s\in\mathcal{S}$
$\mathcal{P}_a(s, s')$ is the probability, when permitted, that visiting $(s, a)$ leads to $s'$
$R_a(s, s')$ is the (expected) immediate reward if visiting $(s, a)$ leads to $s'$

We don't quite need the generality of a MDP because our transition probabilities are determinstic. I.e., $\mathcal{P}a(s, \cdot) = \delta{s'}$ for some $s'\in\mathcal{S}$.

Deterministic Finite Automaton

A Deterministic Finite Automaton consists of

$\mathcal{Q}$ (states)
$\Sigma$ (alphabet)
$\delta: \mathcal{Q}\times\Sigma\to\mathcal{Q}$ (transition function),
$q_0\in\mathcal{Q}$ (initial state)
$\mathcal{F}\subseteq\mathcal{Q}$ (accept (final) states)

This is closer to our formulation. Note that:

a DFA has no cost/reward function
- instead of including a reward function, we are optimizing an objective over a DFA
a DFA has a specific initial state
- we may consider a DFA for each MCTS
this formally does explicitly leave room for actions sets to be restricted in any way
- but this can be patched by replacing a $(\mathcal{Q}, \Sigma, \delta: \mathcal{Q}\times\Sigma\to\mathcal{Q}^\ast)$
- (here, $\mathcal{Q}^\ast$ is $\mathcal{Q}$ with dud state added)
- with $(\mathcal{Q}^\ast, \Sigma, \delta^\ast: \mathcal{Q}^\ast\times\Sigma\to\mathcal{Q}^\ast)$ with the dud state serving as a final object in the corresponding monoidal category

Across each MCTS, the transition dynamics are shared by a common semiautomaton.

Semiautomaton

A Semiautomaton has

$\mathcal{Q}$ (states),
$T: \mathcal{Q}\times\Sigma\to\mathcal{Q}$

As before, we may accommodate partial transition functions $T:\mathcal{Q}\times \Sigma\to\mathcal{Q}$ with the canonical monadic extension.

ariasanovsky / azdopt