Open chufanchen opened 2 months ago
Behavioral policy (a noisy version of the target policy, used to generate episodes), e.g. $\pi_{b}(s)=\pi(s)+\mathcal{N}(0,1)$.
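A minimal sketch of this kind of exploration noise, assuming a list-valued action. The toy linear `target_policy` and the names `behavioral_policy`, `rng`, and `sigma` are hypothetical and only illustrate the formula; they are not from the paper.

```python
import random

def target_policy(state):
    """Hypothetical deterministic policy pi(s): a toy linear map, for illustration only."""
    return [0.5 * x for x in state]

def behavioral_policy(state, rng=random, sigma=1.0):
    """pi_b(s) = pi(s) + N(0, sigma^2), with independent noise per action dimension."""
    return [a + rng.gauss(0.0, sigma) for a in target_policy(state)]

rng = random.Random(0)  # seeded so runs are reproducible
state = [1.0, -2.0]
noisy_action = behavioral_policy(state, rng)  # target action plus Gaussian noise
```

Setting `sigma=0.0` recovers the target policy exactly, which is a quick sanity check that the noise is purely additive.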
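UVFA (Universal Value Function Approximators): an extension of DQN to the multi-goal setup, i.e. where there is more than one goal the agent may try to achieve.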
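In the multi-goal setup the Q-function depends on the goal as well as on the state and action. A rough sketch of that idea, using a toy linear function in place of a neural network; the names `q_values`, `greedy_action`, and the weights are made up for illustration.

```python
def q_values(state, goal, weights):
    """Toy linear Q(s, g, a): one weight vector per action over the concatenated input [s, g]."""
    x = list(state) + list(goal)  # the goal is fed in alongside the state
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

def greedy_action(state, goal, weights):
    """Pick the index of the action maximizing Q(s, g, a)."""
    q = q_values(state, goal, weights)
    return max(range(len(q)), key=q.__getitem__)

# Two actions, 1-D state, 1-D goal; the weights are arbitrary illustrative values.
weights = [[1.0, -1.0],   # action 0
           [-1.0, 1.0]]   # action 1

action_for_goal_a = greedy_action([0.0], [1.0], weights)
action_for_goal_b = greedy_action([0.0], [-1.0], weights)
```

The same state yields different greedy actions for different goals, which is the point of conditioning on $g$.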
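Each goal $g \in \mathcal{G}$ corresponds to a predicate $f_g: \mathcal{S} \rightarrow \{0,1\}$. We assume that, given a state $s$, we can easily find a goal $g$ that is satisfied in this state, i.e. there is a mapping $m$ such that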
\begin{equation}
m: \mathcal{S} \rightarrow \mathcal{G} \quad \text{s.t.} \quad \forall_{s\in \mathcal{S}} \; f_{m(s)}(s)=1
\end{equation}
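A small sketch of such a mapping, assuming a toy setting where a state is a 2D point $(x, y)$ and a goal specifies only the desired $x$ coordinate. The function names `f` and `m` mirror the symbols above; the sample states are arbitrary.

```python
def f(goal, state):
    """Predicate f_g(s): 1 if the state's x-coordinate equals the goal, else 0."""
    x, _y = state
    return 1 if x == goal else 0

def m(state):
    """Mapping m(s): return a goal that is satisfied in `state` (here, its x-coordinate)."""
    x, _y = state
    return x

# Check the defining property f_{m(s)}(s) = 1 on a few sample states.
states = [(0.0, 1.0), (2.5, -3.0), (-1.0, 4.0)]
all_satisfied = all(f(m(s), s) == 1 for s in states)
```

In practice exact equality would likely be replaced by a tolerance check for continuous states; that variant is not shown here.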
https://arxiv.org/abs/1707.01495