NeurIPS 2017 | Hindsight Experience Replay

chufanchen / read-paper-and-code

Open chufanchen opened 2 months ago

chufanchen commented 2 months ago

Behavioral policy (DDPG): episodes are collected with a noisy version of the target policy $\pi$, e.g. $\pi_{b}(s)=\pi(s)+\mathcal{N}(0,1)$.
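A minimal sketch of the noisy behavioral policy above, using only the Python standard library. The linear `target_policy`, the two-dimensional state, and the `noise_std` parameter are hypothetical and stand in for whatever actor network and noise scale an actual implementation would use.

```python
import random


def target_policy(state):
    # Hypothetical deterministic policy pi(s): a fixed linear map, for illustration only.
    x, y = state
    return [0.5 * x - 0.2 * y, 0.1 * x + 0.3 * y]


def behavioral_policy(state, rng, noise_std=1.0):
    # pi_b(s) = pi(s) + N(0, noise_std^2), with noise drawn independently
    # for each action dimension.
    return [a + rng.gauss(0.0, noise_std) for a in target_policy(state)]


rng = random.Random(0)  # seeded for reproducibility
state = [1.0, 2.0]
noisy_action = behavioral_policy(state, rng)
```

Setting `noise_std=0.0` recovers the target policy exactly, which is how the policy would typically be evaluated without exploration.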

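Universal Value Function Approximators (UVFA): an extension of DQN to the multi-goal setup, where the policy and the Q-function take a goal $g$ as input in addition to the state $s$.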

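Each goal $g \in \mathcal{G}$ corresponds to a predicate $f_{g}: \mathcal{S} \rightarrow \{0,1\}$, and the agent's aim is to reach a state $s$ with $f_{g}(s)=1$. We assume that, given a state $s$, we can easily find a goal $g$ that is satisfied in this state, i.e. there is a mapping $m$ such that: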

\begin{equation}
m: \mathcal{S} \rightarrow \mathcal{G} \quad \text{s.t.} \quad \forall_{s \in \mathcal{S}} \; f_{m(s)}(s) = 1
\end{equation}
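As a hedged illustration of this assumption, the sketch below uses a hypothetical state `(x, y, velocity)` whose goals are target positions `(x, y)`. The names `goal_satisfied` (playing the role of $f_{g}$), `state_to_goal` (playing the role of $m$), and the tolerance `tol` are assumptions made for this example, not names from the paper.

```python
import math


def goal_satisfied(goal, state, tol=0.05):
    # f_g(s): returns 1 if the position part of the state lies within
    # `tol` of the goal position, otherwise 0.
    return 1 if math.dist(goal, state[:2]) <= tol else 0


def state_to_goal(state):
    # m(s): the goal that is trivially satisfied in `state`,
    # namely the position the state already occupies.
    return tuple(state[:2])


# Check f_{m(s)}(s) = 1 on a few sample states.
states = [(0.0, 0.0, 1.0), (0.3, -1.2, 0.0), (2.5, 4.0, -0.7)]
all_ok = all(goal_satisfied(state_to_goal(s), s) == 1 for s in states)
```

Here `m` simply projects the state onto its position, so the property holds by construction; a goal far from the current position, such as `(1.0, 1.0)` for a state at the origin, is not satisfied.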