chufanchen / read-paper-and-code


NeurIPS 2017 | Hindsight Experience Replay #183

Open chufanchen opened 2 months ago

chufanchen commented 2 months ago

https://arxiv.org/abs/1707.01495

chufanchen commented 2 months ago

DDPG

Behavioral policy: the target policy plus Gaussian exploration noise, $\pi_{b}(s)=\pi(s)+\mathcal{N}(0,1)$.
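A minimal sketch of that exploration rule, assuming a deterministic actor `target_policy` that maps a state to a continuous action vector. The names `behavioral_action` and `noise_std` are hypothetical; `noise_std=1.0` reproduces the $\mathcal{N}(0,1)$ term.

```python
import numpy as np

def behavioral_action(target_policy, state, rng, noise_std=1.0):
    """Exploratory action pi_b(s) = pi(s) + N(0, noise_std^2), noise drawn per action dimension."""
    mu = np.asarray(target_policy(state), dtype=float)  # deterministic actor output pi(s)
    return mu + rng.normal(0.0, noise_std, size=mu.shape)  # add i.i.d. Gaussian noise

rng = np.random.default_rng(0)
pi = lambda s: np.array([0.5, -0.5])  # toy target policy (assumption)
a = behavioral_action(pi, None, rng)
```

Only the noisy $\pi_b$ is used to collect data; since DDPG is off-policy, the critic and actor are still trained toward the noise-free target policy $\pi$. Practical implementations typically also scale the noise and clip actions to the valid range, which this sketch omits.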

UVFA

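An extension of DQN to the multi-goal setup: the policy and the Q-function take the goal as an extra input, i.e. $\pi: \mathcal{S}\times\mathcal{G}\rightarrow\mathcal{A}$ and $Q(s,a,g)$.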
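As a toy illustration of goal conditioning, the sketch below feeds the concatenation $[s, g]$ into a single linear Q-function over discrete actions and forms a DQN-style target with the goal held fixed. The class `LinearUVFA`, the function `dqn_target`, the linear features, and the default `gamma` are all illustrative assumptions, not the architecture from the paper.

```python
import numpy as np

class LinearUVFA:
    """Toy goal-conditioned Q(s, a, g) for discrete actions.

    State and goal are concatenated, so one weight matrix is shared across
    all goals; linear features are an assumption made for brevity.
    """

    def __init__(self, state_dim, goal_dim, n_actions, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_actions, state_dim + goal_dim))

    def q_values(self, s, g):
        x = np.concatenate([s, g])  # goal is just an extra input feature
        return self.W @ x           # one Q-value per action, shape (n_actions,)

    def greedy_action(self, s, g):
        return int(np.argmax(self.q_values(s, g)))

def dqn_target(q, r, s_next, g, gamma=0.98, done=False):
    """DQN target with the same goal g: r + gamma * max_a' Q(s', a', g)."""
    if done:
        return r
    return r + gamma * np.max(q.q_values(s_next, g))

rng = np.random.default_rng(0)
q = LinearUVFA(state_dim=3, goal_dim=2, n_actions=4, rng=rng)
```

The same state can therefore have different values and greedy actions depending on which goal is being pursued.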

Multi-goal RL

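Each goal $g \in \mathcal{G}$ corresponds to a predicate $f_g: \mathcal{S} \rightarrow \{0,1\}$, and the agent's objective is to reach a state $s$ with $f_g(s)=1$. We also assume that, given a state $s$, we can easily find a goal $g$ that is satisfied in $s$. Formally, there is a mapping: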

\begin{equation}
m: \mathcal{S} \rightarrow \mathcal{G} \quad \text{s.t.} \quad \forall s \in \mathcal{S}: \; f_{m(s)}(s)=1
\end{equation}
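A minimal sketch of $f_g$ and $m$ for a hypothetical manipulation task: the state layout (gripper position followed by object position), the tolerance `EPS`, and the helper names `f` and `m` are assumptions for illustration. Goals are target object positions, and $m$ reads off the object's current position, which by construction satisfies $f_{m(s)}(s)=1$.

```python
import numpy as np

# Hypothetical state layout: [gripper_xyz (3), object_xyz (3)]; goals are object positions.
OBJ = slice(3, 6)
EPS = 0.05  # assumed success tolerance

def f(g, s, eps=EPS):
    """Goal predicate f_g(s): 1 if the object lies within eps of goal g, else 0."""
    return int(np.linalg.norm(np.asarray(s)[OBJ] - np.asarray(g)) <= eps)

def m(s):
    """Map a state to a goal it already satisfies: the object's current position."""
    return np.asarray(s)[OBJ].copy()
```

When goals are full states ($\mathcal{G}=\mathcal{S}$ and $f_g(s)=[s=g]$), $m$ is simply the identity.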

HER