PPO - Githubissues

theRealSuperMario commented 6 years ago

Stuff from the last PDG.

theRealSuperMario commented 6 years ago

Sounds good. You'll need to show me how to do that.

Although I am usually for unconditional love, I guess that would be utopian here.

On Wed, Sep 5, 2018, 16:59 Leander Kurscheidt notifications@github.com wrote:

@LeanderK commented on this pull request.

In ReinforcementLearning.tex https://github.com/ML-KA/PDG-Compendium/pull/2#discussion_r215306249:

+
+\begin{algorithm}[H]
+  \For{iteration=1,2,\dots}{
+    \For{actor=1,2,\dots,N}{
+      Run policy $\pi_{\theta_{\mathrm{old}}}$ in environment for $T$ timesteps\\
+      Compute advantage estimates $\hat{A}_1, \dots, \hat{A}_T$
+    }
+    Optimize surrogate $L$ wrt $\theta$, with $K$ epochs and minibatch size $M \leq NT$\\
+    $\theta_{\mathrm{old}} \leftarrow \theta$
+  }
+  \caption{PPO, Actor-Critic Style}
+\end{algorithm}
+\paragraph{Questions}
+\begin{enumerate}

Why not use a conditional? Then we can keep the questions and the definitions together and just disable the questions when needed.
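The quoted hunk refers to a surrogate $L$ without defining it. As a reminder, and assuming the clipped objective from the PPO paper is what is meant (the hunk does not say which variant), it could be typeset roughly as follows. Here $\epsilon$ is the clip range, and the snippet assumes `amsmath` and `amssymb` are loaded:

```latex
% Assumption: the clipped variant of the surrogate; requires amsmath and amssymb.
\begin{align}
  % Probability ratio between the current and the old policy
  r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \\
  % Clipped surrogate objective, maximized with respect to \theta
  L^{\mathrm{CLIP}}(\theta) &= \hat{\mathbb{E}}_t\!\left[
    \min\!\left( r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right)
  \right].
\end{align}
```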


theRealSuperMario commented 6 years ago

as discussed: \toggletrue{questions}
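For anyone reading along, here is a minimal sketch of how such a toggle could be wired up. It assumes the `etoolbox` package, which provides `\newtoggle`, `\toggletrue`, `\togglefalse` and `\iftoggle`; only `\toggletrue{questions}` is confirmed by this thread, so the document layout and the placeholder text are hypothetical:

```latex
\documentclass{article}
\usepackage{etoolbox}

% Assumption: the toggle is declared once in the preamble.
\newtoggle{questions}
\toggletrue{questions}   % use \togglefalse{questions} to hide all question blocks

\begin{document}

\paragraph{Definition}
PPO optimizes a surrogate objective over several epochs of minibatch updates.

% The question block sits next to the definition and is only typeset
% when the toggle is on; the empty third argument is the "off" branch.
\iftoggle{questions}{%
  \paragraph{Questions}
  \begin{enumerate}
    \item Placeholder question (hypothetical).
  \end{enumerate}
}{}

\end{document}
```

Flipping the single `\toggletrue` / `\togglefalse` line in the preamble then switches every question block on or off without moving any text around.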

theRealSuperMario commented 6 years ago

Can someone merge this, maybe? I added the conditional as requested.

LeanderK commented 6 years ago

Sorry for taking so long, I forgot about the PR. @theRealSuperMario, you can mention me via "@"; that way I get a notification.

ML-KA / PDG-Compendium

PPO #2
