ML-KA / PDG-Compendium

Repository for sharing Code and Math related to the PDG
MIT License

PPO #2

Closed theRealSuperMario closed 6 years ago

theRealSuperMario commented 6 years ago

Stuff from the last PDG.

theRealSuperMario commented 6 years ago

Sounds good. You need to show me how to do that.

Although I am usually for unconditional love, I guess this would be utopian here.

On Wed, Sep 5, 2018, 16:59 Leander Kurscheidt <notifications@github.com> wrote:

@LeanderK commented on this pull request.

In ReinforcementLearning.tex https://github.com/ML-KA/PDG-Compendium/pull/2#discussion_r215306249:

+
+\begin{algorithm}[H]
+  \For{iteration=1,2,\dots}{
+    \For{actor=1,2,\dots}{
+      Run policy $\pi_{\theta_{\mathrm{old}}}$ in environment for $T$ timesteps\\
+      Compute advantage estimates $\hat{A_1}, \dots, \hat{A_t}$
+    }
+    Optimize surrogate $L$ wrt $\theta$, with $K$ epochs and minibatch size $M \leq NT$\\
+    $\theta_{old} \leftarrow \theta$
+  }
+  \caption{PPO, Actor-Critic Style}
+\end{algorithm}
+
+\paragraph{Questions}
+\begin{enumerate}

Why not use a conditional? Then we can keep the questions and the definitions together and just disable them when needed.


theRealSuperMario commented 6 years ago

as discussed: \toggletrue{questions}
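
For context, `\toggletrue` is provided by the `etoolbox` package. A minimal sketch of how such a conditional could wrap the questions, assuming a toggle named `questions` as in the comment above; the paragraph text and the enumerate item are placeholders, not the actual contents of the compendium:

```latex
\documentclass{article}
\usepackage{etoolbox}

% Declare the toggle once in the preamble (new toggles start out false).
\newtoggle{questions}
\toggletrue{questions}   % switch to \togglefalse{questions} to hide the questions

\begin{document}

% Placeholder text: definitions are typeset regardless of the toggle.
\paragraph{PPO} Definitions stay visible either way.

\iftoggle{questions}{%
  \paragraph{Questions}
  \begin{enumerate}
    \item Placeholder question. % hypothetical item
  \end{enumerate}
}{}% empty false branch: nothing is typeset when the toggle is off

\end{document}
```

With this layout the questions stay next to the definitions in the source, and a single line in the preamble decides whether they appear in the output.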

theRealSuperMario commented 6 years ago

Can someone merge this, maybe? I added the conditional as requested.

LeanderK commented 6 years ago

Sorry for taking so long, I forgot about the PR. You can mention me via "@", @theRealSuperMario; that way I get a notification.