A question about Section 7.2.2 in the Appendix of the paper #2

SonSang / gippo

Code for Paper "Gradient Informed Proximal Policy Optimization" (NeurIPS 2023)



Open yufengsjtu opened 3 weeks ago

yufengsjtu commented 3 weeks ago

Hi, I would like to ask why the derivative with respect to θ can be moved directly inside the expectation at this point. Does the distribution q not depend on θ?

SonSang commented 3 weeks ago

Hi, thank you for your interest.

q is a fixed distribution that does not depend on θ. Therefore, we can move the derivative operator inside the expectation.
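
To spell out the step, here is a short sketch of the argument. The symbol f(x, θ) is only a placeholder for whatever integrand appears inside the expectation in Section 7.2.2 (it is not the paper's notation), and the sketch assumes f is regular enough in θ for differentiation and integration to be exchanged:

```latex
\begin{align*}
\nabla_\theta \, \mathbb{E}_{x \sim q}\left[ f(x, \theta) \right]
  &= \nabla_\theta \int q(x)\, f(x, \theta)\, dx \\
  &= \int q(x)\, \nabla_\theta f(x, \theta)\, dx
     && \text{($q$ does not depend on $\theta$)} \\
  &= \mathbb{E}_{x \sim q}\left[ \nabla_\theta f(x, \theta) \right].
\end{align*}

% For contrast: if the sampling distribution did depend on \theta,
% the product rule would leave an extra term.
\begin{align*}
\nabla_\theta \int q_\theta(x)\, f(x, \theta)\, dx
  = \int q_\theta(x)\, \nabla_\theta f(x, \theta)\, dx
  + \int \nabla_\theta q_\theta(x)\, f(x, \theta)\, dx .
\end{align*}
```

The second integral in the contrast case is the term that vanishes when q is held fixed, which is why the derivative can pass through the expectation directly.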