line 275 - yeah, should be $\pi$
So should that be "Denote the marginal distributions induced by $\pi({s_t},{a_t})$ on the state-action pairs $({s_t},{a_t})$"?
No, that's not the correct representation of $\pi$ (it of course does not depend on $a$, and is also history dependent in this case). It can be just $\pi$.
Thanks!
line 275 - "induced by $({s_t},{a_t})$ on the state-action pairs $({s_t},{a_t})$": could there be a mistake there? I didn't really understand this sentence, but I might be wrong.
line 289 - consistency with the optimal policy notation $\pi^*$.