avivt / 046203-RL-lectures-notes

Lecture notes for course 046203 Planning and Reinforcement Learning, Technion

Update Lecture4.tex #10

Closed Hadayo closed 4 years ago

Hadayo commented 4 years ago

line 275 - "induced by $({s_t},{a_t})$ on the state-action pairs $({s_t},{a_t})$": could there be a mistake here? I didn't really understand this sentence, but I might be wrong.

line 289 - consistency with the optimal policy notation $\pi^*$.

avivt commented 4 years ago

line 275 - yeah, should be \pi

Hadayo commented 4 years ago

So that should be "Denote the marginal distributions induced by \pi $({s_t},{a_t})$ on the state-action pairs $({s_t},{a_t})$"?

avivt commented 4 years ago

No, that's not the correct representation of \pi (it of course does not depend on a, and it is also history-dependent in this case). It can be just $\pi$.
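
For concreteness, here is a sketch of how the corrected sentence might read with just $\pi$. The symbol $p_t^{\pi}$ and the index range are assumptions for illustration and are not taken from Lecture4.tex; the probability is with respect to the trajectory distribution generated by $\pi$ and the initial state distribution.

```latex
Denote the marginal distributions induced by $\pi$ on the
state-action pairs $(s_t, a_t)$ by
% p_t^{\pi} is a hypothetical symbol, not necessarily the one used in the notes
\[
  p_t^{\pi}(s, a) = \Pr\nolimits^{\pi}\left(s_t = s,\ a_t = a\right),
  \qquad t = 0, 1, 2, \dots
\]
```

Writing $\pi$ alone avoids implying that the policy is a function of $(s_t, a_t)$: a history-dependent policy chooses $a_t$ from the whole history up to time $t$, and the pair $(s_t, a_t)$ only appears as the argument of the induced marginal.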

avivt commented 4 years ago

Thanks!