LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction
MIT License

12.1 #32

Closed luigift closed 4 years ago

luigift commented 4 years ago


I've used a slightly different approach. If you would like to incorporate the answer, I can send you the LaTeX code.

LyWangPX commented 4 years ago

Yes, if you could provide the LaTeX code, I will incorporate your answer and credit you by name in the PDF.

luigift commented 4 years ago

I'll post it here. I recommend adding the LaTeX source files and build instructions to the repository so that other people can open pull requests directly. In the future, adding GitHub Actions to build the PDF automatically would also be nice.

luigift commented 4 years ago

\begin{equation} \label{eq1}
G_{t:t+h} \doteq R_{t+1} + \gamma G_{t+1:t+h}
\end{equation}

\begin{equation} \label{eq2}
G^\lambda_{t+1} \doteq (1-\lambda) \sum^{\infty}_{n=1} \lambda^{n-1} G_{t+1:t+n+1}
\end{equation}

\begin{equation} \label{eq3}
G_{t+1:t+1} \doteq \hat{v}(S_{t+1}, w_{t})
\end{equation}

\begin{align} \label{eq4}
G^\lambda_t & \doteq (1-\lambda) \sum^{\infty}_{n=1} \lambda^{n-1} G_{t:t+n} \\
& = (1-\lambda) \left[ G_{t:t+1} + \lambda G_{t:t+2} + \lambda^2 G_{t:t+3} + \dots \right] \\
& = (1-\lambda) \left[ (R_{t+1} + \gamma G_{t+1:t+1}) + \lambda (R_{t+1} + \gamma G_{t+1:t+2}) + \lambda^2 (R_{t+1} + \gamma G_{t+1:t+3}) + \dots \right] & (\text{by \ref{eq1}}) \\
& = (1-\lambda) \left[ (R_{t+1} + \lambda R_{t+1} + \lambda^2 R_{t+1} + \dots) + (\gamma G_{t+1:t+1} + \lambda \gamma G_{t+1:t+2} + \lambda^2 \gamma G_{t+1:t+3} + \dots) \right] \\
& = (1-\lambda) \left[ \frac{R_{t+1}}{(1-\lambda)} + \gamma G_{t+1:t+1} + \gamma (\lambda G_{t+1:t+2} + \lambda^2 G_{t+1:t+3} + \dots) \right] \\
& = R_{t+1} + (1-\lambda) \left[ \gamma G_{t+1:t+1} + \gamma \lambda (G_{t+1:t+2} + \lambda G_{t+1:t+3} + \dots) \right] \\
& = R_{t+1} + (1-\lambda) \gamma G_{t+1:t+1} + \gamma \lambda (1-\lambda) \sum^{\infty}_{n=1} \lambda^{n-1} G_{t+1:t+n+1} \\
& = R_{t+1} + (1-\lambda) \gamma G_{t+1:t+1} + \gamma \lambda G^\lambda_{t+1} & (\text{by \ref{eq2}}) \\
& = R_{t+1} + (1-\lambda) \gamma \hat{v}(S_{t+1}, w_{t}) + \gamma \lambda G^\lambda_{t+1} & (\text{by \ref{eq3}})
\end{align}
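For anyone who wants to sanity-check the final line numerically, here is a minimal Python sketch. It is not part of the solution PDF, and it rests on a few assumptions of mine: the episode is finite (so every n-step return that reaches the terminal step equals the full return), the value estimates are held fixed in a plain list `values` with `values[T] = 0` for the terminal state, and the function names are hypothetical. It computes the lambda-return once from the weighted-sum definition and once from the recursion, so the two can be compared.

```python
import random


def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n}: up to n discounted rewards, then bootstrap from the value estimate.

    rewards[k] holds R_{k+1}; values[k] is the estimate for S_k,
    and values[T] is assumed to be 0 for the terminal state.
    """
    T = len(rewards)
    h = min(t + n, T)  # returns that run past the episode end are truncated at T
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, h))
    return g + gamma ** (h - t) * values[h]


def lambda_return_direct(rewards, values, t, gamma, lam):
    """Lambda-return from the definition (1 - lam) * sum_n lam^(n-1) * G_{t:t+n}."""
    T = len(rewards)
    total = 0.0
    for n in range(1, T - t):
        total += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    # All n >= T - t give the full return; their weights sum to lam^(T-t-1).
    total += lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)
    return total


def lambda_return_recursive(rewards, values, gamma, lam):
    """Lambda-returns for every t via
    G_t = R_{t+1} + gamma * ((1 - lam) * v(S_{t+1}) + lam * G_{t+1})."""
    T = len(rewards)
    g = [0.0] * (T + 1)  # g[T] = 0: nothing follows the terminal state
    for t in reversed(range(T)):
        g[t] = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g[t + 1])
    return g[:T]


if __name__ == "__main__":
    rng = random.Random(0)
    T, gamma, lam = 8, 0.9, 0.7
    rewards = [rng.uniform(-1.0, 1.0) for _ in range(T)]
    values = [rng.uniform(-1.0, 1.0) for _ in range(T)] + [0.0]
    recursive = lambda_return_recursive(rewards, values, gamma, lam)
    for t in range(T):
        direct = lambda_return_direct(rewards, values, t, gamma, lam)
        print(t, abs(direct - recursive[t]) < 1e-9)
```

The recursive version walks the episode backwards once, which is also why this form is convenient in practice: each lambda-return needs only the next reward, the next value estimate, and the next lambda-return.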

luigift commented 4 years ago

Thank you for mentioning my name. How are your interviews going?

LyWangPX commented 4 years ago

Thanks for your suggestions and the update. Not so well, to be honest, but life goes on :)

LyWangPX commented 4 years ago

Uploaded and fixed. I also updated your info in the README. Thanks for your contributions. :)