Exercise 3.29 might have a mistake

LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction

MIT License

2.02k stars, 466 forks

Exercise 3.29 might have a mistake #83

Open rvitorper opened 3 years ago

rvitorper commented 3 years ago

Hello!

I was checking your answer to Exercise 3.29, and I think it might contain a mistake. The final equation averages over all actions, whereas I think it should take the maximum over all actions, which would remove the policy function from the equation.
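For concreteness, here is a sketch of the two forms being contrasted. It assumes the final equation in question is the one for q*, and it uses the book's three-argument p(s'|s,a) and two-argument r(s,a). The repository's exact notation is not reproduced here, so the symbols below are an assumption.

```latex
\begin{align*}
% Policy-dependent form: averages over next actions a' under \pi
q_\pi(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s,a) \sum_{a'} \pi(a' \mid s')\, q_\pi(s',a') \\
% Optimal form: the average over a' is replaced by a maximum, so \pi drops out
q_*(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s,a) \max_{a'} q_*(s',a')
\end{align*}
```

The only difference is the inner operator over a': a π-weighted sum for q_π and a max for q*.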

I believe it is a mistake because the backup diagram for q* (page 64) shows a maximum over the next actions rather than an average.
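A small numeric check can make the difference visible. This is a minimal sketch on a hypothetical two-state, two-action MDP; the transition table `P`, the rewards `R`, the discount `GAMMA`, and the uniform random policy are all made up for illustration and do not come from the book or the repository. It iterates the same backup twice, once combining next-state action values with a maximum and once with a uniform average.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] maps next state s' -> probability p(s'|s,a).
# Here action 0 always leads to state 0 and action 1 always leads to state 1.
P = {
    0: {0: {0: 1.0}, 1: {1: 1.0}},
    1: {0: {0: 1.0}, 1: {1: 1.0}},
}

# R[s][a] is the expected reward r(s,a).
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def backup(q, combine):
    """One sweep of q(s,a) = r(s,a) + gamma * sum_s' p(s'|s,a) * combine(q(s', .))."""
    return {
        s: {
            a: R[s][a]
            + GAMMA * sum(prob * combine(q[s2]) for s2, prob in P[s][a].items())
            for a in ACTIONS
        }
        for s in STATES
    }


def solve(combine, sweeps=500):
    """Iterate the backup from all-zero values until it has effectively converged."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(sweeps):
        q = backup(q, combine)
    return q


# Maximum over next actions: the Bellman optimality backup for q*.
q_star = solve(lambda q_next: max(q_next.values()))

# Uniform average over next actions: the backup for q_pi under a random policy.
q_pi = solve(lambda q_next: sum(q_next.values()) / len(q_next))

for s in STATES:
    for a in ACTIONS:
        print(f"s={s} a={a}  q*={q_star[s][a]:.3f}  q_pi={q_pi[s][a]:.3f}")
```

On this toy problem the two backups converge to different fixed points, and the max-based values are never smaller than the averaged ones. This is consistent with the averaged form describing q_π for a particular policy rather than q*.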

Looking forward to hearing from you!