LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction
MIT License

[Ex 4.5] Deterministic policy #86

Open Jonathan2021 opened 3 years ago

Jonathan2021 commented 3 years ago

In your pseudocode for computing q, if π is deterministic (as stated in the initialization and in the pseudocode given for v), then in step 2 you don't need to loop over all a∈A, and you don't need to weight over all a' when computing Q(s,a): the sum over a' reduces to the single term Q(s', π(s')).
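To make the point about the Q(s,a) update concrete, here is a minimal sketch in Python. The toy MDP `P`, the discount `GAMMA`, and both function names are made up for illustration and are not taken from the repository's solution. It only compares the two forms of a single backup for one (s, a) pair: the general form that weights Q(s', a') by π(a'|s'), and the deterministic form that reads Q(s', π(s')) directly.

```python
# Hypothetical toy MDP (an assumption for this sketch):
# P[(s, a)] is a list of (prob, next_state, reward) triples.
P = {
    (0, 0): [(1.0, 0, 0.0)],
    (0, 1): [(0.5, 0, 0.0), (0.5, 1, 1.0)],
    (1, 0): [(1.0, 0, 2.0)],
    (1, 1): [(1.0, 1, 0.0)],
}
ACTIONS = [0, 1]
GAMMA = 0.9


def backup_stochastic(Q, s, a, pi_probs):
    """General backup: weight Q(s', a') by pi(a'|s') over all a'."""
    return sum(
        p * (r + GAMMA * sum(pi_probs[s2][a2] * Q[(s2, a2)] for a2 in ACTIONS))
        for p, s2, r in P[(s, a)]
    )


def backup_deterministic(Q, s, a, pi):
    """Deterministic pi: the inner sum collapses to Q(s', pi(s'))."""
    return sum(p * (r + GAMMA * Q[(s2, pi[s2])]) for p, s2, r in P[(s, a)])


Q = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
pi = {0: 1, 1: 0}                           # deterministic policy
pi_probs = {0: [0.0, 1.0], 1: [1.0, 0.0]}   # the same policy as one-hot probabilities
```

With a one-hot `pi_probs`, both functions return the same value for every (s, a) pair, so the weighting over a' is redundant when π is deterministic.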

Likewise, in step 3 you shouldn't loop over a, because old-action is given directly by the deterministic policy.
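As a rough sketch of what step 3 could look like under this reading (the function name `improve_policy` and the dict-based `Q` and `pi` are assumptions for illustration, not the repository's code): old-action is read once per state from the policy, and the only scan over actions is the argmax itself.

```python
def improve_policy(Q, pi, states, actions):
    """One policy-improvement sweep over states.

    Q maps (state, action) -> value, pi maps state -> action.
    Updates pi in place and returns True if no action changed.
    """
    policy_stable = True
    for s in states:
        old_action = pi[s]  # read directly from the deterministic policy
        # Greedy action w.r.t. Q; ties go to the first action in `actions`.
        pi[s] = max(actions, key=lambda a: Q[(s, a)])
        if pi[s] != old_action:
            policy_stable = False
    return policy_stable


Q_example = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.0}
pi_example = {0: 0, 1: 0}
first_sweep_stable = improve_policy(Q_example, pi_example, [0, 1], [0, 1])
second_sweep_stable = improve_policy(Q_example, pi_example, [0, 1], [0, 1])
```

In this example the first sweep changes the action in state 0, so it reports the policy as not stable; the second sweep leaves the policy unchanged.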

Thanks for considering this fix ;) Have a nice day!