LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction
MIT License

Clarification Ex 4.5_ Deterministic Policy #66

Closed Avalpreet closed 4 years ago

Avalpreet commented 4 years ago
  1. Policy Evaluation: Since the policy $\pi$ is deterministic, shouldn't $\sum_{a'} \pi(a' \mid s')\, Q(s', a')$ be replaced with $Q(s', \pi(s'))$?
LyWangPX commented 4 years ago

$\pi$ is deterministic, that’s true, but the environment is not necessarily deterministic. So I kept the original sum over $a'$ there for completeness.
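
For a deterministic policy the two forms give the same number, since $\pi(a' \mid s')$ is 1 for $a' = \pi(s')$ and 0 otherwise; the environment's randomness is handled by the separate expectation over $s'$ in either case. Below is a minimal sketch that checks this numerically. The random MDP, the function name `evaluate_q`, and the array layout `P[s, a, s2]` / `R[s, a, s2]` are assumptions made up for illustration, not code from the solutions repository.

```python
import numpy as np

def evaluate_q(P, R, policy, gamma=0.9, use_sum=True, tol=1e-10):
    """Iterative policy evaluation for action values q_pi (illustrative sketch).

    P[s, a, s2] : transition probability p(s2 | s, a); may be stochastic
    R[s, a, s2] : reward received on that transition
    policy[s]   : the single action a deterministic policy picks in state s
    """
    n_states, n_actions, _ = P.shape
    # One-hot table pi(a | s) representing the deterministic policy.
    pi = np.zeros((n_states, n_actions))
    pi[np.arange(n_states), policy] = 1.0

    Q = np.zeros((n_states, n_actions))
    while True:
        if use_sum:
            # General form: sum over a' of pi(a' | s') * Q(s', a')
            next_value = (pi * Q).sum(axis=1)
        else:
            # Deterministic shortcut: Q(s', pi(s'))
            next_value = Q[np.arange(n_states), policy]
        # The expectation over s' (environment randomness) is kept in both forms.
        Q_new = (P * (R + gamma * next_value[None, None, :])).sum(axis=2)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

# Hypothetical small MDP with a stochastic environment.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalise each (s, a) row into probabilities
R = rng.normal(size=(n_states, n_actions, n_states))
policy = rng.integers(n_actions, size=n_states)  # one action per state

q_sum = evaluate_q(P, R, policy, use_sum=True)
q_det = evaluate_q(P, R, policy, use_sum=False)
print(np.allclose(q_sum, q_det))
```

Under these assumptions the two results should match, so either notation is fine for a deterministic $\pi$; the explicit sum just also covers stochastic policies.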

Avalpreet commented 4 years ago

Thanks for the clarification