algorithmsbooks / decisionmaking

Algorithms for Decision Making textbook

Exercise 9.9 Wording #84

Closed by dylan-asmar 2 years ago

dylan-asmar commented 2 years ago


I think the wording of

...use the upper confidence bound in equation (9.1) to compute the optimal action for each state with an exploration parameter ...

might be a bit misleading. My understanding of this problem is that we should use the exploration parameters and the tables to select which action to traverse (explore) at each state during MCTS. I think a slight rewording to remove "optimal" would help clarify the exercise. As worded, it could be read as though the search has finished and we need to pick the action that maximizes our estimate of Q, in which case the exploration parameters would not be used.

Example 9.3 uses the following phrase when discussing a similar step:

The second simulation begins by selecting the best action from the initial state according to our exploration strategy in equation (9.1).
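
To make the two readings concrete, here is a minimal Python sketch. It assumes equation (9.1) has the standard UCB1 form, Q(s,a) + c*sqrt(log N(s) / N(s,a)), with N(s) taken as the sum of the action counts at that state. The function names and the numbers in the tables are hypothetical, chosen only for illustration, and are not taken from the exercise.

```python
import math

def ucb1_action(Q, N, c):
    """Action to traverse during search: maximizes Q[a] + c*sqrt(log(N_s)/N[a]).

    Q and N are dicts keyed by action for a single state (hypothetical layout).
    Assumes N_s is the sum of the per-action counts.
    """
    N_s = sum(N.values())

    def score(a):
        if N[a] == 0:
            return math.inf  # untried actions get an infinite bonus
        return Q[a] + c * math.sqrt(math.log(N_s) / N[a])

    return max(Q, key=score)

def greedy_action(Q):
    """Action to commit to after search: maximizes Q alone, ignoring c and N."""
    return max(Q, key=Q.get)

# Hypothetical tables for one state.
Q = {"a1": 10.0, "a2": 9.0}
N = {"a1": 50, "a2": 2}

traversal_choice = ucb1_action(Q, N, c=5.0)  # exploration bonus favors the rarely tried a2
final_choice = greedy_action(Q)              # highest value estimate is a1
```

With these made-up numbers the two interpretations disagree: the exploration bonus pushes the traversal toward the rarely visited action, while the greedy choice ignores the counts entirely. Setting c = 0 makes the two coincide.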

tawheeler commented 2 years ago

Thanks - changing "optimal" to "best MCTS traversal"