Exercise 9.9 Wording - GitHub Issues

I think the wording of

...use the upper confidence bound in equation (9.1) to compute the optimal action for each state with an exploration parameter ...

might be a bit misleading. My understanding of this problem was that we should use the exploration parameters and the tables to select the best action to traverse/explore at each state during MCTS. I think a slight rewording that removes "optimal" would help clarify this exercise. As worded, it could be interpreted as meaning that we have already completed our search and need to pick the action that maximizes our estimate of Q (in which case the exploration parameters are not used).
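To make the two readings concrete, here is a minimal Python sketch contrasting them. It assumes equation (9.1) is the UCB1-style rule Q(s,a) + c·sqrt(log N(s) / N(s,a)) with N(s) = Σ_a N(s,a). The function names, the infinite score for unvisited actions, and the Q/N values are hypothetical, chosen only for illustration.

```python
import math


def ucb_score(q, n_sa, n_s, c):
    """UCB1-style score Q(s,a) + c * sqrt(log N(s) / N(s,a)) (assumed form of eq. 9.1).

    Unvisited actions (N(s,a) == 0) get an infinite score so they are tried first;
    this convention is an assumption of the sketch.
    """
    if n_sa == 0:
        return math.inf
    return q + c * math.sqrt(math.log(n_s) / n_sa)


def exploration_action(Q, N, s, actions, c):
    """Action MCTS would traverse during a simulation: uses the exploration bonus."""
    n_s = sum(N[(s, a)] for a in actions)  # N(s) = sum over a of N(s, a)
    return max(actions, key=lambda a: ucb_score(Q[(s, a)], N[(s, a)], n_s, c))


def greedy_action(Q, s, actions):
    """Action picked once the search is finished: argmax_a Q(s, a), no exploration term."""
    return max(actions, key=lambda a: Q[(s, a)])


# Hypothetical tables for a single state "s1" with two actions.
Q = {("s1", "a1"): 10.0, ("s1", "a2"): 8.0}
N = {("s1", "a1"): 20, ("s1", "a2"): 1}
actions = ["a1", "a2"]

print(exploration_action(Q, N, "s1", actions, c=2.0))  # a2
print(greedy_action(Q, "s1", actions))                 # a1
print(exploration_action(Q, N, "s1", actions, c=0.0))  # a1
```

With c = 2 the exploration bonus pushes the rarely visited a2 ahead of a1, even though a1 has the higher Q estimate; with c = 0 the two rules coincide. Calling the first choice the "optimal action" blurs exactly this difference.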

Example 9.3 uses the following phrase when discussing a similar step:

The second simulation begins by selecting the best action from the initial state according to our exploration strategy in equation (9.1).

algorithmsbooks / decisionmaking

Exercise 9.9 Wording #84