Closed dylan-asmar closed 11 months ago
Yes, this needs to be updated since it has the same mistake the other issue you reported has. That required defining some additional observation values.
We can't change pagination, so instead I added something to the effect of "for this problem with 2 states and 2 actions" to Example 22.5:
Thanks!
@tawheeler, your turnaround time is always impressive!
If it is a 2-state, 2-observation problem, I think the first image needs to be updated? I think the transition and observation probabilities need to only consider $s_1$ and $s_2$. Also, the transition probabilities would then sum to 1?
Ah, you are right. Clearly at least a 3-state problem. Actually, the example doesn't use the product order that this issue brings up, so I swapped that over too:
As written, Exercise 22.3 isn't consistent with Algorithm 22.2 and can cause a bit of confusion. I also think there might be an error in the provided solution and the provided observation probabilities.
I understand that the order we use for the states and observations isn't critical as long as we are consistent and maintain the determinized aspect of the algorithm. However, the `successor` function in Algorithm 22.2 iterates over $s^\prime$ and $o$ by using `product`. That implies the order $(s_1, o_1), (s_2, o_1), (s_3, o_1), \dots$ would be used.

Even if we aren't considering that order and skip over transitioning to $s_1$ (from the provided transition probabilities, we know $T(s_1 \mid s_1, a_3) \leq 0.15$, but we can't determine it exactly without knowing the number of states), we aren't provided the appropriate observation probabilities. Our observation function is of the form $O(o \mid s^\prime, a)$. However, we are given $O(\cdot \mid s_1, a_3)$ instead of $O(\cdot \mid s_2, a_3)$ and $O(\cdot \mid s_3, a_3)$.
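
To make the ordering point concrete, here is a minimal Python sketch of the enumeration described above, with the state index varying fastest. This is not the book's code: the labels, the sizes of the spaces, and the helper name `state_fastest_pairs` are all hypothetical and chosen only for illustration. Note that Python's `itertools.product` varies its last argument fastest, so the arguments are swapped to reproduce the order in question.

```python
from itertools import product

# Hypothetical labels for illustration; the exercise's actual spaces may differ.
states = ["s1", "s2", "s3"]
observations = ["o1", "o2"]

def state_fastest_pairs(states, observations):
    """Enumerate (s', o) pairs with the state varying fastest."""
    # itertools.product varies its LAST argument fastest, so the observations
    # are passed first and each tuple is flipped back to (s', o).
    return [(s, o) for o, s in product(observations, states)]

pairs = state_fastest_pairs(states, observations)
print(pairs)
```

Under these assumptions, every successor state is paired with $o_1$ before any pair with $o_2$ appears, which is why the observation probabilities for $s_2$ and $s_3$ would be needed to walk through the exercise.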