Exercise 22.3 - Inconsistent with Algorithm 22.2 and incorrect observation probabilities

dylan-asmar commented 11 months ago

As written, Exercise 22.3 is inconsistent with Algorithm 22.2, which can cause some confusion. I also think there may be an error in the provided solution and in the provided observation probabilities.

I understand that the order in which we enumerate states and observations isn't critical, as long as we use it consistently and preserve the determinized nature of the algorithm. However, the successor function in Algorithm 22.2 iterates over $s^\prime$ and $o$ using `product`, which implies the order $(s_1, o_1), (s_2, o_1), (s_3, o_1), \dots$.
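To make that ordering concrete, here is a minimal Python sketch (not the book's Julia code) that lists joint $(s', o)$ pairs with the state varying fastest. It assumes Algorithm 22.2 passes the states as the first argument to Julia's `Iterators.product`, which varies its first argument fastest. Python's `itertools.product` does the opposite and varies its *last* argument fastest, so the sketch iterates `product(observations, states)` and swaps each pair. The function name `joint_order` and the state/observation labels are hypothetical.

```python
from itertools import product


def joint_order(states, observations):
    """Return (s', o) pairs with s' varying fastest:
    (s1, o1), (s2, o1), (s3, o1), (s1, o2), ...

    itertools.product varies its LAST argument fastest, so the observations
    go first and the states last, and each (o, s') pair is swapped.
    """
    return [(s, o) for (o, s) in product(observations, states)]


# Hypothetical 3-state, 2-observation problem.
pairs = joint_order(["s1", "s2", "s3"], ["o1", "o2"])
print(pairs[:4])  # [('s1', 'o1'), ('s2', 'o1'), ('s3', 'o1'), ('s1', 'o2')]
```

Any other fixed order would also yield a valid determinization; the point is only that the exercise's worked solution should follow the same order as the algorithm.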

Even if we ignore that order and skip the transition to $s_1$ (from the provided transition probabilities we only know that $T(s_1 \mid s_1, a_3) \leq 0.15$; we can't determine it exactly because the number of states isn't given), we still aren't given the observation probabilities we need. The observation function has the form $O(o \mid s^\prime, a)$, yet we are given $O(\cdot \mid s_1, a_3)$ instead of $O(\cdot \mid s_2, a_3)$ and $O(\cdot \mid s_3, a_3)$.
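The point about $O(o \mid s^\prime, a)$ can be illustrated with a small Python sketch (again not Algorithm 22.2 itself). It assumes a determinized successor that walks the $(s^\prime, o)$ pairs in the order above and returns the first pair whose cumulative probability $T(s^\prime \mid s, a)\,O(o \mid s^\prime, a)$ exceeds a pre-drawn number $\phi \in [0, 1)$. Because the observation model is conditioned on the successor state, every reachable $s^\prime$ needs its own $O(\cdot \mid s^\prime, a)$. The function names, the dictionary layout, and all probabilities are hypothetical, not the exercise's values.

```python
from itertools import product


def missing_observation_models(T, O, s, a):
    """List the (s', a) keys whose O(. | s', a) is needed but not provided.

    T maps (s, a) -> {s': T(s' | s, a)}; O maps (s', a) -> {o: O(o | s', a)}.
    Every successor s' with nonzero transition probability needs its own
    observation distribution, because O is conditioned on s', not on s.
    """
    return [(s_next, a) for s_next, p in T[(s, a)].items()
            if p > 0.0 and (s_next, a) not in O]


def successor(T, O, states, observations, s, a, phi):
    """Deterministically map a pre-drawn phi in [0, 1) to a pair (s', o).

    Pairs are visited as (s1, o1), (s2, o1), (s3, o1), (s1, o2), ...;
    itertools.product varies its last argument fastest, so the states go last.
    """
    cumulative = 0.0
    for o, s_next in product(observations, states):
        p_trans = T[(s, a)].get(s_next, 0.0)
        if p_trans == 0.0:
            continue  # unreachable successor: its O(. | s', a) is never used
        # Raises KeyError if O(. | s', a) was not provided for this successor.
        cumulative += p_trans * O[(s_next, a)][o]
        if phi < cumulative:
            return s_next, o
    raise ValueError("phi exceeds total probability; check that T and O sum to 1")


# Hypothetical numbers, NOT the values from Exercise 22.3.
states, observations = ["s1", "s2", "s3"], ["o1", "o2"]
T = {("s1", "a3"): {"s1": 0.1, "s2": 0.5, "s3": 0.4}}
O_given = {("s1", "a3"): {"o1": 0.7, "o2": 0.3}}  # only O(. | s1, a3) is known
print(missing_observation_models(T, O_given, "s1", "a3"))  # [('s2', 'a3'), ('s3', 'a3')]

O_full = {**O_given,
          ("s2", "a3"): {"o1": 0.2, "o2": 0.8},
          ("s3", "a3"): {"o1": 0.6, "o2": 0.4}}
print(successor(T, O_full, states, observations, "s1", "a3", 0.5))  # ('s2', 'o2')
```

With only `O_given`, calling `successor` with, say, `phi = 0.1` reaches $s_2$ and fails with a `KeyError`, which mirrors the missing $O(\cdot \mid s_2, a_3)$ and $O(\cdot \mid s_3, a_3)$ in the exercise.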

tawheeler commented 11 months ago

Yes, this needs to be updated; it has the same mistake as the other issue you reported, whose fix required defining some additional observation values.

We can't change the pagination, so instead we added a note to the effect of "for this problem with 2 states and 2 actions" to Example 22.5.

Thanks!

dylan-asmar commented 11 months ago

@tawheeler, your turnaround time is always impressive!

If it is a 2-state, 2-observation problem, doesn't the first image need to be updated? I think the transition and observation probabilities should then consider only $s_1$ and $s_2$. Also, wouldn't the transition probabilities then need to sum to 1?

tawheeler commented 11 months ago

Ah, you are right; it is clearly at least a 3-state problem. Also, the example didn't use the `product` order that this issue brings up, so I switched it to that order as well:

[Screenshot: 2023-12-03_15-36]

algorithmsbooks/decisionmaking issue #108