Section 6.4 says that the rewards [10., 2., 3.] are the rewards for a
transition into states 1, 2, and 3, respectively.
However, this definition, and the corresponding 'rewards' variable in
the exercise, are not consistent with the solution code for the
exercises in lxmls-toolkit, even though the solution code is correct.
So here we change the definition of the 'rewards' variable to make it
consistent with the code.
The new definition is:
[10., 2., 3.] are the expected values of the next reward for each state.
Naturally, for clarity, we also introduce the actual rewards obtained
after a transition into each state.
For more details, see the corresponding pull request in the
'lxmls-toolkit' project: LxMLS/lxmls-toolkit#130
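To illustrate how the two notions relate: if each state s moves to a
next state s2 with probability P[s, s2], and entering s2 yields an
actual reward R[s2], then the expected next reward for s is
sum over s2 of P[s, s2] * R[s2], i.e. the matrix-vector product P @ R.
The sketch below assumes a made-up transition matrix and made-up
actual rewards (the names P and reward_on_entry, and all numbers, are
hypothetical and not taken from the exercise or lxmls-toolkit).

```python
import numpy as np

# Hypothetical 3-state transition matrix: P[s, s2] is the probability
# of moving from state s to state s2 (each row sums to 1).
# These values are illustrative only, not from the exercise.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
])

# Hypothetical actual rewards received on a transition INTO each state.
reward_on_entry = np.array([12.0, 4.0, 6.0])

# Expected value of the next reward for each state s:
#   expected_next_reward[s] = sum_s2 P[s, s2] * reward_on_entry[s2]
expected_next_reward = P @ reward_on_entry
print(expected_next_reward)
```

With these made-up numbers the expected next rewards are 8, 5 and 9,
which differ from the per-transition rewards 12, 4 and 6; this is the
distinction between the two definitions.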