Fix description of rewards in MRP example in section 6.4 of RL day

Section 6.4 states that the rewards [10., 2., 3.] are the rewards for a transition into states 1, 2, and 3, respectively. However, this definition, and the corresponding 'rewards' variable in the exercise, is inconsistent with the solution code for the exercises in lxmls-toolkit, even though the solution code itself is correct. This change therefore redefines the 'rewards' variable to match the code. Under the new definition, [10., 2., 3.] are the expected values of the next reward for each state. For clarity, we also introduce the actual rewards obtained after a transition into each state. For more details, see the corresponding pull request in the 'lxmls-toolkit' project: LxMLS/lxmls-toolkit#130
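
To illustrate why the two readings of 'rewards' are not interchangeable, here is a minimal sketch. The transition matrix `P`, the discount `gamma`, and the helper `solve_values` are hypothetical and chosen only for illustration; they are not taken from the guide or from lxmls-toolkit. Only the vector [10., 2., 3.] comes from the example.

```python
import numpy as np

# Hypothetical transition matrix (each row sums to 1); NOT the one from the guide.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
gamma = 0.9  # hypothetical discount factor

rewards = np.array([10., 2., 3.])  # vector from the section 6.4 example


def solve_values(expected_rewards, P, gamma):
    """Solve the Bellman equation V = R + gamma * P V for V (illustrative helper)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, expected_rewards)


# Reading A (new definition): rewards[s] is the expected next reward in state s.
v_expected = solve_values(rewards, P, gamma)

# Reading B (old wording): rewards[s2] is received on entering state s2,
# so the expected next reward in state s is sum over s2 of P[s, s2] * rewards[s2].
v_on_entry = solve_values(P @ rewards, P, gamma)

print("V, rewards as expected next reward:", v_expected)
print("V, rewards received on entry:      ", v_on_entry)
```

The two value vectors differ whenever `P @ rewards` differs from `rewards`, which is why the text and the code need to agree on a single definition.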

LxMLS / lxmls-guide

Fix description of rewards in MRP example in section 6.4 of RL day #124