LxMLS / lxmls-guide

Lisbon Machine Learning Summer School Lab Guide
Fix description of rewards in MRP example in section 6.4 of RL day #124

Closed q0o0p closed 5 years ago

q0o0p commented 5 years ago

Section 6.4 states that the rewards [10., 2., 3.] are the rewards for a transition into states 1, 2, and 3, respectively. That definition, and the corresponding 'rewards' variable in the exercise, is inconsistent with the solution code for the exercises in lxmls-toolkit, although the solution code itself is correct. This change therefore redefines the 'rewards' variable to match the code: [10., 2., 3.] are the expected values of the next reward for each state. For clarity, it also introduces the actual rewards obtained after a transition into each state. For more details, see the corresponding pull request in the 'lxmls-toolkit' project: LxMLS/lxmls-toolkit#130
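
To illustrate why the two definitions are not interchangeable, here is a minimal sketch in plain Python. It is not the lxmls-toolkit solution code. Only the values [10., 2., 3.] come from the guide; the transition matrix `P`, the discount `gamma`, and the helper names `evaluate` and `expected_rewards` are assumptions made up for this example.

```python
# Hypothetical 3-state Markov reward process (MRP).
# Only the numbers [10., 2., 3.] come from the guide; everything else is assumed.
rewards = [10.0, 2.0, 3.0]  # new definition: expected next reward for each state

# ASSUMPTION: illustrative transition matrix, P[s][t] = Pr(next = t | current = s).
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
gamma = 0.9  # ASSUMPTION: illustrative discount factor


def evaluate(rewards, P, gamma, sweeps=1000):
    """Iterate the Bellman expectation equation V = R + gamma * P V."""
    n = len(rewards)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            rewards[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


def expected_rewards(P, entry_rewards):
    """R(s) = sum_t P[s][t] * r(t), where r(t) is the reward for entering state t."""
    n = len(entry_rewards)
    return [sum(P[s][t] * entry_rewards[t] for t in range(n)) for s in range(n)]


# Old reading: [10., 2., 3.] are rewards received on entering states 1, 2, 3.
# Under that reading the expected next reward per state is a different vector.
old_reading = expected_rewards(P, [10.0, 2.0, 3.0])

values = evaluate(rewards, P, gamma)
print("expected rewards under the old reading:", old_reading)
print("state values under the new definition:", values)
```

Under these assumed transition probabilities, the old reading yields expected rewards that differ from [10., 2., 3.], so code that plugs `rewards` directly into the Bellman equation is only consistent with the new definition.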

MStaniek commented 5 years ago

Looks perfect to me!