Closed ChenHuaYou closed 4 years ago
But I put the policy evaluation and the policy improvement inside the outer while loop, and found that running evaluation twice and improvement once gives a result equivalent to yours. I don't know whether that is a coincidence.
It's not a bug. This is value iteration, not policy iteration, so there is no policy evaluation step. The optimal policy is computed solely for plotting.
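To illustrate the distinction, here is a minimal sketch of value iteration on a toy problem. It is not the code under discussion: the 5-state chain, the reward of 1 for reaching the right end, and the names `value_iteration`, `q`, `gamma`, and `theta` are all assumptions made for this example. The point it tries to show is that the value sweep never reads a policy, so the greedy policy can be extracted once after the loop has converged.

```python
def value_iteration(n_states=5, gamma=0.9, theta=1e-9):
    """Value iteration on a hypothetical deterministic chain.

    States 0 and n_states - 1 are terminal. Moving into the right
    terminal state gives reward 1; every other move gives reward 0.
    """
    actions = (-1, +1)          # step left or step right
    V = [0.0] * n_states        # terminal values stay at 0

    def q(s, a):
        # One-step lookahead: immediate reward plus discounted next value.
        s_next = s + a
        reward = 1.0 if s_next == n_states - 1 else 0.0
        return reward + gamma * V[s_next]

    # Value sweep: back up the max over actions until values stop changing.
    # No policy is stored or evaluated inside this loop.
    while True:
        delta = 0.0
        for s in range(1, n_states - 1):
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Greedy policy, extracted once from the converged values.
    policy = {s: max(actions, key=lambda a: q(s, a))
              for s in range(1, n_states - 1)}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    print(V)
    print(policy)
```

In this sketch, moving the policy extraction inside the `while` loop would not change the final values, because the backup in the sweep depends only on `V`. It would only recompute the greedy policy on every sweep.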
I think everything from the line `# compute the optimal policy` to the end of the for loop should be inside the while loop.