Closed MaximeBouton closed 5 years ago
The sparse solver does not enforce that the value at terminal states is 0.
Here is an example where sparse VI and VI give different results: https://gist.github.com/MaximeBouton/b040a4f09ec779dc73f448e3d2e09da5
It might be enough to check for terminal states when building reward_S_A and zero out the corresponding row, with something like this:
if isterminal(mdp, s) reward_S_A[stateindex(mdp, s), :] .= 0.0 end
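To illustrate why zeroing the reward row matters, here is a minimal self-contained sketch of matrix-form value iteration on a toy 3-state problem. It does not use POMDPs.jl or the actual solver code: the function name matrix_value_iteration, the toy matrices, and the assumption that terminal states have all-zero transition rows are made up for this example.

```julia
# Toy matrix-form value iteration. All names and numbers are illustrative
# assumptions, not the solver's actual implementation.
function matrix_value_iteration(T, R, discount; iters=200)
    nS, nA = size(R)
    V = zeros(nS)
    for _ in 1:iters
        # Q[s, a] = R[s, a] + discount * sum over s' of T[a][s, s'] * V[s']
        Q = hcat((R[:, a] .+ discount .* (T[a] * V) for a in 1:nA)...)
        V = vec(maximum(Q, dims=2))
    end
    return V
end

# States 1 and 2 are ordinary, state 3 is terminal.
terminal = [false, false, true]

# Assumption: the terminal state's transition rows are all zero.
T = [
    [0.0 1.0 0.0; 0.0 0.0 1.0; 0.0 0.0 0.0],  # action 1: 1 -> 2, 2 -> 3
    [1.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 0.0],  # action 2: stay in place
]

# The reward model returns a nonzero reward at the terminal state.
R = [0.0 0.0; 1.0 0.0; 5.0 5.0]

# Without the check, the terminal reward leaks into the value function.
V_unpatched = matrix_value_iteration(T, R, 0.9)

# With the check: zero the reward row of every terminal state.
R_fixed = copy(R)
for s in findall(terminal)
    R_fixed[s, :] .= 0.0
end
V_fixed = matrix_value_iteration(T, R_fixed, 0.9)
```

In this toy setup, V_unpatched[3] equals the terminal reward and inflates the values of the states leading to it, while V_fixed[3] is 0. Whether the zero-transition-row assumption matches the sparse solver should be checked against its source.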