correction for Ex4.2.py

Fix the transition dynamics in the program so that they match the dynamics specified in the exercise.

Mentioned in: https://github.com/LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/issues/85
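
For reference, here is a minimal, independent sketch of iterative policy evaluation under the dynamics as I read them from Exercise 4.2. It is not the code in Ex4.2.py. It assumes the second variant of the exercise (moving down from state 13 leads to the new state), the equiprobable random policy, a reward of -1 per step, and no discounting. The index `16` for the new state and the names `next_state` and `evaluate` are hypothetical choices made for this illustration; the book labels the new state 15.

```python
# Sketch only: iterative policy evaluation for the Exercise 4.2 gridworld.
# Grid cells 0..15 are numbered row by row; cells 0 and 15 are terminal.
# ASSUMPTION: the extra nonterminal state (state 15 in the book) is stored
# under the hypothetical index 16 to avoid clashing with the terminal cell 15.

TERMINALS = {0, 15}
EXTRA = 16
ACTIONS = ("left", "up", "right", "down")
MOVES = {"left": (0, -1), "up": (-1, 0), "right": (0, 1), "down": (1, 0)}


def next_state(s, a):
    """Deterministic successor of state s under action a."""
    if s == EXTRA:
        # From the new state: left, up, right, down -> 12, 13, 14, itself.
        return {"left": 12, "up": 13, "right": 14, "down": EXTRA}[a]
    if s == 13 and a == "down":
        # ASSUMPTION: second variant of the exercise, 13 --down--> new state.
        return EXTRA
    row, col = divmod(s, 4)
    d_row, d_col = MOVES[a]
    r, c = row + d_row, col + d_col
    if 0 <= r < 4 and 0 <= c < 4:
        return r * 4 + c
    return s  # moves off the grid leave the state unchanged


def evaluate(theta=1e-10):
    """In-place policy evaluation for the equiprobable random policy."""
    v = {s: 0.0 for s in range(17)}
    while True:
        delta = 0.0
        for s in v:
            if s in TERMINALS:
                continue  # terminal value stays at 0
            # Reward is -1 on every transition, no discounting.
            new = sum(0.25 * (-1.0 + v[next_state(s, a)]) for a in ACTIONS)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < theta:
            return v


values = evaluate()
print(round(values[EXTRA], 3))
```

Under these assumptions the value of the new state should converge to about -20, which agrees with the rounded grid in the output below.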

New output, which matches the solution referenced in the issue. Previously, running the script returned the wrong state values.

----------------------
| 0 | -14 | -20 | -22 |
----------------------
| -14 | -18 | -20 | -20 |
----------------------
| -20 | -20 | -18 | -14 |
----------------------
| -22 | -20 | -14 | 0 |
----------------------
|    | -20 |    |   |
----------------------
Accurate State Values List:
State 1: -14.089474627936156          State 2: -20.11529496144879
State 3: -22.116550116550094          State 4: -14.131432669894195
State 5: -18.153128922359677          State 6: -20.139860139860122
State 7: -20.117805271651406          State 8: -20.241169087322913
State 9: -20.251748251748232          State 10: -18.173211403980616
State 11: -14.097005558544007          State 12: -22.340326340326317
State 13: -20.439483593329726          State 14: -14.204231665770113
State 15: 0          State 16: -20.961628115474248

correction for Ex4.2.py #99