abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm
Apache License 2.0

Updated mdvi sync algorithm to correct bug I introduced earlier. #32

btanner closed this 3 years ago

btanner commented 3 years ago

The bug affected the algorithm's ability to solve multichain MRPs.

Thanks for pointing this out @yiwan-rl.
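For readers unfamiliar with the multichain case, here is a minimal sketch of why it is harder than the unichain one. The 3-state MRP below is hypothetical and is not taken from this repository or its tests. It has two absorbing states, so the long-run average reward depends on the start state instead of being a single scalar.

```python
# Hypothetical 3-state MRP (illustration only, not from this repository).
# State 0 is transient; states 1 and 2 are absorbing, which gives two
# recurrent classes and therefore a multichain MRP.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
r = [0.0, 1.0, 3.0]  # reward received in each state


def average_reward(start, num_steps=10_000):
    """Average expected reward over num_steps when starting in `start`."""
    n = len(r)
    dist = [0.0] * n
    dist[start] = 1.0
    total = 0.0
    for _ in range(num_steps):
        # Expected reward under the current state distribution.
        total += sum(d * x for d, x in zip(dist, r))
        # Advance the distribution one step: dist <- dist @ P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total / num_steps


gains = [average_reward(s) for s in range(len(r))]
print(gains)
```

The three entries of `gains` differ, so an algorithm that tracks only one average-reward estimate for all states cannot represent the solution of an MRP like this one.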

I had to reduce tolerances in the tests, but a later PR will ensure that 64-bit precision is used throughout the algorithms, which should allow us to move them back up.
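As a rough illustration of how precision interacts with test tolerances, the sketch below rounds a value to 32 bits and checks it against two thresholds. It uses only the standard library so that it assumes nothing about the repository's code; `to_float32`, `within`, and the threshold values are hypothetical names and numbers chosen for the example.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def within(value, target, tol):
    """Return True if value is within an absolute tolerance of target."""
    return abs(value - target) <= tol


target = 1.0 / 3.0             # hypothetical quantity a test might check
stored32 = to_float32(target)  # the same quantity after a 32-bit round trip

strict_ok_32 = within(stored32, target, 1e-10)  # strict check on 32-bit value
loose_ok_32 = within(stored32, target, 1e-6)    # looser check on 32-bit value
strict_ok_64 = within(target, target, 1e-10)    # strict check on 64-bit value

print(strict_ok_32, loose_ok_32, strict_ok_64)
```

A single 32-bit round trip already introduces an error of roughly 1e-8 here, so the strict check only holds when the value stays in 64-bit precision from start to finish.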