abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm
Apache License 2.0
0 stars 2 forks source link

Change MRP (MarkovRewardProcess) and MDP (MarkovDecisionProcess) to use dataclasses #9

Closed btanner closed 3 years ago

btanner commented 3 years ago

This change is pretty big. The individual commits should be self explanatory, but I learned some things in between so it might be most helpful to just look at the final results.

@yiwan-rl @abhisheknaik96 Please ask questions if you have any!

yiwan-rl commented 3 years ago

Thanks, I also learned that we should have one commit for one purpose.