Several fixes and cleanups

abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm

Apache License 2.0


Closed: btanner closed this 2 years ago

btanner commented 2 years ago

Restructured the code to use 64-bit as the first choice.
Updated the DVI/MDVI control algorithms to divide alpha and beta by num_states.
Added side-by-side implementations of the sync updates in MDVI to verify that the vectorized code is correct.
Fixed the greedy_policy calculation in MDVI.
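
To illustrate the side-by-side check and the step-size scaling described above, here is a minimal sketch, not the code from this PR. It assumes a plain differential value iteration update for policy evaluation on a small random Markov chain. The function names `sync_update_loop` and `sync_update_vectorized`, the arrays `P` and `r`, and the step-size values are all hypothetical and chosen only for illustration; only `alpha`, `beta`, and `num_states` are names taken from the description.

```python
import numpy as np


def sync_update_loop(P, r, v, r_bar, alpha, beta):
    """Reference implementation: one synchronous sweep with explicit loops."""
    num_states = len(v)
    step_v = alpha / num_states  # step sizes are divided by num_states
    step_r = beta / num_states
    new_v = v.copy()
    total_delta = 0.0
    for s in range(num_states):
        expected_next = 0.0
        for s_next in range(num_states):
            expected_next += P[s, s_next] * v[s_next]
        # Differential TD error, computed from the OLD values (sync update).
        delta = r[s] - r_bar + expected_next - v[s]
        new_v[s] = v[s] + step_v * delta
        total_delta += delta
    return new_v, r_bar + step_r * total_delta


def sync_update_vectorized(P, r, v, r_bar, alpha, beta):
    """Same sweep written with array operations."""
    num_states = len(v)
    delta = r - r_bar + P @ v - v
    new_v = v + (alpha / num_states) * delta
    new_r_bar = r_bar + (beta / num_states) * delta.sum()
    return new_v, new_r_bar


# Hypothetical 5-state chain with random transitions and rewards (float64).
rng = np.random.default_rng(0)
num_states = 5
P = rng.random((num_states, num_states))
P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
r = rng.random(num_states)

v_loop = np.zeros(num_states, dtype=np.float64)
v_vec = np.zeros(num_states, dtype=np.float64)
r_bar_loop = 0.0
r_bar_vec = 0.0

# Run both versions side by side and compare after every sweep.
for _ in range(20):
    v_loop, r_bar_loop = sync_update_loop(P, r, v_loop, r_bar_loop, 0.5, 0.5)
    v_vec, r_bar_vec = sync_update_vectorized(P, r, v_vec, r_bar_vec, 0.5, 0.5)
    assert np.allclose(v_loop, v_vec)
    assert np.isclose(r_bar_loop, r_bar_vec)
```

Keeping the slow loop version around costs little, and it gives the vectorized version something unambiguous to be compared against whenever the update rule changes.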

btanner commented 2 years ago

I'm going to commit this and the following PR and then continue working.