abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm
Apache License 2.0

Adds a separate implementation of Evaluation Algorithm with RVI. #19

Closed: btanner closed this 3 years ago

btanner commented 3 years ago

I would like some feedback from @yiwan-rl and @abhisheknaik96 on how this implementation looks.

The test checks for convergence.

If we like this, I can convert all the algorithms to be sort of self-contained in this way.
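The PR's code is not reproduced in this thread, so the following is only a minimal sketch of what a self-contained evaluation class using relative value iteration (RVI), with its own convergence check, might look like. Everything here is an assumption for illustration: the class name `RVIEvaluation`, its methods, the representation of the policy as a state-to-state transition matrix plus expected-reward vector, and the choice of the value of a single reference state as the RVI offset. It is not the repository's implementation.

```python
"""Hypothetical sketch: RVI policy evaluation as a self-contained class.

Assumptions (not from the PR): the policy is folded into a transition matrix
P[s][s'] and an expected-reward vector r[s], and the RVI offset is the value
of a fixed reference state.
"""
from dataclasses import dataclass, field
from typing import List


@dataclass
class RVIEvaluation:
    """Illustrative evaluation algorithm; name and interface are hypothetical."""
    transitions: List[List[float]]  # P[s][s']: probability of s -> s' under the policy
    rewards: List[float]            # r[s]: expected one-step reward in state s
    reference_state: int = 0
    values: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from all-zero differential values unless initial values are given.
        if not self.values:
            self.values = [0.0] * len(self.rewards)

    def update(self) -> float:
        """One synchronous sweep of v <- r + P v - v[ref]; returns the max change."""
        offset = self.values[self.reference_state]
        new_values = [
            r_s + sum(p * v for p, v in zip(row, self.values)) - offset
            for r_s, row in zip(self.rewards, self.transitions)
        ]
        change = max(abs(a - b) for a, b in zip(new_values, self.values))
        self.values = new_values
        return change

    def average_reward(self) -> float:
        # At the fixed point v = r + P v - v[ref], so (for a unichain,
        # aperiodic chain) v[ref] equals the average reward of the policy.
        return self.values[self.reference_state]

    def run(self, tol: float = 1e-10, max_iters: int = 10_000) -> bool:
        """Iterate until successive value vectors differ by less than tol."""
        for _ in range(max_iters):
            if self.update() < tol:
                return True
        return False


if __name__ == "__main__":
    # Two-state chain with stationary distribution (2/3, 1/3); reward 1 in state 0.
    alg = RVIEvaluation(transitions=[[0.9, 0.1], [0.2, 0.8]], rewards=[1.0, 0.0])
    converged = alg.run()
    print(converged, round(alg.average_reward(), 6))  # True 0.666667
```

Keeping the update rule, the gain estimate, and the stopping criterion on one object is what makes a convergence test easy to write against it: the test only needs to call `run()` and compare `average_reward()` with a known answer.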

yiwan-rl commented 3 years ago

This looks great! We can use this kind of class for other algos.

btanner commented 3 years ago

@yiwan-rl @abhisheknaik96 Based on @yiwan-rl's positive feedback, I'll proceed with converting the algorithms into their own modules, using this strategy as a template (and probably adapting it along the way).

Getting familiar with them will let me start examining their efficiency and timing as we scale up the problems, and how JAX or other accelerated approaches might help.