btanner closed 3 years ago
This updated version fixes the bug with the previous test. MDVI Control 1 and 2 now reach the same policies on all our test problems.
@yiwan-rl This is an example of what I mean by vectorizing. I really like MDVI Control 2 :)
@abhisheknaik96 @yiwan-rl This PR has been updated to include vectorized sync/async MDVI Control 2, and it passes all tests (including a new async policy check vs RVI and DVI).
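For anyone reading along without the code open, here is a rough sketch of what "vectorizing" a synchronous update can look like. It uses a textbook RVI Q-value backup rather than MDVI Control 2, because this thread does not spell out the MDVI update. The function names, the array shapes (`P` as `(S, A, S)`, `R` and `q` as `(S, A)`), and the reference pair `ref` are assumptions made for illustration, not the code in this PR.

```python
import numpy as np


def rvi_q_sync_step(q, P, R, ref=(0, 0)):
    """One synchronous RVI Q update for every (s, a) pair at once.

    Assumed shapes (hypothetical, for illustration only):
      P: (S, A, S) transition probabilities
      R: (S, A)    expected rewards
      q: (S, A)    current action-value estimates
    """
    v = q.max(axis=1)        # greedy value of each next state, shape (S,)
    target = R + P @ v       # expected one-step backup, shape (S, A)
    return target - q[ref]   # subtract the reference entry, as in RVI


def rvi_q_sync_step_loop(q, P, R, ref=(0, 0)):
    """Same update written with explicit loops, for comparison."""
    S, A = R.shape
    new_q = np.empty_like(q)
    for s in range(S):
        for a in range(A):
            backup = sum(P[s, a, s2] * q[s2].max() for s2 in range(S))
            new_q[s, a] = R[s, a] + backup - q[ref]
    return new_q
```

The two functions should return the same array; the vectorized one just replaces the nested loops with a single matrix-vector product over the last axis of `P`.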
Tests are updated to run with MDVI Control 2.
Control 1 and 2 both converge to something on all of the same problems with the same hyperparameters.
@abhisheknaik96 @yiwan-rl
However, in policy_test.py, MDVI Control 1, RVI, and DVI converge to the SAME policies on MDP1, MDP2, and GARET 1/2/3.
MDVI Control 2 DOES NOT converge to the same policy as MDVI Control 1 on 2 of the GARET tasks.
I have not dug into this at all yet. Just trying to get some progress updates to you folks at the end of my day!
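One thing that might be worth ruling out when two algorithms disagree on a policy is tie-breaking: a plain `argmax` comparison can report a mismatch when several actions are (near-)equally good in some state. The sketch below is only an assumption about how such a check could be written; `greedy_action_sets`, `policies_agree`, and the tolerance `atol` are hypothetical names, not part of policy_test.py, and nothing here says ties are the actual cause of the GARET mismatch.

```python
import numpy as np


def greedy_action_sets(q, atol=1e-8):
    """For each state, return the set of actions within atol of the row max."""
    best = q.max(axis=1, keepdims=True)
    mask = np.isclose(q, best, rtol=0.0, atol=atol)
    return [set(np.flatnonzero(row).tolist()) for row in mask]


def policies_agree(q_a, q_b, atol=1e-8):
    """True if, in every state, the two greedy action sets share an action."""
    sets_a = greedy_action_sets(q_a, atol)
    sets_b = greedy_action_sets(q_b, atol)
    return all(a & b for a, b in zip(sets_a, sets_b))
```

This is a deliberately lenient check (one shared greedy action per state is enough), so a failure here would suggest a real policy difference rather than an artifact of how ties are broken.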