I think that in the Dyna-Q+ class, the `reset` method should not include the command `self.time = 0`. Resetting the counter means that, for transitions stored in the model before the reset, we get `self.time - _time < 0`. This negative value is then passed to a square root, which produces NaN values in the Q-function. The square root is taken in this line: https://github.com/MJeremy2017/reinforcement-learning-implementation/blob/0fecb49bc674f7269e5456cd0d978588e3199761/DynaMaze/DynaQ%2B.py#L116
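To make the problem concrete, here is a minimal sketch of the time bookkeeping. It is not the repository's code: `BonusClock`, `visit`, `bonus`, `run`, and the `keep_time` flag are hypothetical names, and `sqrt_like_numpy` is a stand-in that assumes the original uses `np.sqrt`, which returns NaN for a negative real input instead of raising.

```python
import math


def sqrt_like_numpy(x):
    """Stand-in for np.sqrt on a real scalar: nan for negative input."""
    return math.sqrt(x) if x >= 0 else float("nan")


class BonusClock:
    """Hypothetical reduction of the agent to its time counter and model."""

    def __init__(self, time_weight=1e-4):
        self.time_weight = time_weight  # assumed bonus weight
        self.time = 0
        self.model = {}  # (state, action) -> time step of the last real visit

    def visit(self, state, action):
        # A real step advances the clock and timestamps the transition.
        self.time += 1
        self.model[(state, action)] = self.time

    def bonus(self, state, action):
        # Exploration bonus: weight * sqrt(steps since the last real visit).
        _time = self.model[(state, action)]
        return self.time_weight * sqrt_like_numpy(self.time - _time)

    def reset(self, keep_time):
        # keep_time=False mimics a reset that sets self.time = 0
        # while the model keeps its old timestamps.
        if not keep_time:
            self.time = 0


def run(keep_time):
    clock = BonusClock()
    for action in range(5):      # first episode: timestamps 1..5
        clock.visit("s", action)
    clock.reset(keep_time)
    clock.visit("s", 99)         # first real step of the next episode
    return clock.bonus("s", 4)   # transition last visited at time 5


print(run(keep_time=False))  # self.time - _time = 1 - 5 < 0
print(run(keep_time=True))   # self.time - _time = 6 - 5 > 0
```

With the counter reset, the stored timestamp (5) is larger than the current time (1), so the bonus is NaN. If `self.time` keeps counting across episodes, the difference stays non-negative. An alternative, under the same assumptions, would be to clear the model's timestamps whenever the counter is reset.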