eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License
582 stars 152 forks

Questions about selection_rule in OPD #46

Closed (marooncn closed this issue 4 years ago)

marooncn commented 4 years ago

Hi, in the selection_rule function in deterministic.py, the action with the maximum value_upper is chosen. In the OPD paper, however, the algorithm expands nodes according to the maximum value_upper but chooses the action according to value_lower.

[Screenshot: 2020-08-06 11-51-51]

eleurent commented 4 years ago

Oh no, you're right! 😮 Thank you so much for spotting this. It is a regression, due to a poor choice of function name get_value(), which used to refer to the value lower bound, and was recently changed to the value upper bound during some refactoring.
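
To make the distinction discussed in this thread concrete, here is a minimal sketch that keeps the two rules in separately named functions, so an ambiguous accessor like get_value() cannot silently change meaning. The Node class, expansion_rule, and the numeric bounds are hypothetical illustrations, not the actual code in deterministic.py; only the names selection_rule, value_upper and value_lower come from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical tree node holding bounds on the value of its subtree."""
    value_lower: float = 0.0
    value_upper: float = 1.0
    children: Dict[int, "Node"] = field(default_factory=dict)


def expansion_rule(node: Node) -> int:
    """Optimistic descent: pick the child with the largest upper bound."""
    return max(node.children, key=lambda a: node.children[a].value_upper)


def selection_rule(root: Node) -> int:
    """Recommendation: pick the action with the largest lower bound."""
    return max(root.children, key=lambda a: root.children[a].value_lower)


# Two actions whose bounds disagree about which one looks best.
root = Node(children={
    0: Node(value_lower=0.5, value_upper=0.6),
    1: Node(value_lower=0.2, value_upper=0.9),
})
print(expansion_rule(root))  # 1: action 1 has the highest upper bound
print(selection_rule(root))  # 0: action 0 has the highest lower bound
```

With these made-up bounds, the planner would keep exploring under action 1 but recommend action 0, which is exactly the case where using the upper bound in selection_rule gives a different answer.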