Open Bahador-Bakhshi opened 3 years ago
Dyna needs far fewer episodes to converge
It seems that, in large problems, it is really beneficial to use Dyna instead of direct Q-Learning
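
For reference, a minimal sketch of the comparison I have in mind, assuming tabular Dyna-Q with a learned deterministic model. The 6-state chain environment, the `run_episodes` helper, and all hyperparameters are hypothetical and only for illustration, not taken from this repo. Setting `planning_steps=0` gives plain one-step Q-Learning.

```python
import random

N_STATES = 6      # hypothetical chain: states 0..5, goal is state 5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain; reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Greedy action with random tie-breaking."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def run_episodes(n_episodes, planning_steps, rng, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Dyna-Q; planning_steps=0 reduces it to direct Q-Learning."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # (state, action) -> (next_state, reward), learned from real steps
    for _episode in range(n_episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection on the real environment.
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(q[s], rng)
            s2, r, done = step(s, a)
            # Direct RL update from the real transition.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            model[(s, a)] = (s2, r)
            # Planning: replay transitions sampled from the learned model.
            for _plan in range(planning_steps):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                q[ps][pa] += alpha * (pr + gamma * max(q[ps2]) - q[ps][pa])
            s = s2
    return q


def count_nonzero(q):
    """Number of Q-table entries that have received a nonzero value."""
    return sum(1 for row in q for v in row if v != 0.0)


q_plain = run_episodes(1, planning_steps=0, rng=random.Random(0))
q_dyna = run_episodes(1, planning_steps=200, rng=random.Random(0))
print("nonzero Q entries after 1 episode:",
      count_nonzero(q_plain), "(Q-Learning) vs", count_nonzero(q_dyna), "(Dyna-Q)")
```

After a single episode, direct Q-Learning has only updated the last transition into the goal, while the planning updates in Dyna-Q have already pushed the reward back along the chain. That is the effect I would expect to grow with problem size, although this toy example does not prove it.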