Fix action choice in Q-Learning

LxMLS / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School


Fix action choice in Q-Learning #174

Closed: q0o0p closed this 3 years ago

q0o0p commented 4 years ago

There was a bug in action selection in the Q-Learning code: the action was chosen randomly instead of from a policy derived from the current Q-values. This commit updates it to use an epsilon-greedy strategy.
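For readers unfamiliar with the strategy, here is a minimal sketch of epsilon-greedy action selection. It is not the toolkit's actual code: the function name `epsilon_greedy_action`, its parameters, and the list-based representation of Q-values are assumptions made for illustration only.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action index given the Q-values of the current state.

    Hypothetical helper, not part of lxmls-toolkit.
    With probability `epsilon` the agent explores (random action);
    otherwise it exploits (an action with the highest Q-value).
    """
    if rng.random() < epsilon:
        # Explore: any action, regardless of its Q-value.
        return rng.randrange(len(q_values))
    # Exploit: collect all actions tied for the best Q-value,
    # then break ties at random so no index is favored.
    best = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)


# Example: Q-values for three actions in some state.
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.1)
```

In this sketch, `epsilon=0` gives a purely greedy policy, while `epsilon=1` always picks a random action, which matches the behaviour described as the bug.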

NOTE: The resulting answers remain the same, because the environment is rather simple and Q-Learning worked fine despite the bug.

Counterpart pull request for master branch: #173

chrishokamp commented 3 years ago

LGTM

zehsilva commented 3 years ago

I will try to merge!