Fix action choice in Q-Learning

LxMLS / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School

Fix action choice in Q-Learning #173

Closed q0o0p closed 3 years ago

q0o0p commented 4 years ago

There was a bug in action selection in the Q-Learning code. The action was selected randomly instead of using the policy derived from the current Q values. In this commit I have updated it to use an epsilon-greedy strategy.
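For illustration, epsilon-greedy selection can be sketched roughly as follows. This is a minimal sketch, not the code from this commit: the function name `epsilon_greedy_action`, the tabular layout of `Q` (one row per state, one column per action), and the use of a NumPy random generator are all assumptions made for the example.

```python
import numpy as np


def epsilon_greedy_action(Q, state, epsilon, rng):
    """Choose an action for `state` from a tabular Q (hypothetical helper).

    With probability `epsilon` a random action is explored; otherwise the
    action with the highest current Q value for this state is exploited.
    """
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random.
        return int(rng.integers(n_actions))
    # Exploit: greedy action under the current Q estimates.
    return int(np.argmax(Q[state]))


# Toy Q table (assumed shape: states x actions), values are made up.
rng = np.random.default_rng(0)
Q = np.array([[0.0, 1.0, 0.5],
              [2.0, 0.0, 0.1]])

# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy_action(Q, state=0, epsilon=0.0, rng=rng)
```

Setting `epsilon` to 1.0 in this sketch reproduces purely random selection, which is the behaviour described as the bug above.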

NOTE: the resulting answers will remain the same because the environment is rather simple, so Q-Learning worked fine even with the bug.

Counterpart pull request for student branch: #174