RasmusBrostroem / ConnectFourRL


Implement epsilon-greedy `TDAgent` #99

Closed by jbirkesteen 1 year ago

jbirkesteen commented 1 year ago

Currently, we use a greedy strategy: the agent always takes the action with the best value estimate. Change this to an epsilon-greedy method, where the agent takes a random action with probability epsilon and the greedy action otherwise. The greedy method stays available as the special case `epsilon=0`. Add `epsilon` as an attribute of `TDAgent`.
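
A minimal sketch of the selection rule, to make the request concrete. It is not the repository's code. `select_action`, `EpsilonGreedySketch`, and the `{action: value estimate}` dict are hypothetical names and shapes chosen for illustration, and how `TDAgent` actually computes its value estimates is not assumed here.

```python
import random


def select_action(action_values, epsilon=0.0, rng=random):
    """Pick an action epsilon-greedily from {action: value estimate}.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    actions = list(action_values)
    if not actions:
        raise ValueError("no actions to choose from")
    # Explore: with probability epsilon, pick uniformly among all actions.
    # rng.random() is in [0, 1), so epsilon=0 never explores.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploit: pick the action with the highest value estimate.
    return max(actions, key=action_values.get)


class EpsilonGreedySketch:
    """Hypothetical stand-in for an agent that stores epsilon as an attribute."""

    def __init__(self, epsilon=0.0, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # own RNG so runs can be reproduced

    def act(self, action_values):
        return select_action(action_values, self.epsilon, self.rng)


if __name__ == "__main__":
    # Assumed example: value estimates for three candidate columns.
    values = {0: 0.1, 3: 0.9, 5: 0.4}
    greedy = EpsilonGreedySketch(epsilon=0.0)
    print(greedy.act(values))  # greedy special case: always column 3
```

Keeping the default at `epsilon=0.0` would preserve the current greedy behaviour for existing callers, and passing a seeded `random.Random` keeps exploration reproducible in tests.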

Originally posted by @jbirkesteen in https://github.com/RasmusBrostroem/ConnectFourRL/discussions/43#discussioncomment-6700332