Implement epsilon-greedy `TDAgent`

Currently, we use a purely greedy strategy: the agent always takes the action with the best value estimate. Change this to an epsilon-greedy strategy, where the agent takes a random action with probability epsilon and the greedy action otherwise. The greedy strategy should remain available as the special case epsilon=0. Add epsilon as an attribute of `TDAgent`.
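
To make the intended behaviour concrete, here is a minimal sketch of the selection rule. It is not taken from the repository: the function name `epsilon_greedy_action` and its arguments (`legal_actions`, `value_estimates`, `rng`) are hypothetical, and how `TDAgent` actually produces its value estimates is not assumed here.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy_action(
    legal_actions: Sequence[int],
    value_estimates: Sequence[float],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action epsilon-greedily (hypothetical helper, not repo code).

    value_estimates[i] is the estimated value of taking legal_actions[i].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if len(legal_actions) == 0 or len(legal_actions) != len(value_estimates):
        raise ValueError("need one value estimate per legal action")

    rng = random.Random() if rng is None else rng

    # Explore: with probability epsilon, pick a legal action uniformly at random.
    # rng.random() lies in [0, 1), so epsilon=0 never takes this branch.
    if rng.random() < epsilon:
        return rng.choice(list(legal_actions))

    # Exploit: pick the action with the highest value estimate (the greedy case).
    best_index = max(range(len(legal_actions)), key=lambda i: value_estimates[i])
    return legal_actions[best_index]
```

With `epsilon=0` this reduces to the current greedy behaviour, so existing callers would not need to change. In the agent, `epsilon` would be stored as an attribute (for example set in `__init__`) and passed in when selecting a move.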

Originally posted by @jbirkesteen in https://github.com/RasmusBrostroem/ConnectFourRL/discussions/43#discussioncomment-6700332

Implement epsilon-greedy `TDAgent` #99