AI-related issues to address

How should I design my reward function? After reading an OpenAI post, I'm considering:
- a large constant reward for winning the game
- an exponentially decaying reward (or penalty) for 'good' or 'bad' behavior
  - e.g., good: shouting "die" when the chance of winning is low
  - e.g., bad: shouting "die" when the chance of winning is high
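The reward scheme above could be sketched roughly as follows. This is a minimal illustration under several assumptions, none of which come from the notes: the names `step_reward` and `shaping_reward` are hypothetical, all constants are placeholders, a loss is assumed to give a symmetric penalty, "low" chance of winning is assumed to mean below 0.5, and the shaping term is assumed to decay with the training episode index so that it fades as the agent learns.

```python
# Hypothetical constants: placeholder values, not tuned for Die or Dare.
WIN_REWARD = 100.0      # large constant reward for winning the game
LOSS_REWARD = -100.0    # assumed symmetric penalty for losing
SHAPING_SCALE = 1.0     # initial size of the behavior bonus/penalty
SHAPING_DECAY = 0.99    # decay factor a, with 0 < a < 1
LOW_WIN_PROB = 0.5      # assumed threshold separating "low" from "high" win chance


def shaping_reward(good: bool, episode: int) -> float:
    """+SCALE * a^episode for good behavior, -SCALE * a^episode for bad behavior."""
    sign = 1.0 if good else -1.0
    return sign * SHAPING_SCALE * SHAPING_DECAY ** episode


def step_reward(done: bool, won: bool, shouted_die: bool,
                win_prob: float, episode: int) -> float:
    """Terminal win/loss reward plus a decaying bonus or penalty for shouting "die"."""
    reward = 0.0
    if done:
        reward += WIN_REWARD if won else LOSS_REWARD
    if shouted_die:
        # Good: shouting "die" when the chance of winning is low; bad when it is high.
        reward += shaping_reward(win_prob < LOW_WIN_PROB, episode)
    return reward
```

Because the shaping term shrinks toward zero while the win reward stays constant, the agent is nudged early on but should eventually optimize for winning itself.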
What value should I use for epsilon (the exploration rate in epsilon-greedy action selection)?
- a constant function: e_k = 0.1 (0.05 afterwards) (source)
- an inverse function: e_k = 1/k (source)
- an exponentially decaying function: e_k = 0.9 * a^k for 0 < a < 1 (source)
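The three candidate schedules can be compared side by side in a short sketch. The function names are hypothetical, `k` is assumed to count episodes starting at 1, and because the notes do not say when the constant schedule drops from 0.1 to 0.05, the switch point is an assumed parameter `switch_at`. The `epsilon_greedy` helper is a generic illustration of how the schedule value would be used.

```python
import random
from typing import Sequence


def eps_constant(k: int, switch_at: int = 10_000) -> float:
    """0.1 until episode `switch_at`, then 0.05 (the switch point is an assumption)."""
    return 0.1 if k < switch_at else 0.05


def eps_inverse(k: int) -> float:
    """1/k for k >= 1."""
    return 1.0 / k


def eps_exponential(k: int, a: float = 0.999) -> float:
    """0.9 * a^k with 0 < a < 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    return 0.9 * a ** k


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action; otherwise pick the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Note that 1/k starts at 1.0 (fully random) and falls quickly, while 0.9 * a^k with `a` close to 1 decays much more slowly; the constant schedule never stops exploring.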
How should I implement multiple rule sets? The rules vary depending on whether input is timed and on what happens when both players' sums are equal. I'm thinking of subclassing Game and overriding the necessary methods, but I'm not sure yet.
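The subclassing idea could look roughly like this. The project's real `Game` class is not shown here, so the `Game` below is a hypothetical stand-in, and the hook names `input_time_limit`, `resolve_tie`, and `round_winner` are invented for illustration. The rule that the higher sum wins is also an assumption made only so the tie hook has something to plug into.

```python
from typing import Optional


class Game:
    """Hypothetical stand-in for the project's Game class; only the rule hooks are shown."""

    def input_time_limit(self) -> Optional[float]:
        """Seconds allowed per input; None means input is not timed."""
        return None

    def resolve_tie(self) -> Optional[int]:
        """Called when both players' sums are equal; None means the round is a draw."""
        return None

    def round_winner(self, sum_a: int, sum_b: int) -> Optional[int]:
        """Shared logic stays here. Assumes (for illustration only) that the higher sum wins."""
        if sum_a != sum_b:
            return 0 if sum_a > sum_b else 1
        return self.resolve_tie()


class TimedGame(Game):
    def input_time_limit(self) -> Optional[float]:
        return 5.0  # hypothetical per-input limit in seconds


class FirstPlayerWinsTies(Game):
    def resolve_tie(self) -> Optional[int]:
        return 0  # hypothetical tie rule: player 0 wins ties


class TimedFirstPlayerWinsTies(TimedGame, FirstPlayerWinsTies):
    """Combines both rule variants via Python's method resolution order."""
```

One trade-off to keep in mind: each new combination of rule options needs its own subclass, so if the variants multiply, passing a small rules object into `Game` (composition) may scale better than inheritance.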
Both players have the same knowledge of their own and their opponent's decks, as in Go, chess, and Gomoku. This means the same algorithm can play both sides of a game, doubling the training data collected per game. However, I also read that "[Q-learning] isn't likely to lead to very good results if you assume that the opponent can also learn."
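Using one learner for both sides can be sketched as tabular Q-learning with a single shared Q-table that is updated from each player's transitions. The class name `SharedQLearner` and the transition format are hypothetical, and the sketch assumes states are encoded from the acting player's perspective (e.g., "my deck, opponent's deck") so that the same table entry means the same situation for either player.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# A transition seen from ONE player's perspective: (state, action, reward, next_state, done).
Transition = Tuple[Hashable, int, float, Hashable, bool]


class SharedQLearner:
    """Tabular Q-learning with one Q-table used by both players (hypothetical sketch)."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.99):
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        # Unseen states start with all-zero action values.
        self.q: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_actions)

    def update(self, t: Transition) -> None:
        s, a, r, s2, done = t
        # Standard Q-learning target: r + gamma * max_a' Q(s', a'), or just r at the end.
        target = r if done else r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def learn_from_game(self, per_player: Dict[int, List[Transition]]) -> int:
        """Feed every player's transitions into the same table; returns how many were used."""
        count = 0
        for transitions in per_player.values():
            for t in transitions:
                self.update(t)
                count += 1
        return count
```

This also shows where the quoted concern comes from: each update treats the opponent as a fixed part of the environment, but in self-play the opponent is the same learner and keeps changing.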

Bartleby2718 / DieOrDare