fumin opened this issue 5 days ago
Hi @fumin, my guess is that this could be related to bad hyper-parameters for the game of choice.
Have you seen this paper: https://arxiv.org/abs/2103.00187? Have you tried the hyper-parameters used for Kuhn in that paper?
There will always be a huge discrepancy between tabular methods and neural network methods. It's hard to compare the two, and you certainly should not compare them iteration-by-iteration; it's like comparing apples and oranges. One is exact (no sampling, no approximation); the other has errors from both sampling and function approximation. If you compared, e.g., value iteration to Q-learning with neural nets, you'd see something similar, and it might even be worse in imperfect-information games because the metric (exploitability) is harder to reduce: it evaluates the policy "adversarially", rather than evaluating the reward of a (deterministic) policy in, e.g., a gridworld.
That said, 0.83 is quite high for Kuhn, so there is probably still an issue. Can you use the paper's hyper-parameters, compare against the graphs in the paper, and then follow up? Can you also try the JAX and/or TF implementations to see what they give?
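As a quick sanity check on scale, here is a minimal sketch (using OpenSpiel's `exploitability` module) that evaluates the uniform-random policy on Kuhn; that gives you a baseline for what "close to random" actually means in exploitability terms:

```python
# Sketch: exploitability of the uniform-random policy on Kuhn poker,
# as a baseline for judging whether a learned policy is "close to random".
import pyspiel
from open_spiel.python import policy
from open_spiel.python.algorithms import exploitability

game = pyspiel.load_game("kuhn_poker")
uniform = policy.TabularPolicy(game)  # TabularPolicy defaults to uniform random
# For kuhn_poker the uniform-random baseline is roughly 0.46
# (nash_conv roughly 0.92).
print("exploitability:", exploitability.exploitability(game, uniform))
print("nash_conv:", exploitability.nash_conv(game, uniform))
```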
Thanks for the Michael Walton et al. paper, which looks like exactly what I need for this issue! It's great that the paper provides code to replicate its results: https://github.com/aicenter/openspiel_reproductions
Some observations:

- One of the differences seems to involve `alternating_updates`. In fact, my tabular CFR still has an exploitability of 0.03 at iteration 128, whereas the paper gets sub-0.01 exploitability at iteration 100 (see the measurement sketch below).

I will report back with my findings soon.
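For reference, below is a minimal sketch of how I'm measuring the tabular curve, using OpenSpiel's plain `cfr.CFRSolver`; whether its updates are alternating or simultaneous is one of the settings I want to double-check against the paper:

```python
# Sketch: tabular CFR on Kuhn poker, logging exploitability over iterations
# so the curve can be compared against the one in the reproduction paper.
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability

game = pyspiel.load_game("kuhn_poker")
solver = cfr.CFRSolver(game)
for i in range(1, 129):
  solver.evaluate_and_update_policy()
  if i % 16 == 0:
    expl = exploitability.exploitability(game, solver.average_policy())
    print(f"iteration {i}: exploitability {expl:.4f}")
```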
I compared the PyTorch deep_cfr with the tabular cfr and found that deep_cfr has much higher exploitability (deep_cfr: 0.83, tabular: 0.02). Moreover, printing out the deep_cfr policy suggests that it doesn't differ much from random (printed with the sketch below).
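Here is a sketch of how I dump a policy per information state; `action_probabilities` is the callable that, as far as I can tell, both the Deep CFR solver and the tabular solver's average policy expose:

```python
# Sketch: dump per-information-state action probabilities of a policy,
# to eyeball how far it is from uniform random.
from open_spiel.python import policy as policy_lib

def print_policy(game, action_probabilities_fn):
  # Materialize the callable into a tabular policy over all info states.
  tabular = policy_lib.tabular_policy_from_callable(game, action_probabilities_fn)
  for info_state, idx in sorted(tabular.state_lookup.items()):
    print(info_state, tabular.action_probability_array[idx])

# e.g. print_policy(game, deep_cfr_solver.action_probabilities)
# or   print_policy(game, cfr_solver.average_policy().action_probabilities)
```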
Below is the output of deep_cfr:
Compare the above with tabular cfr, which works nicely:
Below is a minimal sketch of the code used to test deep_cfr (the hyper-parameter values are illustrative placeholders rather than a tuned configuration, and the keyword names follow OpenSpiel's PyTorch `DeepCFRSolver` as best I can tell):
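```python
# Sketch: train PyTorch Deep CFR on Kuhn poker and report exploitability.
# Hyper-parameter values below are placeholders, not a tuned configuration.
import pyspiel
from open_spiel.python import policy
from open_spiel.python.algorithms import exploitability
from open_spiel.python.pytorch import deep_cfr

game = pyspiel.load_game("kuhn_poker")
solver = deep_cfr.DeepCFRSolver(
    game,
    policy_network_layers=(32, 32),
    advantage_network_layers=(16, 16),
    num_iterations=400,
    num_traversals=40)
solver.solve()

# Wrap the learned policy network into a tabular policy for evaluation.
average_policy = policy.tabular_policy_from_callable(
    game, solver.action_probabilities)
print("exploitability:", exploitability.exploitability(game, average_policy))
print("nash_conv:", exploitability.nash_conv(game, average_policy))
```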