Hi @fumin , this is not necessarily incorrect. The minimax-optimal values for every action at the root of Tic-Tac-Toe are zero. Since CFR is computing an approximate Nash equilibrium, there is no guarantee that it'll prefer any specific action over any other, since game-theoretically they're all equivalent.
Thanks for the explanation, that makes a lot of sense. In other words, a two-player zero-sum game only guarantees the existence of equilibria, not their uniqueness. Any mixture among equilibria is another equilibrium. Thanks for the insight!
It's not really about uniqueness per se, it's more about the definition of a Nash equilibrium. Any move made by the first player has the same game-theoretic optimal value (0) because player 1 can still force a tie under all circumstances. So player 1 is indifferent among them. Any Nash equilibrium will assign values of 0 to all nine moves, and so any policy at the root is "optimal" assuming the play afterward remains optimal.
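For anyone who wants to check the "all nine moves have value 0" claim without OpenSpiel, here is a minimal sketch of a plain minimax search over Tic-Tac-Toe. It is not CFR and not the code from the notebook. The board encoding (a 9-character string of `x`, `o`, `.`) and the function names are my own assumptions for illustration.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'x' or 'o' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, to_move):
    """Minimax value from x's point of view: +1 x wins, 0 draw, -1 o wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "x" else -1
    if "." not in board:
        return 0  # full board, nobody won
    nxt = "o" if to_move == "x" else "x"
    child_values = [
        value(board[:i] + to_move + board[i + 1:], nxt)
        for i, cell in enumerate(board)
        if cell == "."
    ]
    # x maximizes, o minimizes.
    return max(child_values) if to_move == "x" else min(child_values)


def root_action_values():
    """Value of each of the nine opening moves for x, assuming optimal play after."""
    empty = "." * 9
    return [value(empty[:i] + "x" + empty[i + 1:], "o") for i in range(9)]


print(root_action_values())  # prints [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Corner, side, and center openings all come out at 0, so any distribution over them at the root is consistent with an equilibrium as long as later play is optimal.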
Thanks for the correction. I do need to change my conclusion to:
Any mixture among equilibria, *of the same value*, is another equilibrium
In general, different equilibria may have different values, especially in coordination games. However, in the case of two-player zero-sum games, is it true that all equilibria have the same value (the value of the game, which is "0" for Tic-Tac-Toe), and thus it's always OK to mix equilibrium strategies?
Yes, that is correct! 👍
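A small sketch of that mixing property, using a made-up 3x3 zero-sum matrix game rather than Tic-Tac-Toe. The payoff matrix `A`, the helper names, and the gap function are all assumptions for illustration. The gap here is the row player's best-response payoff minus the column player's best-response payoff, which for a two-player zero-sum game is zero exactly at an equilibrium. It is meant to mirror the idea behind NashConv, not to reproduce OpenSpiel's implementation.

```python
# Row player's payoffs (row player maximizes, column player minimizes).
# Hypothetical example: rows 0 and 1 are both optimal for the row player,
# columns 0 and 1 are both optimal for the column player, and the value is 0.
A = [
    [0, 0, 1],
    [0, 0, 1],
    [-1, -1, 0],
]


def row_payoffs(A, y):
    """Expected payoff of each pure row against column mixture y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]


def col_payoffs(A, x):
    """Expected row-player payoff of each pure column against row mixture x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def nash_conv(A, x, y):
    """Sum of both players' best-response gains; 0 means (x, y) is an equilibrium."""
    return max(row_payoffs(A, y)) - min(col_payoffs(A, x))


def mix(p, q, w):
    """Convex combination w*p + (1-w)*q of two mixed strategies."""
    return [w * a + (1 - w) * b for a, b in zip(p, q)]


# Two different pure equilibria of the game above.
x1, y1 = [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
x2, y2 = [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]

print(nash_conv(A, x1, y1))                                # equilibrium 1
print(nash_conv(A, x2, y1))                                # strategies swapped across equilibria
print(nash_conv(A, mix(x1, x2, 0.3), mix(y1, y2, 0.6)))    # mixture of the two
print(nash_conv(A, [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]))      # not an equilibrium
```

The first three profiles have a gap of zero (up to floating-point rounding), while the last one does not. This is the same situation as the CFR root policy: putting little weight on the center is no worse than putting a lot of weight on it.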
Why does Tabular CFR play the sides on an empty tic_tac_toe board? Shouldn't the correct play be at the center? By playing the sides I mean that the CFR final policy for the empty board is
[0.072 0.177 0.072 0.177 0.002 0.177 0.072 0.177 0.072]
The action of playing the center gets the lowest probability, 0.002, which is also very weird. The NashConv value of 0.005 after 1024 steps does seem to indicate successful training, though.
Below is the code to replicate the above results (taken from https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/colabs/CFR_and_REINFORCE.ipynb)