Open jeremysalwen opened 2 months ago
In general, this is a limitation on the theory behind CFR, it only works for perfect recall games. It needs prefect recall because it needs the information sets to be a dag.
If I understand the game you're describing correctly, the first player only moves once, and after that only the second player guesses until the game is over? If that's the case, then you're probably better off looking at this as a stackleberg game, and "collapsing" the second players guesses into a single choice of "strategy." I'm unfortunately not super well versed in stackleberg solving, so I can't really give guidance beyond this, but my guess is is you use numeric techniques with some notion of what a best response looks like. If this is zero sum, you could do something like: pick a strategy for player 1, then compute the optimal strategy for player 2, and do gradient descent to maximize player 1's strategy. I think this should work, at least in terms of finding a nash equilibirium, but I wouldn't trust me, and instead do some reading.
Essentially I'm saying this library isn't really designed for your task. However, your game still has perfect recall, it's just not a concise representation. You could always try it, but my guess is that due to the permutations of actions, this probably won't work.
Hopefully some of this helps?
Hmm, my understanding of the theory behind CFR is that it won't work in general for imperfect recall games, but for my specific game, it would work. The reason I believe so is that a player who forgot their previous guesses would still play just as optimally as one who had perfect recall, and thus their zero-regret guarantee from the regret matching algorithm at each information set would guarantee that they would also have zero regrets even with full information.
https://arxiv.org/pdf/1205.0622 is a paper I found that I believe makes a similar point, that if you have a perfect recall refinement of your game (what they call "Well Formed"), then CFR still works in the imperfect recall case (and that empirically it works okay even without that guarantee).
I think you're right that there are other approaches I could take for solving it like you suggested (e.g. we just directly optimize the maximum of the expected number of guesses calculated as a function of player 2's strategy), but I think your library would also solve it efficiently if there was just a way to bypass this check.
Yeah, that makes sense. Unfortunately the notion of perfect recall is an implementation detail, as the current code constructs the whole game tree once, and when doing the CFR calculations needs to propagate regrets from terminal nodes back up the game tree. This could be a mutable variable, modified on the forward pass, but doing so would be a non trivial change. PRs are always welcome, but I don't foresee having time to do it myself.
Is it a technical limitation or just an attempt to help users catch errors?
I have a game where the first player picks a number, and the second player repeatedly guesses and gets feedback whether it's high or low. Because the first player only can take an action at the beginning, the information about the order in which the second player guessed is irrelevant, and so can be forgotten.