Closed Jamesflynn1 closed 1 year ago
A little more work on this is required
CFR key ideas: The maximum regret over all actions, averaged over iterations, approaches zero. The rate at which it goes to zero is governed by a regret bound that is algorithm dependent.
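To make the CFR point concrete, here is a minimal sketch of regret matching, the per-information-set learner commonly used inside CFR, on a single decision. The game (rock-paper-scissors), the fixed opponent strategy, and the names `PAYOFF`, `regret_matching` and `max_average_regret` are all assumptions for illustration, not part of this project. For regret matching specifically, the average regret is known to shrink at a rate on the order of 1/sqrt(T); other regret minimisers have different bounds.

```python
# Row player's payoffs for rock-paper-scissors (illustrative game choice).
PAYOFF = [[0.0, -1.0, 1.0],
          [1.0, 0.0, -1.0],
          [-1.0, 1.0, 0.0]]


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / n] * n  # no positive regret yet: play uniformly


def max_average_regret(opponent, iterations):
    """Return the maximum average regret after each iteration.

    `opponent` is a fixed mixed strategy (an assumption that keeps the
    example deterministic; in self-play it would change every iteration).
    """
    n = len(PAYOFF)
    # Expected payoff of each of our actions against the opponent's strategy.
    action_values = [sum(PAYOFF[a][b] * opponent[b] for b in range(n))
                     for a in range(n)]
    cum_regret = [0.0] * n
    history = []
    for t in range(1, iterations + 1):
        strategy = regret_matching(cum_regret)
        expected = sum(strategy[a] * action_values[a] for a in range(n))
        for a in range(n):
            # Regret for not having played action a this iteration.
            cum_regret[a] += action_values[a] - expected
        history.append(max(cum_regret) / t)
    return history


history = max_average_regret([0.4, 0.3, 0.3], 1000)
print(history[0], history[-1])
```

The first entry of `history` is the regret after one uniform play and the last is much smaller, because the learner shifts its play towards the best response and stops accumulating regret.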
EFR key ideas:
- Time selection regret minimisation.
- Time selection functions (experts deciding who to listen to).
- Deviations (take different actions depending on the current, subsequent or previous information set).
- Hindsight rationality.
- Observable hindsight rationality (keep some observations hidden to limit computational complexity; given this, allow the learner to be rational to the best of their observations).
- Partial deviation sequences (allow three distinct phases: correlated play, deviated play and re-correlated play; this is shown to improve strategic power).
- Mediated equilibrium (an equilibrium strategy profile where each player is rational with respect to a deviation set).
- Memory probability function with respect to a deviation (generalises counterfactual reach probability to account for memory states and, additionally, playouts according to a given deviation).
Initial ideas are good, need to write this up
Why am I using EFR as opposed to CFR? Why haven't more people done this? What are the advantages and disadvantages?
All are good questions; look more into the Morrill paper.