Jamesflynn1 / CS344-Opponent-Exploitation-Poker

A third year uni project aiming to implement and evaluate the EFR algorithm with different deviation types and explore a potential tradeoff between exploitability and expected value of a strategy in practice.

Compare and contrast EFR and CFR #20

Closed Jamesflynn1 closed 1 year ago

Jamesflynn1 commented 1 year ago

Why am I using EFR as opposed to CFR? Why haven't more people done this? What are the advantages and disadvantages?

All are good questions; look more into the Morrill paper.

Jamesflynn1 commented 1 year ago

A little more work on this is required

Jamesflynn1 commented 1 year ago

CFR Key Ideas: The maximum average regret over all actions approaches zero as the number of iterations grows. The rate at which it goes to zero is governed by a regret bound, which is algorithm dependent.
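To make the CFR point concrete for myself, here is a minimal sketch of regret matching, the local learner that vanilla CFR runs at each information set, in a one-shot game rather than a full extensive-form game. The payoff matrix, the fixed `OPPONENT` strategy and the function names are all assumptions for illustration, not anything from the project code. Regret matching's cumulative regret is bounded by a term that grows like sqrt(T), so the average regret shrinks at least as fast as 1/sqrt(T); against this fixed opponent it shrinks faster.

```python
import numpy as np

# Row player's payoffs for rock-paper-scissors
# (rows: our action, columns: the opponent's action).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

# Hypothetical fixed, exploitable opponent strategy (assumption for illustration).
OPPONENT = np.array([0.5, 0.3, 0.2])


def regret_matching(cumulative_regret):
    """Play in proportion to positive cumulative regret, else uniformly."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))


def max_average_regret(iterations):
    """Return the maximum average regret over actions after each iteration."""
    cumulative_regret = np.zeros(PAYOFF.shape[0])
    action_values = PAYOFF @ OPPONENT  # expected payoff of each pure action
    history = []
    for t in range(1, iterations + 1):
        strategy = regret_matching(cumulative_regret)
        expected_value = strategy @ action_values
        # Regret of an action: what it would have earned minus what we earned.
        cumulative_regret += action_values - expected_value
        history.append(cumulative_regret.max() / t)
    return history


history = max_average_regret(1000)
print(history[0], history[-1])
```

The first entry of `history` is the regret after one uniform-random round, and the last entry is far smaller, which is the "approaches zero" behaviour in miniature. A different local learner would give a different bound and so a different rate.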

EFR Key Ideas:
- Time selection regret minimisation.
- Time selection functions (experts deciding who to listen to).
- Deviations (take different actions depending on the current information set, the subsequent information set or the previous information set).
- Hindsight rationality.
- Observable hindsight rationality (keep some observations hidden to limit computational complexity; given this, allow the learner to be rational to the best of their observations).
- Partial deviation sequences (allow three distinct phases: correlated play, deviated play and re-correlated play; this is shown to improve strategic power).
- Mediated equilibrium (an equilibrium strategy profile where each player is rational with respect to a deviation set).
- Memory probability function with respect to a deviation (generalises counterfactual reach probability to account for memory states and, additionally, playouts according to a given deviation).
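A rough sketch of how deviations and time selection fit together, reduced to a repeated one-shot game so it stays small. This is not EFR itself: the `deviation_regret` helper, the toy utilities and the hand-picked weights are all assumptions for illustration. As I understand it, in EFR the weights would come from the memory probabilities rather than being chosen by hand.

```python
import numpy as np


def deviation_regret(utilities, actions, deviation, weights=None):
    """Weighted sum over rounds of u_t[deviation(a_t)] - u_t[a_t].

    utilities[t, a] is the payoff action a would have earned in round t,
    actions[t] is the action actually played, deviation maps an action to
    its replacement, and weights[t] in [0, 1] is a time selection weight.
    """
    utilities = np.asarray(utilities, dtype=float)
    if weights is None:
        weights = np.ones(len(actions))
    total = 0.0
    for t, action in enumerate(actions):
        gain = utilities[t, deviation(action)] - utilities[t, action]
        total += weights[t] * gain
    return total


# Toy data (assumption): two actions, and the learner always picks the wrong one.
utilities = [[1, 0], [0, 1], [1, 0], [0, 1]]
actions = [1, 0, 1, 0]


def always_zero(a):
    """External deviation: always play action 0."""
    return 0


def zero_to_one(a):
    """Internal deviation: replace action 0 with action 1, keep the rest."""
    return 1 if a == 0 else a


def swap_both(a):
    """Swap deviation: exchange the two actions."""
    return 1 - a


external = deviation_regret(utilities, actions, always_zero)
internal = deviation_regret(utilities, actions, zero_to_one)
swap = deviation_regret(utilities, actions, swap_both)
# Time selection: only the first two rounds count.
early_only = deviation_regret(utilities, actions, always_zero, weights=[1, 1, 0, 0])
print(external, internal, swap, early_only)
```

On this data the swap deviation finds more regret than either the external or the internal one, which is the sense in which a richer deviation set asks more of the learner. Being hindsight rational with respect to a deviation set then means that the average of these regrets is non-positive in the limit for every deviation in the set.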

Jamesflynn1 commented 1 year ago

Initial ideas are good, need to write this up