A third-year university project that implements and evaluates the EFR algorithm with different deviation types, and explores a potential tradeoff in practice between a strategy's exploitability and its expected value.
How maximum regret grows sublinearly (leading to an equilibrium)
How EFR works with all deviation subsets (unlike CFR)
Potential for some equilibrium concepts to perform better against certain opponents
Time selection regret minimisation
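To make the sublinear regret point concrete, here is a minimal sketch. It is not EFR itself: it uses plain regret matching (the per-decision learner that CFR-style algorithms build on) in rock-paper-scissors, against a hypothetical adversary that best-responds to the learner's current strategy. The game, the adversary, and all function names are assumptions chosen for illustration. The standard regret matching guarantee bounds maximum regret by the payoff range times sqrt(|A| * T), so average regret shrinks as T grows.

```python
import math

# Row player's payoff in rock-paper-scissors: PAYOFF[row_action][column_action].
# Illustrative game only; the project itself targets extensive-form games.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
NUM_ACTIONS = 3


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / NUM_ACTIONS] * NUM_ACTIONS
    return [p / total for p in positive]


def adversary_action(strategy):
    """Hypothetical opponent: pick the column that minimises our expected payoff."""
    expected = [
        sum(strategy[a] * PAYOFF[a][b] for a in range(NUM_ACTIONS))
        for b in range(NUM_ACTIONS)
    ]
    return expected.index(min(expected))


def max_regret_after(num_rounds):
    """Run regret matching for num_rounds and return the maximum cumulative regret."""
    cum_regret = [0.0] * NUM_ACTIONS
    for _ in range(num_rounds):
        strategy = regret_matching_strategy(cum_regret)
        b = adversary_action(strategy)
        action_values = [PAYOFF[a][b] for a in range(NUM_ACTIONS)]
        expected_value = sum(p * v for p, v in zip(strategy, action_values))
        # Regret of action a: how much better a would have done than our strategy.
        for a in range(NUM_ACTIONS):
            cum_regret[a] += action_values[a] - expected_value
    return max(cum_regret)


if __name__ == "__main__":
    for rounds in (100, 1000, 10000):
        regret = max_regret_after(rounds)
        bound = 2.0 * math.sqrt(NUM_ACTIONS * rounds)  # payoff range is 2
        print(rounds, regret, regret / rounds, bound)
```

Running the script prints, for each horizon, the maximum regret, the average regret, and the theoretical bound, which gives a quick check that regret stays under the sqrt(T) curve.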