Jamesflynn1 / CS344-Opponent-Exploitation-Poker

A third-year uni project aiming to implement and evaluate the EFR algorithm with different deviation types, and to explore a potential trade-off between the exploitability and the expected value of a strategy in practice.

Fix action probabilities to work with all deviation sets #31

Closed: Jamesflynn1 closed this issue 1 year ago

Jamesflynn1 commented 1 year ago

The bug is to do with either the histories or the deviation variables. It should be an easy fix, but it needs testing on blind action deviations first.

Jamesflynn1 commented 1 year ago

We can:
1. store the prior strategies (or at least their indices) to the state, for the state player;
2. store the action history to the state, for the state player;
3. recreate the probabilities leading to the state according to the deviation;
4. weight the states so that each is either included or excluded according to the memory weights (i.e. dependent on the deviation type).
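As a rough illustration of steps 3 and 4, here is a minimal sketch, not the project's actual code. It assumes (hypothetically) that stored strategies are kept as a list of dicts mapping an infostate key to a NumPy array of action probabilities, that each state records a strategy index plus the state player's own `(infostate_key, action)` history, and that the memory weights are a 0/1 array with one entry per step. The names `reconstruct_reach_prob`, `stored_strategies` and `strategy_index` are made up for illustration.

```python
import numpy as np


def reconstruct_reach_prob(stored_strategies, strategy_index, history, memory_weights):
    """Recompute the state player's reach probability for one state (hypothetical sketch).

    stored_strategies: list of dicts, infostate key -> 1-D array of action probabilities
    strategy_index: which stored strategy the state should be evaluated with (step 1)
    history: list of (infostate_key, action) pairs for the state player (step 2)
    memory_weights: 0/1 array, one entry per step; 1 includes that step's probability,
        0 excludes it by treating it as probability 1 (step 4)
    """
    strategy = stored_strategies[strategy_index]
    # Step 3: look up the probability of each action the state player took.
    probs = np.array([strategy[key][action] for key, action in history], dtype=float)
    include = np.asarray(memory_weights, dtype=bool)
    if include.shape != probs.shape:
        raise ValueError("need exactly one memory weight per step in the history")
    # Step 4: excluded steps contribute a factor of 1 to the product.
    return float(np.prod(np.where(include, probs, 1.0)))


# Usage with a made-up two-decision strategy.
stored = [{"root": np.array([0.25, 0.75]), "after_bet": np.array([0.5, 0.5])}]
hist = [("root", 1), ("after_bet", 0)]
print(reconstruct_reach_prob(stored, 0, hist, np.ones(2)))        # 0.375
print(reconstruct_reach_prob(stored, 0, hist, np.array([0, 1])))  # 0.5
```

Setting every weight to 1 (as with `np.ones`) reduces this to the ordinary product of the player's own action probabilities, which is the plain reach probability.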

Steps 1 to 3 have been tested against a known-working computation of the state player's reach probability (the reach_prob from OpenSpiel's CFR computation). With the step 4 weights set to np.ones, the two agree at ALL states over 100 iterations (although two iterations would have been enough to check).
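The equivalence check above can be sketched roughly as follows, under loudly stated assumptions: a hypothetical toy game where a single player acts twice, with the infostate taken to be the action string itself. The traversal multiplies the player's reach probability by each action probability on the way down, in the style of a CFR traversal such as OpenSpiel's, while recording each state's action history. The reach probability recomputed from that history with all-ones weights is then compared at every state. `POLICY`, `traverse` and `recompute` are illustrative names, not project code.

```python
import numpy as np

# Hypothetical single-player toy policy: infostate (action string) -> action probabilities.
POLICY = {
    "": np.array([0.25, 0.75]),
    "0": np.array([0.5, 0.5]),
    "1": np.array([0.125, 0.875]),
}


def traverse(state, reach, history, out):
    """CFR-style walk: accumulate reach prob going down and record each state's history."""
    out[state] = (reach, list(history))
    if state not in POLICY:  # terminal state, nothing more to expand
        return
    for action, prob in enumerate(POLICY[state]):
        history.append((state, action))
        traverse(state + str(action), reach * prob, history, out)
        history.pop()


def recompute(history, weights):
    """Rebuild the reach prob from the stored history, keeping steps where weight is 1."""
    probs = np.array([POLICY[key][a] for key, a in history], dtype=float)
    return float(np.prod(np.where(np.asarray(weights, dtype=bool), probs, 1.0)))


records = {}
traverse("", 1.0, [], records)
mismatches = [
    state
    for state, (reach, hist) in records.items()
    if not np.isclose(reach, recompute(hist, np.ones(len(hist))))
]
print(len(records), mismatches)  # 7 []
```

Because the comparison only involves the probabilities along each path, a single pass already exercises every state, which is consistent with two iterations being enough to check.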

BPS appears to have CFR-like performance, possibly slightly less exploitable; I will confirm this with 10,000 iterations on the batch compute.

This appears to have fixed the issue.