bupticybee / TexasSolver

🚀 A very efficient Texas Holdem GTO solver :spades::hearts::clubs::diamonds:
https://bupticybee.github.io/texassolver_page
GNU Affero General Public License v3.0

Inquiry for possible performance improvement #162

Open EddieMataEwy opened 1 year ago

EddieMataEwy commented 1 year ago

The performance of this repo is already amazing, but I wanted to ask a question. Have you checked the family of improvements defined in this paper? (https://realworld-sdm.github.io/paper/27.pdf) It derives existing algorithms like CFR+ or DCFR by computing "instant updates" to the counterfactual value, the regret and the strategy. I don't know if this would add a lot of complexity to the existing codebase, but it allows, for example, for even faster convergence. This would make CFR+ converge faster than DCFR without worrying about tuning alpha, beta and gamma.
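
For context on where alpha and beta enter, here is a minimal sketch of the two cumulative-regret updates being compared. It follows the standard published formulations of CFR+ and DCFR, not the TexasSolver source. The function names `cfrPlusUpdate` and `dcfrUpdate` are hypothetical. Gamma, which only weights the average strategy, is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not taken from TexasSolver.

// CFR+: add the instantaneous regret, then clip the cumulative regret at zero.
void cfrPlusUpdate(std::vector<float>& cum_regret,
                   const std::vector<float>& inst_regret) {
    assert(cum_regret.size() == inst_regret.size());
    for (std::size_t a = 0; a < cum_regret.size(); ++a) {
        cum_regret[a] = std::max(cum_regret[a] + inst_regret[a], 0.0f);
    }
}

// DCFR: discount the old cumulative regret, then add the instantaneous regret.
// Positive totals are scaled by t^alpha / (t^alpha + 1),
// negative totals by t^beta / (t^beta + 1), where t is the iteration count.
void dcfrUpdate(std::vector<float>& cum_regret,
                const std::vector<float>& inst_regret,
                int t, float alpha, float beta) {
    assert(cum_regret.size() == inst_regret.size());
    const float ta = std::pow(static_cast<float>(t), alpha);
    const float tb = std::pow(static_cast<float>(t), beta);
    const float pos_discount = ta / (ta + 1.0f);
    const float neg_discount = tb / (tb + 1.0f);
    for (std::size_t a = 0; a < cum_regret.size(); ++a) {
        const float discount = cum_regret[a] > 0.0f ? pos_discount : neg_discount;
        cum_regret[a] = cum_regret[a] * discount + inst_regret[a];
    }
}
```

CFR+ has no parameters to tune, while DCFR's behaviour depends on the chosen alpha and beta (and on gamma for the average strategy).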

bupticybee commented 1 year ago

No, I haven't read the paper yet, but I will. It sounds promising.

xuzy1975 commented 1 year ago

I don't understand step (5). Where is the instant counterfactual value, updated with σ^{t+1}, supposed to be used?

xuzy1975 commented 1 year ago
      // ICFR: recompute the payoffs here using the newly updated strategy
      const vector<float> current_strategy_new = trainable->getcurrentStrategy();
      fill(payoffs.begin(), payoffs.end(), 0);
      // Accumulate the strategy-weighted utility of each action for every hand
      for (int action_id = 0; action_id < actions.size(); action_id++) {
          vector<float>& action_utilities = results[action_id];
          // Skip actions that produced no utilities
          if (action_utilities.empty())
              continue;
          for (int hand_id = 0; hand_id < action_utilities.size(); hand_id++) {
              // The strategy is laid out action-major: index = hand_id + action_id * number_of_hands
              float strategy_prob = current_strategy_new[hand_id + action_id * node_player_private_cards.size()];
              payoffs[hand_id] += strategy_prob * action_utilities[hand_id];
          }
      }

I added this to the end of actionUtility(). It does improve performance on some boards, such as 6h6c6d, 7d7h2h...
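
The same idea can be sketched at a single decision point with one hand. This is only the reading of the "instant update" used in this thread, not a confirmed implementation of the paper's step (5): update the regrets with the value under σ^t, derive σ^{t+1} by regret matching, then return the value recomputed under σ^{t+1}. All names here (`regretMatching`, `nodeValue`, `instantUpdate`) are hypothetical and do not exist in TexasSolver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Regret matching: play each action in proportion to its positive cumulative
// regret; fall back to a uniform strategy when no regret is positive.
std::vector<float> regretMatching(const std::vector<float>& cum_regret) {
    float positive_sum = 0.0f;
    for (float r : cum_regret) positive_sum += std::max(r, 0.0f);
    std::vector<float> strategy(cum_regret.size());
    for (std::size_t a = 0; a < cum_regret.size(); ++a) {
        strategy[a] = positive_sum > 0.0f
            ? std::max(cum_regret[a], 0.0f) / positive_sum
            : 1.0f / static_cast<float>(cum_regret.size());
    }
    return strategy;
}

// Expected value of the node: strategy-weighted sum of the action utilities.
float nodeValue(const std::vector<float>& strategy,
                const std::vector<float>& action_utils) {
    float value = 0.0f;
    for (std::size_t a = 0; a < strategy.size(); ++a) {
        value += strategy[a] * action_utils[a];
    }
    return value;
}

// One iteration at a single node. Returns the value handed to the parent,
// computed with the NEW strategy (assumption: this is the "instant" value).
float instantUpdate(std::vector<float>& cum_regret,
                    const std::vector<float>& action_utils) {
    assert(cum_regret.size() == action_utils.size());
    const std::vector<float> sigma_t = regretMatching(cum_regret);
    const float value_t = nodeValue(sigma_t, action_utils);
    // CFR+ style regret update, measured against the value under the old strategy.
    for (std::size_t a = 0; a < cum_regret.size(); ++a) {
        cum_regret[a] = std::max(cum_regret[a] + action_utils[a] - value_t, 0.0f);
    }
    const std::vector<float> sigma_next = regretMatching(cum_regret);
    return nodeValue(sigma_next, action_utils);
}
```

With zero initial regrets and action utilities {1, 3}, the old uniform strategy gives a value of 2, while the value returned under the updated strategy is 3. Plain CFR would pass the old value to the parent instead.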

EddieMataEwy commented 1 year ago

I believe you need calculation 5 to proceed with parent node calculations. I don't understand it very well. That is why I opened an issue instead of coding it myself and doing a pull request.

xuzy1975 commented 1 year ago

It seems the payoff needs to be recalculated using the new strategy. I tried it: in some cases, such as the benchmark settings, it converges faster, but in large-scale games it performs worse. Maybe I misunderstood something.