datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License
2.91k stars, 630 forks

How can I use the DeepCFR agent in games like DouDizhu? #38

Closed: loserZhang closed this issue 3 years ago

loserZhang commented 4 years ago

You have implemented the deep_cfr algorithm in your code, but there is no example for it.

daochenzha commented 4 years ago

@loserZhang The example should be almost the same as the CFR one in https://github.com/datamllab/rlcard/blob/master/examples/leduc_holdem_cfr.py, except that a tf.Session needs to be passed in during initialization.

The structure of DeepCFR is very similar to CFR, but it uses neural networks as function approximators. We have tried hard with the current DeepCFR implementation but still could not get it to converge, so we did not include an example for it.

If you are interested, you may try different hyperparameters or network architectures for DeepCFR, and let me know if you manage to get it to converge :)
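For readers following along, the relationship between CFR and DeepCFR described above can be made concrete with regret matching, the step that turns accumulated regrets into a strategy. Tabular CFR keeps those regrets in a table per information set, while DeepCFR trains a network to predict them. The sketch below is a generic illustration of that step, not RLCard's code; the function name `regret_matching` is chosen here for illustration, and the uniform fallback is the usual tabular CFR convention.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (probabilities over actions)."""
    # Only actions with positive regret receive probability mass.
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to a uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n


# Example: the third action has accumulated the most regret,
# so it gets the largest share of the probability.
print(regret_matching([2.0, -1.0, 6.0]))  # [0.25, 0.0, 0.75]
```

In a DeepCFR-style setup, the list passed to this function would come from a network's output for the current state rather than from a lookup table, which is where the choice of hyperparameters and network architecture enters.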

loserZhang commented 4 years ago

@daochenzha Thank you very much. I have another question: when I combine NFSP with multi-processing, the variable self._reservoir_buffer is always 0.

daochenzha commented 4 years ago

@loserZhang Thank you for letting us know. That does not look normal. Currently, we only provide a parallelization example with DQN, so you may run into bugs when combining multi-processing with NFSP.

Do you think it would be a good idea to implement a general wrapper for parallelization? We may add this in the future.
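For context, the reservoir buffer mentioned in this thread is where NFSP keeps the data used to train its average policy, and it is filled by reservoir sampling so that it holds a uniform sample of everything seen so far. The following is a minimal sketch of that idea, not RLCard's implementation; the class name `ReservoirSketch` and its attributes are hypothetical and chosen only for illustration.

```python
import random


class ReservoirSketch:
    """Fixed-capacity buffer holding a uniform random sample of all added items."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.data = []
        self.num_seen = 0  # total number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.num_seen += 1
        if len(self.data) < self.capacity:
            # Buffer not full yet: always keep the item.
            self.data.append(item)
        else:
            # Buffer full: keep the new item with probability capacity / num_seen,
            # overwriting a uniformly chosen stored item.
            idx = self.rng.randrange(self.num_seen)
            if idx < self.capacity:
                self.data[idx] = item

    def sample(self, k):
        # Draw k distinct stored items for a training batch.
        return self.rng.sample(self.data, k)

    def __len__(self):
        return len(self.data)


buffer = ReservoirSketch(capacity=100, seed=0)
for transition in range(1000):
    buffer.add(transition)
print(len(buffer))  # 100
```

Separately from the sampling logic, an ordinary Python object like this is not shared between processes: with the multiprocessing module each worker fills its own copy, so a buffer inspected in the parent process can stay empty. This is a general property of Python multi-processing and not a diagnosis of the specific problem reported above.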

loserZhang commented 4 years ago

Thank you. It was my own mistake when combining the two, and I have solved it. Do you have any ideas about combining MCTS with NFSP? Reference: https://arxiv.org/pdf/1903.09569.pdf

lhenry15 commented 4 years ago

Thanks for letting us know. Monte Carlo Tree Search with fictitious self-play for imperfect-information games seems to be a promising direction, but dealing with large state/action spaces is still challenging. Some abstraction techniques may help, such as those in the following papers:
[1] https://www.ijcai.org/Proceedings/15/Papers/084.pdf
[2] https://ieeexplore.ieee.org/abstract/document/8848034
[3] http://www.csse.uwa.edu.au/cig08/Proceedings/papers/8057.pdf
[4] https://core.ac.uk/download/pdf/82710979.pdf