datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License
2.86k stars 618 forks

Doudizhu #154

Closed. Cristal-yin closed this issue 3 years ago

Cristal-yin commented 4 years ago

Any news on Dou Dizhu optimizations?

Cristal-yin commented 4 years ago

Hello, I am a graduate student at a domestic university. I will be graduating soon and plan to work on improving algorithms for this class of games. Since I am just getting started, I have been following your project closely. Can this project be used with actor-critic algorithms (such as A3C or DDPG)? I plan to pair your platform with another algorithm and then improve on it.

daochenzha commented 4 years ago

@Cristal-yin Thanks for your interest. We do not plan to optimize the speed of Dou Dizhu further for now, since that may require reimplementing the game in C++.

Yes, it is possible to connect the game engine to A3C, but it may require some effort since A3C is a multi-process algorithm. What I have in mind is to use the single-agent mode in RLCard, where the interface is OpenAI Gym-like and the other agents are played by rule-based models. See https://github.com/datamllab/rlcard#api-cheat-sheet
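To illustrate the single-agent idea, here is a minimal sketch of wrapping a multi-player game so the learner sees a Gym-like `reset`/`step` interface while rule-based opponents play the other seats. This is not RLCard's API: `ToyTurnGame`, `rule_opponent`, `SingleAgentEnv`, and `run_episode` are hypothetical names, and the toy counting game stands in for Dou Dizhu only to keep the example self-contained.

```python
class ToyTurnGame:
    """Hypothetical 3-player game: each turn adds 1 or 2 to a running
    total; the player whose move reaches TARGET wins."""
    NUM_PLAYERS = 3
    TARGET = 10

    def reset(self):
        self.total = 0
        self.current = 0      # seat whose turn it is
        self.winner = None

    def step(self, action):
        assert action in (1, 2)
        self.total += action
        if self.total >= self.TARGET:
            self.winner = self.current
        else:
            self.current = (self.current + 1) % self.NUM_PLAYERS

    def is_over(self):
        return self.winner is not None


def rule_opponent(total):
    """Stand-in rule-based model: win if possible, otherwise add 1."""
    return 2 if total + 2 >= ToyTurnGame.TARGET else 1


class SingleAgentEnv:
    """Gym-like view of the game for one learner; all other seats are
    played by a fixed rule-based policy inside step()."""

    def __init__(self, game, opponent_policy, learner_id=0):
        self.game = game
        self.opponent_policy = opponent_policy
        self.learner_id = learner_id

    def _play_opponents(self):
        # Advance the game until it is the learner's turn or the game ends.
        while not self.game.is_over() and self.game.current != self.learner_id:
            self.game.step(self.opponent_policy(self.game.total))

    def reset(self):
        self.game.reset()
        self._play_opponents()
        return self.game.total

    def step(self, action):
        self.game.step(action)
        self._play_opponents()
        done = self.game.is_over()
        reward = 0.0
        if done:
            reward = 1.0 if self.game.winner == self.learner_id else -1.0
        return self.game.total, reward, done


def run_episode(env, policy):
    """Play one episode with `policy` (state -> action); return the final reward."""
    state, reward, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
    return reward


if __name__ == "__main__":
    env = SingleAgentEnv(ToyTurnGame(), rule_opponent)
    print(run_episode(env, lambda state: 2))
```

Because the opponents are folded into `step`, a single-agent learner such as an A3C worker can treat the wrapped game like any other environment, and each worker process can own its own environment instance.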

Dou Dizhu is a challenging game, and it may be difficult to train an RL agent from scratch. To make training feasible, I would recommend first generating training data using our rule model in https://github.com/datamllab/rlcard/blob/master/rlcard/models/doudizhu_rule_models.py

Then use supervised learning (SL) to train the agent on that data. After the SL stage, continue training with RL. This should be easier than using pure RL.
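The two-stage recipe above can be sketched on a toy problem. This is only an illustration under stated assumptions: the six-state task, `rule_policy`, and the tabular softmax policy are hypothetical stand-ins (not Dou Dizhu and not RLCard code), and REINFORCE is used here simply as one possible RL method for the second stage.

```python
import math
import random

NUM_STATES, NUM_ACTIONS = 6, 3

def best_action(state):
    """Optimal action in the toy task (unknown to the rule model)."""
    return state % NUM_ACTIONS

def rule_policy(state):
    """Hypothetical rule model: optimal everywhere except the last state."""
    return 0 if state == NUM_STATES - 1 else best_action(state)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def update(logits, action, scale):
    """Ascent step on scale * log pi(action): logits += scale * (onehot - pi)."""
    probs = softmax(logits)
    for b in range(NUM_ACTIONS):
        logits[b] += scale * ((1.0 if b == action else 0.0) - probs[b])

def supervised_pretrain(policy, data, epochs=50, lr=0.5):
    """Stage 1: behavior cloning on (state, action) pairs from the rule model."""
    for _ in range(epochs):
        for state, action in data:
            update(policy[state], action, lr)

def reinforce_finetune(policy, rng, episodes=5000, lr=0.5):
    """Stage 2: REINFORCE with reward 1 for the optimal action, else 0."""
    actions = list(range(NUM_ACTIONS))
    for _ in range(episodes):
        state = rng.randrange(NUM_STATES)
        action = rng.choices(actions, weights=softmax(policy[state]))[0]
        reward = 1.0 if action == best_action(state) else 0.0
        update(policy[state], action, lr * reward)

def greedy(policy, state):
    return max(range(NUM_ACTIONS), key=lambda a: policy[state][a])

# Generate training data from the rule model, then run both stages.
data = [(s, rule_policy(s)) for s in range(NUM_STATES)]
policy = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]

supervised_pretrain(policy, data)
sl_greedy = [greedy(policy, s) for s in range(NUM_STATES)]
last = NUM_STATES - 1
p_before = softmax(policy[last])[best_action(last)]

reinforce_finetune(policy, random.Random(0))
p_after = softmax(policy[last])[best_action(last)]
```

After the SL stage the policy imitates the rule model, including its mistake in the last state. The RL stage can then only shift probability toward rewarded actions, so it keeps what the rule model got right while having a chance to correct what it got wrong.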

Also, we may need a better neural architecture, such as a CNN or LSTM (currently it is just an MLP).

Cristal-yin commented 4 years ago

Thank you very much for your answer😊

daochenzha commented 3 years ago

A strong DouDizhu agent is supported at https://github.com/datamllab/rlcard/tree/master/rlcard/agents/dmc_agent

It is also supported at https://github.com/kwai/DouZero