datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License

Dou Dizhu/PettingZoo action abstraction #228

Open rodrigodelazcano opened 3 years ago

rodrigodelazcano commented 3 years ago

Hello, we have observed that the new rlcard version introduces changes that break the code in PettingZoo. We have been able to fix some of them; however, the Dou Dizhu environment in PettingZoo depended on the previous implementation of action abstraction. The action space in the newest version of rlcard now includes all 27472 possible actions. For PettingZoo we would prefer to keep the action abstraction in order to maintain a smaller action space. We have considered different ways to fix this, but our best option would be to make a PR to rlcard reintroducing the action abstraction for Dou Dizhu. This would go in parallel with the new action-feature implementation, and choosing between one or the other would be controlled by a parameter in the config of the Dou Dizhu environment, roughly as sketched below.
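A hypothetical illustration of the proposed switch (the `use_action_abstraction` key is made up for this sketch and is not part of rlcard's current config; only the `rlcard.make(..., config=...)` call itself is existing API):

```python
import rlcard

# Hypothetical config flag (not in rlcard today): when True, the env
# would expose the old abstracted action space; when False, all
# 27472 concrete actions plus the new action features.
env = rlcard.make('doudizhu', config={'use_action_abstraction': True})
```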

I would really appreciate some feedback on this issue, and whether you agree with the proposed PR. Thank you!

rodrigodelazcano commented 3 years ago

@benblack769 @jkterry1 @kaanozdogru

daochenzha commented 3 years ago

@rodrigodelazcano Thanks for the proposal. We decided to remove the abstraction for a few reasons.

First, in our latest research (https://arxiv.org/abs/2106.06135) we found that abstraction is not needed. The action itself in DouDizhu can be encoded into features, so we can take the actions as input instead. We actually found that we can achieve much better performance without abstraction; a rough sketch of the idea follows.
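To make the action-as-input idea concrete, here is a minimal sketch, not the paper's exact architecture; the dimensions and the plain MLP are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ActionValueNet(nn.Module):
    """Scores one (state, action) pair at a time, so no output layer
    over all 27472 actions is needed."""
    def __init__(self, state_dim=790, action_dim=54):  # dims are illustrative
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),  # scalar value for this action
        )

    def forward(self, state, action_feats):
        return self.mlp(torch.cat([state, action_feats], dim=-1))

def choose_action(net, state, legal_action_feats):
    """Greedy selection: score every legal action, pick the best."""
    n = legal_action_feats.shape[0]
    values = net(state.unsqueeze(0).expand(n, -1), legal_action_feats)
    return values.argmax().item()
```

Because only the legal actions are ever scored, the huge nominal action space costs nothing at decision time.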

Second, the original abstraction is based on very simple heuristics. The abstraction may significantly restrict the upper bound on final performance: with it, users can at best achieve rule-level performance. Without abstraction, however, our new algorithm performs much better than the rules and achieves human-level performance.

Thus, we decided to remove the abstraction. As you may have already observed, we now support action features for all the rlcard environments. This is based on our observation that extracting action features can be a better alternative to action abstraction.

To the best of my knowledge, keeping all the actions without abstraction is the best choice for helping users develop strong RL agents for DouDizhu. However, supporting action features is tricky, as they are usually not supported in gym-like interfaces; one way to expose them anyway is sketched below. Is this the reason you want to keep the abstraction?
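As a hedged sketch only, a gym-style dict observation could carry the per-action features alongside the usual observation. The field names here are invented for illustration, and the `legal_actions` layout should be verified against the rlcard version you target:

```python
import numpy as np

def to_gym_obs(state):
    """Illustrative wrapper: bundle per-action features into the
    observation so an agent behind a gym-like interface can use them.
    Assumes state['legal_actions'] maps action ids to feature arrays,
    as in recent rlcard versions (verify for your version)."""
    action_ids = list(state['legal_actions'].keys())
    action_feats = np.stack([state['legal_actions'][a] for a in action_ids])
    return {
        'observation': state['obs'],
        'legal_action_ids': np.array(action_ids),
        'legal_action_features': action_feats,
    }
```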

benblack769 commented 3 years ago

Hi, can you say where in the paper it explains how the action is encoded into features, and how the actions are taken in as input? I am very unclear on this. Does it require specialized code for an RL agent to support this?

daochenzha commented 3 years ago

@benblack769 Basically, there are many possible combinations of cards, which is why the action space of DouDizhu is so large. We use a 4*15 matrix to one-hot encode the actions; you can refer to the following figure:

[Figure: one-hot encoding of a DouDizhu action as a 4x15 card matrix]

This part is also implemented in RLCard; see https://github.com/datamllab/rlcard/blob/master/rlcard/envs/doudizhu.py#L140
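A minimal sketch of such an encoding, assuming a thermometer-style one-hot and the usual DouDizhu rank ordering (both are illustrative here; the actual scheme is in the linked doudizhu.py):

```python
import numpy as np

# 15 columns: ranks 3..K, A, 2, plus black (B) and red (R) jokers.
RANKS = ['3', '4', '5', '6', '7', '8', '9', 'T',
         'J', 'Q', 'K', 'A', '2', 'B', 'R']

def encode_action(action: str) -> np.ndarray:
    """Encode an action string such as '33344' as a 4x15 matrix:
    if the action uses k copies of a rank, the first k rows of
    that rank's column are set to 1."""
    matrix = np.zeros((4, 15), dtype=np.int8)
    for col, rank in enumerate(RANKS):
        k = action.count(rank)
        matrix[:k, col] = 1
    return matrix

print(encode_action('33344'))
# the '3' column has three 1s, the '4' column has two 1s
```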