In Mahjong game prediction, the order of state['current_hand'] appears to influence the result of eval_step. What could be the reason?

datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.

http://www.rlcard.org

MIT License


In Mahjong game prediction, the order of state['current_hand'] appears to influence the result of eval_step. What could be the reason? #310

Closed (jacy closed this issue 5 months ago)

jacy commented 8 months ago

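Found the root cause: in the Mahjong extract_state function, raw_legal_actions and legal_actions don't match. legal_actions is built from the unique cards in the player's hand, while raw_legal_actions is the full list of the player's hand, duplicates included. The two therefore differ in length whenever the hand holds a repeated tile, so their entries no longer correspond position by position.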
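
To illustrate why such a mismatch would make the hand order matter, here is a minimal standalone sketch. It does not use rlcard itself: the tile names, the CARD_TO_ID table, and the helpers build_actions and pair_by_index are all hypothetical, and pairing the two lists by position is an assumption about how a consumer such as eval_step might line them up, not a copy of the library's code.

```python
from collections import OrderedDict

# Hypothetical tile-name -> action-id table (not rlcard's real encoding).
CARD_TO_ID = {"bamboo-1": 0, "bamboo-2": 1, "dots-5": 2}


def build_actions(hand, dedupe_raw=False):
    """Build (legal_actions, raw_legal_actions) from a hand of tile names.

    legal_actions is keyed by action id, so duplicates collapse.
    raw_legal_actions keeps duplicates unless dedupe_raw is True.
    """
    legal_actions = OrderedDict((CARD_TO_ID[card], None) for card in hand)
    if dedupe_raw:
        # dict.fromkeys drops duplicates while preserving first-seen order.
        raw_legal_actions = list(dict.fromkeys(hand))
    else:
        raw_legal_actions = list(hand)
    return legal_actions, raw_legal_actions


def pair_by_index(legal_actions, raw_legal_actions, q_values):
    """Assumed consumer: match raw names to action ids purely by position."""
    ids = list(legal_actions.keys())
    return {raw_legal_actions[i]: q_values[ids[i]] for i in range(len(ids))}


# Made-up Q-values indexed by action id.
q_values = [0.1, 0.9, 0.5]

hand_a = ["bamboo-1", "bamboo-1", "bamboo-2", "dots-5"]
hand_b = ["dots-5", "bamboo-2", "bamboo-1", "bamboo-1"]  # same tiles, reordered

values_a = pair_by_index(*build_actions(hand_a), q_values)
values_b = pair_by_index(*build_actions(hand_b), q_values)
print(values_a)  # duplicate tile shifts the pairing, so names get wrong values
print(values_b)  # duplicate sits at the end, so the pairing happens to line up

# Deduplicating the raw list the same way keeps both lists aligned.
fixed_a = pair_by_index(*build_actions(hand_a, dedupe_raw=True), q_values)
print(fixed_a)
```

In this sketch the same set of tiles yields different name-to-value mappings depending on where the duplicate sits, which matches the symptom in the title. Building both lists from the same deduplicated sequence removes the dependence on order.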