datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License
2.86k stars 618 forks

doudizhu_nfsp training error #194

Closed zzz259758 closed 3 years ago

zzz259758 commented 3 years ago

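After training with doudizhu_nfsp for a while, usually about 5 minutes, the following error appears:

INFO - Agent nfsp1_dqn, step 5000, rl-loss: 0.2763792872428894
INFO - Copied model parameters to target network.
INFO - Agent nfsp2_dqn, step 5000, rl-loss: 0.2676895558834076
INFO - Copied model parameters to target network.
INFO - Agent nfsp0_dqn, step 17416, rl-loss: 0.05982016772031784
Process finished with exit code -1073740791 (0xC0000409)

I suspect it is a memory problem. My machine has 12 GB of RAM, and it is almost completely used during training.
What hardware configuration is generally recommended for training?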

daochenzha commented 3 years ago

@zzz259758 DQN is an off-policy algorithm and has relatively high memory requirements. You could try on-policy methods such as PPO or A2C instead.
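For a rough sense of where that memory goes: an off-policy agent such as DQN keeps past transitions in a replay memory, and each transition stores both a state and a next state. The sketch below estimates a lower bound on that footprint. The capacity, state shape, and agent count are hypothetical placeholders rather than values taken from rlcard, and `replay_memory_bytes` is an illustrative helper, not part of the library.

```python
import math

def replay_memory_bytes(capacity, state_shape, bytes_per_value=4):
    """Lower-bound estimate of replay memory size in bytes.

    Counts only the state and next_state arrays of each transition;
    action, reward, and done flag are small by comparison and ignored.
    """
    state_bytes = math.prod(state_shape) * bytes_per_value
    return capacity * 2 * state_bytes

# Hypothetical values for illustration only (not rlcard defaults).
capacity = 20_000          # transitions kept per agent
state_shape = (6, 5, 15)   # assumed observation shape, stored as float32
num_agents = 3             # three players in the game

per_agent = replay_memory_bytes(capacity, state_shape)
total = num_agents * per_agent
print(f"per agent: {per_agent / 2**20:.1f} MiB, all agents: {total / 2**20:.1f} MiB")
```

If an estimate like this comes out far below the memory actually consumed, the replay memory alone probably does not explain the usage, and it may be worth looking for growth elsewhere in the training process.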

Jragon commented 3 years ago

Maybe you could try the PyTorch implementation? The TensorFlow NFSP implementation ate up 500 GB of RAM for my game. I'm not sure whether that was due to a programming error on my part or to the agent code. With PyTorch it barely uses 3 GB after 50,000 episodes.
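One way to check whether memory keeps growing with the number of episodes is to sample it periodically during training. The following is a minimal sketch using Python's standard `tracemalloc` module. Here `run_with_memory_log` and `fake_episode` are hypothetical stand-ins for a real training loop, and `tracemalloc` only sees allocations made through Python's allocators, so memory allocated inside native libraries is generally not visible to it.

```python
import tracemalloc

def run_with_memory_log(step_fn, num_episodes, log_every):
    """Call step_fn(episode) repeatedly and sample Python-level memory use."""
    tracemalloc.start()
    samples = []
    try:
        for episode in range(1, num_episodes + 1):
            step_fn(episode)
            if episode % log_every == 0:
                # Returns (current, peak) traced memory in bytes.
                current, peak = tracemalloc.get_traced_memory()
                samples.append((episode, current, peak))
    finally:
        tracemalloc.stop()
    return samples

# Hypothetical stand-in for one training episode: it keeps every buffer
# it allocates, so traced memory grows steadily, like an unbounded store.
buffer = []

def fake_episode(episode):
    buffer.append(bytearray(10_000))

samples = run_with_memory_log(fake_episode, num_episodes=100, log_every=25)
for episode, current, peak in samples:
    print(f"episode {episode}: current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")
```

A `current` value that keeps climbing at a steady rate across samples suggests something is being retained on every episode, whereas a value that levels off is what a fixed-size buffer would produce once it is full.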