Training questions - Githubissues

RuBP17 / AlphaDou

A Doudizhu reinforcement learning AI

GNU General Public License v3.0

Training questions #1

Open rubbyzhang opened 1 week ago

rubbyzhang commented 1 week ago

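What hardware did you train on, and for how long?
Compared with the Douzero Resnet version, did you use the trained model provided in the Resnet repository directly, or did you retrain it yourself?
Have you compared the convergence speed of the two methods?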

RuBP17 commented 1 week ago

We trained for three weeks on eight 4090 GPUs.
The Douzero Resnet model was retrained by us.
Convergence was roughly 2-4 times faster.

We have submitted a paper with the same name on arXiv. If you are interested in the details of our method, you can read it there.

rubbyzhang commented 4 days ago

Thank you for the answer. Is https://github.com/RuBP17/AlphaDou/tree/main/baseline/SLModel the trained model from the paper, or do I need to retrain it myself?

RuBP17 commented 4 days ago

No. The model in SLModel is a bidding model used for testing. It was trained with supervised learning and is the bidding model from the Douzero Resnet project. The card-play model and the bidding model described in the paper were both trained with reinforcement learning, and their weight files are not provided here.

EdwardPooh commented 4 days ago

@rubbyzhang

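The models in SLModel are the bidding files obtained from Douzero Resnet supervised learning. BidModel.py can load bid_weights_new.pkl to estimate the expected score of a hand, and a bid is then decided by comparing that score against a threshold. You can try this by setting "Supervised" in evaluate.py.
landlord_weights.pkl contains the weights for estimating the expected score of the landlord's 20-card hand, that is, after the three landlord cards have been added to the landlord's hand. Use LandlordModel.py to load it.
landlord_down_weights_new.pkl and landlord_up_weights_new.pkl contain the weights for estimating the expected score of the 17-card hands of the two farmers (the players seated before and after the landlord), also after the three cards have been added to the landlord's hand. Use FarmerModel.py to load them.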
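
The threshold-based bidding described above can be sketched roughly as follows. This is only an illustration, not the code in BidModel.py: `decide_bid`, `toy_score`, the integer card encoding, and the 0.25 threshold are all hypothetical, and `toy_score` is a stand-in heuristic where the real setup would use the network loaded from bid_weights_new.pkl.

```python
from typing import Callable, Sequence


def decide_bid(hand: Sequence[int],
               predict_score: Callable[[Sequence[int]], float],
               threshold: float) -> bool:
    """Bid only when the predicted expected score is strictly above the threshold."""
    return predict_score(hand) > threshold


def toy_score(hand: Sequence[int]) -> float:
    """Hypothetical stand-in for a learned scorer: the fraction of high cards.

    Illustrative encoding (an assumption, not the repository's):
    3..13 = 3..K, 14 = A, 15 = 2, 16 = black joker, 17 = red joker.
    """
    high_cards = sum(1 for rank in hand if rank >= 14)
    return high_cards / len(hand)


# Two 17-card hands in the illustrative encoding above.
strong_hand = [17, 16, 15, 15, 14, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3]
weak_hand = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11]

print(decide_bid(strong_hand, toy_score, threshold=0.25))  # True: 6/17 > 0.25
print(decide_bid(weak_hand, toy_score, threshold=0.25))    # False: 0/17 <= 0.25
```

Passing the scorer in as a function keeps the decision rule independent of how the score is produced, so a real model wrapper could be swapped in for `toy_score` without changing `decide_bid`.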

rubbyzhang commented 2 days ago

Thanks a lot for the detailed explanation.