bupticybee / AlphaNLHoldem

An unofficial implementation of AlphaHoldem, a 1v1 no-limit hold'em AI.
GNU Affero General Public License v3.0
61 stars, 15 forks

How do you judge/track the convergence of the holdem model? #7

Status: Open. Josh00-Lu opened this issue 6 months ago.

Josh00-Lu commented 6 months ago

Even after ~1 billion self-play games (over 1,000 checkpoints), the model still does not seem to converge.

Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any evaluation metrics you would recommend?

I don't think evaluating with rewards (or utilities) is a good choice, because during training both sides keep improving at the same time, so the reward against the current opponent does not show whether the model itself is getting stronger.

bupticybee commented 6 months ago

There's no good way, from my point of view. You can try training one and measuring its performance against Slumbot, but this simulator doesn't match the ACPC rules, so you would need to modify the simulator before you can do that.
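A minimal sketch of the kind of measurement suggested above. It assumes you already have per-hand chip results from matches between each checkpoint and a fixed opponent such as Slumbot (collecting them still requires the ACPC-compatible simulator changes). The function names and the plateau threshold are hypothetical, not part of this repository. Win rate is expressed in milli-big-blinds per hand (mbb/hand), a common unit for heads-up poker, with a normal-approximation 95% confidence interval.

```python
import math
from statistics import mean, stdev


def mbb_per_hand(chip_results, big_blind):
    """Return (win rate in mbb/hand, ~95% CI half-width) for one checkpoint.

    chip_results: net chips won (+) or lost (-) by the model on each hand
    against a fixed opponent. The CI uses a normal approximation.
    """
    if len(chip_results) < 2:
        raise ValueError("need at least two hands to estimate variance")
    # Convert each hand's result to milli-big-blinds.
    mbb = [1000.0 * chips / big_blind for chips in chip_results]
    rate = mean(mbb)
    half_width = 1.96 * stdev(mbb) / math.sqrt(len(mbb))
    return rate, half_width


def has_plateaued(rates, window=3, tolerance=5.0):
    """Hypothetical heuristic: True if the win rates of the last `window`
    checkpoints all lie within `tolerance` mbb/hand of each other."""
    if len(rates) < window:
        return False
    recent = rates[-window:]
    return max(recent) - min(recent) <= tolerance


if __name__ == "__main__":
    # Toy data: four hands with a big blind of 100 chips.
    rate, hw = mbb_per_hand([100, -50, 200, -100], big_blind=100)
    print(f"{rate:.1f} +/- {hw:.1f} mbb/hand")
    # Win rates of successive checkpoints against the same fixed opponent.
    print(has_plateaued([10.0, 50.0, 52.0, 54.0]))
```

Because the opponent is frozen, a flat win-rate curve is easier to interpret than self-play reward, but it only shows that the model stopped improving against that particular opponent, not that it has converged to an equilibrium. Poker results are high-variance, so the confidence interval stays wide unless many hands are played.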