giove91 / hanabi

A program that plays Hanabi

winrates #2

fpvandoorn opened 4 years ago

fpvandoorn commented 4 years ago

I've heard that this is a very strong bot. What are its winrates with 4 and 5 players? Can it play variants other than the 55-card deck? If so, what are its winrates in those variants?

giove91 commented 4 years ago

Hi! The strongest version (DeltaHanabi by @Delfad0r) wins approximately 65% of 5-player games and 37% of 4-player games. In the next few days I will update the README with more precise numbers. For now only the 55-card deck is implemented (where the 6th color works like the other 5 colors). It shouldn't be too hard to adapt the code to the 50-card deck with only 5 colors.
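For reference, the two deck sizes follow from the standard card distribution: each color has three 1s, two 2s, two 3s, two 4s and one 5 (10 cards), so 5 colors give 50 cards, and 55 - 50 = 5 implies that the 6th color has a single copy of each rank. A minimal sketch of building both decks as `(color, rank)` pairs; `build_deck` and `max_score` are hypothetical helpers, not functions from this repository:

```python
# Copies of each rank in one standard color: three 1s, two 2s, two 3s, two 4s, one 5.
STANDARD_RANK_COUNTS = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}
# Assumption: the 6th color holds a single copy of each rank (55 - 50 = 5 cards).
SIXTH_COLOR_RANK_COUNTS = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}


def build_deck(num_colors: int) -> list[tuple[int, int]]:
    """Return the unshuffled deck as (color, rank) pairs for 5 or 6 colors."""
    if num_colors not in (5, 6):
        raise ValueError("only 5 or 6 colors are supported in this sketch")
    deck = []
    for color in range(num_colors):
        # Color index 5 is the (hypothetical) 6th color with one card per rank.
        counts = SIXTH_COLOR_RANK_COUNTS if color == 5 else STANDARD_RANK_COUNTS
        for rank, copies in counts.items():
            deck.extend([(color, rank)] * copies)
    return deck


def max_score(num_colors: int) -> int:
    """Perfect score: every firework is built up to 5."""
    return 5 * num_colors
```

With this layout, dropping to the 50-card game is just a matter of passing `num_colors=5`; the perfect score drops from 30 to 25 accordingly.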

fpvandoorn commented 4 years ago

Very interesting! Your bot does a lot better in 5-player games, even though (at least with human conventions) 5 players is a lot harder on a 55-card deck. Did you optimize the bot for 5 players?

For comparison: I implemented a hat-guessing bot that wins 56% of games with 5 players and 59% with 4 players. Here is its full winrate table: https://github.com/fpvandoorn/hanabi/blob/master/doc_hat_player.md
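The core trick behind hat-guessing Hanabi strategies is modular-sum encoding: every player's hand is mapped to a value modulo some number m, the clue giver announces the sum of all the other players' values mod m, and each receiver subtracts the values it can see to recover its own. A minimal sketch of just that arithmetic, not fpvandoorn's implementation; how a hand is turned into a value and how a value is turned into a concrete clue are left out, and the modulus of 8 is an assumption:

```python
def encode_clue(values_seen_by_giver: list[int], modulus: int) -> int:
    """Clue giver: sum of the hat values of all other players, mod `modulus`."""
    return sum(values_seen_by_giver) % modulus


def decode_own_value(clue_value: int, other_receivers: list[int], modulus: int) -> int:
    """Receiver: subtract the hat values it can see to recover its own value."""
    return (clue_value - sum(other_receivers)) % modulus


# Example: 5 players, player 0 gives the clue; players 1-4 have
# (hypothetical) hat values 3, 0, 6, 2 and the modulus is 8.
values = [3, 0, 6, 2]
clue = encode_clue(values, modulus=8)
for i, own in enumerate(values):
    others = values[:i] + values[i + 1:]
    # Every receiver recovers its own value from a single shared clue.
    assert decode_own_value(clue, others, modulus=8) == own
```

The point of the design is that one clue delivers a separate piece of information to every other player at once, which is why hat-guessing bots score so well with many players.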

giove91 commented 4 years ago

Yes, we optimized it for 5 players! Our (human) conventions are also optimized for 5 players, since that's the setup we prefer to play. A couple of years ago, we also used to play "against" this bot (humans vs. bot on the same deck).

Thanks for the link! I will create a table with win rates and average scores, including for the standard 50-card deck.

giove91 commented 4 years ago

I updated the README with win rates and average scores over 100k games!
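Win rate and average score (or, equivalently, the average number of points below the perfect score) can both be computed from the list of final scores, together with standard errors that show how precise an estimate over many games is. A minimal sketch with a hypothetical `summarize` helper, assuming one integer final score per simulated game:

```python
import math


def summarize(scores: list[int], perfect_score: int) -> dict[str, float]:
    """Win rate, average score and their standard errors over simulated games."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two games for a standard error")
    win_rate = sum(1 for s in scores if s == perfect_score) / n
    mean = sum(scores) / n
    # Sample variance of the final score (n - 1 in the denominator).
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return {
        "win_rate": win_rate,
        # Binomial standard error of the win rate.
        "win_rate_stderr": math.sqrt(win_rate * (1 - win_rate) / n),
        "avg_score": mean,
        "avg_points_below_perfect": perfect_score - mean,
        # Standard error of the mean score.
        "avg_score_stderr": math.sqrt(variance / n),
    }
```

For example, with a win rate around 90% over 100,000 games, the standard error of the win rate is about sqrt(0.9 * 0.1 / 100000), roughly 0.00095, i.e. about 0.1 percentage points.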

For your hat-guessing bot, do you also have the average score? We used to look at that number more than at the win rate. In particular, AlphaHanabi tries to maximize the expected final score rather than the win probability.
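The two objectives can pick different moves as soon as actions have different outcome distributions. In the toy example below (hypothetical probabilities and scores, not taken from AlphaHanabi), a risky move wins half the time but otherwise loses 5 points, while a safe move always ends one point short of the perfect score of 25:

```python
# Each action maps to a list of (probability, final score) outcomes.
Outcomes = list[tuple[float, int]]


def expected_score(outcomes: Outcomes) -> float:
    return sum(p * score for p, score in outcomes)


def win_probability(outcomes: Outcomes, perfect_score: int) -> float:
    return sum(p for p, score in outcomes if score == perfect_score)


# Hypothetical outcome distributions for two candidate moves.
actions: dict[str, Outcomes] = {
    "risky": [(0.5, 25), (0.5, 20)],
    "safe": [(1.0, 24)],
}

best_by_score = max(actions, key=lambda a: expected_score(actions[a]))
best_by_win = max(actions, key=lambda a: win_probability(actions[a], 25))
```

Maximizing expected score picks the safe move (24 vs. 22.5 points), while maximizing win probability picks the risky one (50% vs. 0%).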

fpvandoorn commented 4 years ago

I added the average score to the page I linked (more precisely, the average number of points below the perfect score).

That is a very impressive 5-player score! I think this is the highest score I've seen from a 5-player, standard-deck, non-cheating bot.

The other bots I know of with very good scores have the following win rates (average scores):
- https://github.com/WuTheFWasThat/hanabi.rs: 94.01% (24.922)
- https://arxiv.org/pdf/1912.02318.pdf: 95.5% (24.94) (see Table 5, page 13)

Your bot is also starting to creep pretty close to my cheating bot, which gets 96.8% (24.961).

giove91 commented 4 years ago

Yes, I was aware that DeltaHanabi has better results (with 5 players) than Facebook's bot! I wonder if and when ML will push the state of the art above hand-crafted bots.

The problem of writing a good "cheating" bot is also very interesting; I have never looked into it. Is your 96.8% (24.961) the state of the art?

fpvandoorn commented 4 years ago

I thought ML already pushed the state of the art :)

In the sense that I don't know of any cheating bot that does better, yes, it is the state of the art. But it's still pretty simple, and you can probably do a bit better than my bot.

fpvandoorn commented 4 years ago

After your comment, I spent some time improving my cheating player. There was still quite a bit of room for improvement, and there still is, but the gains are getting smaller and smaller. The flowchart is now roughly twice as long, and the bot now averages 24.965 points on the standard deck and 29.517 on black.

If you think the problem of writing a good cheating bot is interesting, here is a puzzle. The following are some reasonable-sounding false statements. Try to figure out a situation where these heuristics are bad:

(Here, "you always want to do X" means that there is no situation where doing an action other than X is guaranteed to be better than doing X. "You never want to do X" means that there is no situation where doing X is guaranteed to be better than doing any other action.)
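The two definitions can be written in symbols as follows. The notation is introduced here for illustration: S is the set of game situations, A(s) the legal actions in situation s, and a ≻_s b means "in situation s, doing a is guaranteed to be better than doing b". "Any other action" in the second definition is read as "every other action":

```latex
\mathrm{always}(X) \iff \neg\,\exists\, s \in S \text{ with } X \in A(s),\ \exists\, a \in A(s) \setminus \{X\} :\ a \succ_s X
\mathrm{never}(X)  \iff \neg\,\exists\, s \in S \text{ with } X \in A(s) :\ \forall\, a \in A(s) \setminus \{X\},\ X \succ_s a
```

Under this reading, refuting an "always" statement takes one situation where a single alternative dominates X, while refuting a "never" statement takes one situation where X dominates every alternative.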

giove91 commented 4 years ago

Nice set of problems! I think I have solved most of them. I guess the last one is true if discarding is allowed when all 8 clue tokens are available (I'm not sure whether the official rules allow this).

fpvandoorn commented 4 years ago

Yes, that is correct. In the rulebook I know, discarding is not allowed when all 8 clue tokens are available.