winrates - Githubissues

fpvandoorn commented 4 years ago

I've heard that this is a very strong bot. What are the winrates for this bot for 4 and 5 players? Can it play variants other than the 55 card deck? If so, what are its winrates in the other variants?

giove91 commented 4 years ago

Hi! The strongest version (DeltaHanabi by @Delfad0r) wins approximately 65% of the 5-player games and 37% of the 4-player games. In the next few days I will update the README with more precise numbers. For now only the 55-card deck is implemented (where the 6th color works like the other 5 colors). It shouldn't be too hard to adapt the code to the 50-card deck with only 5 colors.

fpvandoorn commented 4 years ago

Very interesting! Your bot does a lot better on 5 player games, even though that is a lot harder (at least for human conventions) on a 55 card deck. Did you optimize the bot for 5 players?

For comparison: I implemented a hat guessing bot that gets 56% in 5 players and 59% in 4 players. Here is its full winrate table: https://github.com/fpvandoorn/hanabi/blob/master/doc_hat_player.md

giove91 commented 4 years ago

Yes, we optimized it for 5 players! Our (human) conventions are also optimized for 5 players, since that's the setup we prefer to play. We also used to play "against" this bot (humans vs. bot on the same deck) a couple of years ago.

Thanks for your link! I will create a table with win rate and average score, also for the standard 50-card deck.

giove91 commented 4 years ago

I updated the README with win rates and average scores over 100k games!

For your hat guessing bot, do you also have the average score? We used to look at that number more than at the win rate. In particular, AlphaHanabi tries to maximize the expected final score, rather than the win probability.
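Both numbers can be derived from the same list of final scores. The following is a minimal sketch, not code from either repository: `summarize` and its parameter names are hypothetical, a "win" is assumed to mean reaching the perfect score (25 with 5 colors, 30 with 6), and the error estimate uses the usual binomial approximation.

```python
import math

def summarize(scores, perfect_score=25):
    """Summarize final scores from simulated games (hypothetical helper).

    Assumes a game counts as a win only when it reaches perfect_score.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one game")
    wins = sum(1 for s in scores if s == perfect_score)
    win_rate = wins / n
    avg_score = sum(scores) / n
    return {
        "win_rate": win_rate,
        "avg_score": avg_score,
        # Average number of points below the perfect score.
        "avg_deficit": perfect_score - avg_score,
        # Binomial standard error of the win-rate estimate.
        "win_rate_se": math.sqrt(win_rate * (1 - win_rate) / n),
    }
```

With 100k games and a win rate near 65%, this approximation gives a standard error of roughly 0.15 percentage points, so differences of a few percent between bots are well outside the sampling noise.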

fpvandoorn commented 4 years ago

I added the average score to the page I linked (or at least, the number of points each bot is below the perfect score on average).

That is a very impressive 5 player score! I think this is the highest score I've seen in a 5p standard non-cheating bot.

The other bots I know of with very good scores, given as win rate (average score):
https://github.com/WuTheFWasThat/hanabi.rs 94.01% (24.922)
https://arxiv.org/pdf/1912.02318.pdf 95.5% (24.94) (see Table 5, page 13)

and it starts to creep pretty close to my cheating bot, which gets 96.8% (24.961)

giove91 commented 4 years ago

Yes, I was aware that DeltaHanabi has better results (with 5 players) than Facebook's bot! I wonder if/when ML will push the state of the art above hand-crafted bots.

The problem of writing a good "cheating" bot is also very interesting; I have never looked into that. Is your 96.8% (24.961) the state of the art?

fpvandoorn commented 4 years ago

I thought ML already pushed the state of the art :)

In the sense that I don't know any cheating bot that does better, yes, it is state of the art. But it's still pretty simple. You can probably do a bit better than my bot.

fpvandoorn commented 4 years ago

After your comment, I spent some time improving the cheating player. There was quite a lot of room for improvement, and there still is some, but the gains get smaller and smaller. The flowchart is now approximately twice as long, and the bot now averages 24.965 on standard and 29.517 on black.

If you think the problem of writing a good cheating bot is interesting, here is a puzzle. The following are some reasonable-sounding false statements. Try to find a situation where each of these heuristics is bad:

if you have a playable card, you always want to play a card
if you have a playable card, you never want to discard
if you have a playable card and 7 clues, you never want to discard
if you have a playable 4, and the player with a playable 5 has no other playable cards, you always want to play your 4.
if you want to play, you always want to play a card with the lowest value
if you want to play, you always want to play a card with the lowest value or a 5
if you have a playable yellow 3 and playable blue 4, and someone else has the blue 4, but no other useful cards, then you never want to play your blue 4.
when there are no cards in the deck left it doesn't matter which of your playable cards you play
if the players collectively have at least one copy of each card that still needs to be played, you never want to discard if you have a clue
it never hurts to discard a card that you see in another player's hand
in the early game (when no player has discarded yet), if multiple players have useless cards in their hands, it doesn't matter which of them discards first
you never want to intentionally misplay

["you always want to do X" is defined to mean that there is no situation where doing an action other than X is guaranteed better than performing X (in that situation)]
["you never want to do X" is defined to mean that there is no situation where doing action X is guaranteed better than performing any other action (in that situation)]
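To make one of these heuristics concrete, here is a minimal sketch of the "always play a card with the lowest value" rule. It is only an illustration and is not taken from either bot: `Card`, `is_playable`, and `lowest_playable` are hypothetical names, and the `stacks` dictionary (color to highest rank already played) is an assumed representation of the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    color: str
    rank: int

def is_playable(card, stacks):
    # A card is playable when it is the next rank on its color's stack.
    return card.rank == stacks.get(card.color, 0) + 1

def lowest_playable(hand, stacks):
    """Naive heuristic: play the playable card with the lowest rank.

    Returns None when the hand has no playable card. The puzzle above
    claims this rule is not always the best choice.
    """
    playable = [c for c in hand if is_playable(c, stacks)]
    if not playable:
        return None
    return min(playable, key=lambda c: c.rank)

# Example: yellow is at 2 and blue is at 3, so the yellow 3 and the
# blue 4 are both playable; the heuristic picks the yellow 3.
stacks = {"yellow": 2, "blue": 3}
hand = [Card("blue", 4), Card("yellow", 3), Card("red", 5)]
choice = lowest_playable(hand, stacks)
```

A counterexample to the heuristic is then a position in which playing some other playable card is guaranteed to be better than the card this function returns.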

giove91 commented 4 years ago

Nice set of problems! I think I have solved most of them. I guess the last one is true if it is allowed to discard with 8 clues available (not sure if this is allowed by the official rules).

fpvandoorn commented 4 years ago

Yes, that is correct. In the rulebook I know, discarding is not allowed with 8 clues available.
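The effect of that rule on the available moves can be sketched as follows. This is a hedged illustration rather than code from either bot: `legal_action_types` and `MAX_CLUES` are hypothetical names, and the rule variant is the one described above (no discarding at 8 clues), combined with the standard rule that giving a clue requires at least one clue token.

```python
MAX_CLUES = 8  # assumed maximum number of clue tokens

def legal_action_types(clues):
    """Return the action types available with the given clue count.

    Assumes the rule variant in which discarding is forbidden while all
    clue tokens are available.
    """
    if not 0 <= clues <= MAX_CLUES:
        raise ValueError("clue count out of range")
    actions = ["play"]  # playing a card is always allowed
    if clues > 0:
        actions.append("clue")  # giving a clue spends one token
    if clues < MAX_CLUES:
        actions.append("discard")  # discarding would regain a token
    return actions
```

With 8 clues the only options are to play or to give a clue, which is why the answer to the last statement in the puzzle depends on whether this rule is in force.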

giove91 / hanabi

winrates #2