WillFlame14 / hanabi-bot

A bot that plays on the hanab.live interface.
GNU General Public License v3.0

Allow inferring known bluff connections. #277

Closed flackr closed 3 weeks ago

flackr commented 3 weeks ago

Previously, bluffs were only recognized when the card could match the assumed identity, but special bluffs allow for cases where the card cannot be the promised one. Fixes #274.
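As a rough illustration of the change in interpretation (a minimal sketch, not the bot's actual code: the types, the function name `classifyConnection`, and the simplification that only empathy and bluff seat matter are all assumptions), the old behaviour corresponds to returning "none" whenever the card cannot match, while the new behaviour treats that case as a known bluff when the player is in bluff seat:

```typescript
// Hypothetical types for illustration only; the real bot's structures differ.
type Identity = { suit: string; rank: number };
type ConnectionKind = "finesse" | "possible_bluff" | "known_bluff" | "none";

// possible:    identities the finesse-slot card could still be, from its empathy
// promised:    the identity the clue appears to promise
// inBluffSeat: whether the holder of the card is in bluff seat
function classifyConnection(
  possible: Identity[],
  promised: Identity,
  inBluffSeat: boolean,
): ConnectionKind {
  const canMatch = possible.some(
    (id) => id.suit === promised.suit && id.rank === promised.rank,
  );
  if (canMatch) {
    // The card could be the promised one, so a finesse is still possible.
    return inBluffSeat ? "possible_bluff" : "finesse";
  }
  // The card cannot be the promised one. Outside bluff seat there is no
  // connection; in bluff seat the only remaining reading is a known bluff.
  return inBluffSeat ? "known_bluff" : "none";
}
```

For example, a finesse slot with negative 3 information cannot be the promised blue 3, so in bluff seat this sketch classifies the connection as a known bluff instead of discarding it.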

WillFlame14 commented 3 weeks ago

Hmm, I'm unsure if I want to add this as a valid interpretation. The Self Colour Bluff is a max-level convention, and I would be hesitant to interpret it over assuming some kind of mistake occurred. I think in #274, I would rather have robot2 not think green is a valid bluff instead. At least in my experience, a Self Colour Bluff has never worked without prior agreement and discussion beforehand.

flackr commented 3 weeks ago

Even if we set aside colour bluffs, most people I play with will not hesitate to bluff out an expected "3" even if the finesse slot has negative 3 info on it. At the limit, though, this is basically the same thing: the game is telling you that the card can't be the one you're expecting to play.

Players are also expected to recognize the difference between possibly having been bluffed and having been finessed, in order to know whether they are allowed to defer: https://hanabi.github.io/level-15#a-table-for-deferring-bluffs. So it seems a small jump from having to recognize when you might be bluffed to knowing that the card you've been told to play might not be what you expect it to be.

I wouldn't be too sad if we decide the bot shouldn't give these bluffs, but I do think it should at least allow others to give them and interpret them correctly. I'm not sure that allowing it actually rules out any other clues, as cases where a finesse can be seen are already invalid bluffs.

For a similar example where I think the other players should be allowed more leniency to reduce the complexity for human play, see #269.

WillFlame14 commented 3 weeks ago

The Known Bluff is only introduced at level 13, and the level 11 Self-Bluff section also says that Self-Bluffs can only be done with rank. I'm hesitant to even allow interpreting others giving SCBs because of the potential confusion with existing clues like Fill-In Stalls, Fix Clues or even unconventional clues like Fake Saves that have to be given in the "wrong" order. Usually, a 1-for-1 clue can be given to clearly get the finesse position instead, and the slight loss of additional empathy is rarely relevant outside of 2p.

Identifying whether you're Finessed or Bluffed seems orthogonal to determining whether your card can match or not; if you're in bluff seat and need to play into a valid bluff target, I think you always could be playing into a Bluff, so you can never defer. If the card empathy doesn't match, you additionally know it can't be a Finesse, but that's not relevant to deferring.
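The orthogonality argument above can be sketched as two independent checks. This is only a simplified model under stated assumptions: the `BlindPlayContext` shape and both function names are hypothetical, and real deferral rules involve more conditions than are modelled here.

```typescript
// Hypothetical context for a player who has been asked to blind-play.
type BlindPlayContext = {
  inBluffSeat: boolean;      // the player sits directly after the clue giver
  validBluffTarget: boolean; // the clued card is a legal bluff target
  empathyMatches: boolean;   // the blind-play card could be the promised card
};

// Deferring is only allowed when a bluff is impossible. Card empathy is
// deliberately not consulted here.
function mayDefer(ctx: BlindPlayContext): boolean {
  const couldBeBluff = ctx.inBluffSeat && ctx.validBluffTarget;
  return !couldBeBluff;
}

// Card empathy answers a different question: whether a finesse is still
// a possible reading of the clue.
function couldBeFinesse(ctx: BlindPlayContext): boolean {
  return ctx.empathyMatches;
}
```

In this model, flipping `empathyMatches` never changes the result of `mayDefer`; it only changes whether the finesse reading survives.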

At most, we could add a special toggle for SCBs, but I don't think they should be on by default at level 11.

flackr commented 3 weeks ago

> If the card empathy doesn't match, you additionally know it can't be a Finesse

Something like this PR is necessary in order to recognize this bluff, because right now we determine that it can't be the connecting card and discount the possibility.

> but that's not relevant to deferring.

Right, you must treat the possibility of a bluff as not being allowed to defer, but it still implies that players are expected to know when they're in a possible bluff situation, and if they determine that they are in a possible bluff situation then it doesn't seem like much of a stretch to recognize that the finessed card may not match the identity you expect to be playing.

Self colour bluffs aside (they could be a separate option later), we should still recognize known bluffs. We can specifically rule self colour bluffs out early on, either when we're finding the connection or in resolve_bluff.
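One way such an early filter might look is sketched below. Everything here is an assumption for illustration: the `Clue` and `BluffCandidate` shapes, the function names, the `allowSelfColourBluffs` flag, and the working definition of a self colour bluff as a colour clue whose blind-play lands in the clue recipient's own hand. It is not the code in this PR.

```typescript
// Hypothetical shapes; player indices stand in for seats at the table.
type Clue = { type: "colour" | "rank"; target: number };
type BluffCandidate = { blindPlayer: number };

// Assumed definition: the bluffed player is the clue recipient and the
// clue is a colour clue.
function isSelfColourBluff(clue: Clue, candidate: BluffCandidate): boolean {
  return clue.type === "colour" && candidate.blindPlayer === clue.target;
}

// Reject self colour bluffs before any further bluff resolution, unless an
// (assumed) option explicitly turns them on.
function isAllowedBluff(
  clue: Clue,
  candidate: BluffCandidate,
  allowSelfColourBluffs: boolean = false,
): boolean {
  if (isSelfColourBluff(clue, candidate)) {
    return allowSelfColourBluffs;
  }
  return true;
}
```

Keeping the check in one predicate would let the default stay off while leaving room for an opt-in setting later.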

flackr commented 3 weeks ago

Note that I have a follow-on to fix the way that we handle recognizing bluffs in rewinds, which is mostly working but introduced a failure in another test that seems to be legitimate.

flackr commented 3 weeks ago

> I'm hesitant to even allow interpreting others giving SCBs because of the potential confusion with existing clues like Fill-In Stalls,

The stalling rules seem clear that unless another play clue can be seen, stalls are to be assumed over something fancy like a finesse or bluff. I.e. the recipient shouldn't assume a finesse / bluff unless they can see direct play clues that can be given.

> Fix Clues

But fix clues are interpreted as fix clues well before we look for bluffs / finesses aren't they?

> or even unconventional clues like Fake Saves that have to be given in the "wrong" order.

Does the bot do this?

> Usually, a 1-for-1 clue can be given to clearly get the finesse position instead, and the slight loss of additional empathy is rarely relevant outside of 2p.

I guess the big risk with any case in which we have different interpretation rules for clues given from our own index is that they may not necessarily have the same interpretation from others.

Anyways, I've explicitly disallowed self colour bluffs for now, but added a test for known bluffs, which were not recognized prior to this.

WillFlame14 commented 3 weeks ago

> The stalling rules seem clear that unless another play clue can be seen, stalls are to be assumed over something fancy like a finesse or bluff. I.e. the recipient shouldn't assume a finesse / bluff unless they can see direct play clues that can be given.

Right, but this is asymmetric (there may be clue targets in your hand that you can't see) and can be ambiguous (the giver might be symmetrically locked). This can be mitigated by improving the bot's understanding of how others interpret everyone's clues, and the recent addition of fake symmetric connections helps a lot, but I just think adding a new interpretation for this kind of clue increases the area of risk in these edge situations. Currently, if a clue that looks like an SCB is given, it's just treated as a known mistake by the receiver, so it's possible that no one will bomb.

> But fix clues are interpreted as fix clues well before we look for bluffs / finesses aren't they?

Yeah, but again there is a bit of asymmetry/ambiguity in a fix clue that isn't always apparent (e.g. someone else revealing a duplicated card in someone else's hand while you are unaware). It's pretty easy for a human to recognize these based on other people's actions, but it's hard to write rules that let a bot do the same. I think the bot gives okay fix clues now, but there was definitely a period where it was trying to fix cards that other people didn't think needed fixing (or tried to fix in a weird way) and ended up in a spiral of confusion.

> Does the bot do this [Fake Saves that have to be given in the "wrong" order]?

No, but humans do, and I think bots should be more lenient on this kind of interpretation. If the bot becomes good enough to give these kinds of clues then it would be able to recognize such situations and interpret them correctly as saves, but I don't plan on adding that any time soon.

I guess this sounds a bit wishy-washy with all the complicated hypotheticals, but I'm just worried about introducing new problems to support a convention in the extras section. Hope you understand 🙏

> Note that I have a follow-on to fix the way that we handle recognizing bluffs in rewinds, which is mostly working but introduced a failure in another test that seems to be legitimate.

Do you plan to add this to this PR? Otherwise what's here looks good, I think it can be merged.

flackr commented 3 weeks ago

> I guess this sounds a bit wishy-washy with all the complicated hypotheticals, but I'm just worried about introducing new problems to support a convention in the extras section. Hope you understand 🙏

I get it, no worries. We can always add it as an experimental option once we figure out a good pattern for turning particular convention features on and off. I was just initially concerned that you wanted to prevent known bluffs, which I think would likely confuse human players who may not notice that they're giving a known bluff. As long as it's just self colour bluffs, I think it's a fine default limitation.

> Do you plan to add this to this PR? Otherwise what's here looks good, I think it can be merged.

https://github.com/WillFlame14/hanabi-bot/pull/277/commits/ee5b69c37d20cf7a2b3456a9447d182c33e2ea8d added the rewind support for bluff recognition by setting the potential bluff bit on the connection. I think I've seen an issue or two that may be fixed by this but I haven't thoroughly tested it.

Happy to merge it and follow up on closing issues / adding tests later. Note that this will likely have a merge conflict with my other PR but I'm happy to fix one or the other up after they land.

WillFlame14 commented 3 weeks ago

Sounds good, thanks!