WillFlame14 / hanabi-bot

A bot that plays on the hanab.live interface.
GNU General Public License v3.0

Allow deferring a finesse while finessed. #224

Closed flackr closed 1 month ago

flackr commented 1 month ago

This implements a crude approximation of the urgency principle in level 5 for issue #195. A player is allowed to defer playing a finesse to

The additional requirements in parentheses are not verified/required yet.

* technically it only verifies that one of the touched cards is currently interpreted as a save, which doesn't necessarily mean that it wasn't already a save before.

flackr commented 1 month ago

Hey, this is still rough. I think the "right" way to do this would be to store somewhere whether the clue is deemed "critical", which should be calculated at the point of interpreting the clue. I'm just getting this started and welcome any thoughts on where and how best to determine whether the conditions are met.

WillFlame14 commented 1 month ago

Your approach seems fine so far (check in update-turn whether the delaying player gave a clue that matches the criteria of a save, or created a new waiting connection containing a finesse component which involves the next player), but I would advise caution before going too far. I tried implementing this back when I first added level 5, but the possibility of delaying finesses made some games result in really uncomfortable situations with heavy desync. I think it's extremely important to play (and expect others to play) into finesses immediately, especially if the clues were rank clues. Otherwise, the receiver has no idea what is occurring and will be extremely limited in their actions, and the other players will struggle to understand their clues.

For example, suppose the following hypo: image

Donald thinks Cathy might be finessed for him, but she gives a rank clue to him on her turn instead of playing into it. From Donald's perspective, this could be p4 play, or a b4 Triple Finesse. On Donald's turn, he is locked because he cannot play anything, so he must give a stall clue.

From Alice and Bob's perspective, Donald has no reason to stall. Cathy just demonstrated that Donald has g1 by giving a p4 play clue on her turn, instead of playing g1 finesse. (How could Alice/Bob know that they have b1/b3 on finesse position?)

From Cathy's perspective, Donald also has no reason to stall. Bob clued a direct play on g1 to Donald, so he has a playable, even though she knows Donald could symmetrically think her own clue was a b4 finesse. (How could Cathy know that she has g1 on her finesse position?)

Donald cannot differentiate this situation with the other where Cathy is actually finessed and delays to give a b4 Triple Finesse, and neither can anyone else. It's not even clear that Bob and Cathy gave bad clues on their turns: Bob wants to save g3, and Cathy wants to get the purple suit finished.


This example is kind of contrived, but I don't think it takes a lot of factors to produce something along the same lines. If Cathy is simply expected to play into the g1 finesse on her turn, we never have this problem. We just lose some potential efficiency from missed finesse opportunities, some of which will be removed at level 11 anyways since Guide Principle overrides Urgency Principle when a bluff could be occurring. Unless you have ideas on how to tackle situations like these (or have an argument for why these don't occur or should just fail), I'm not sure it's worth the added confusion/complexity.

I would guess that human games where such situations occur typically result in a bomb (with possible resynchronization after), but trying to understand bombs is even more of a rabbit hole that I don't want to get into. 😓

As an aside: card.saved only denotes whether a card has been prevented from being discarded (i.e. it's finessed, clued or chop moved). I currently don't track whether a clue is interpreted as a save or not, though maybe card.chop_when_first_clued is pretty close.

flackr commented 1 month ago

> Your approach seems fine so far (check in update-turn whether the delaying player gave a clue that matches the criteria of a save, or created a new waiting connection containing a finesse component which involves the next player), but I would advise caution before going too far.

Thanks, I appreciate the context and recognize now that this is no easy task to get right. I do still think it is a good idea (it's in the convention for a reason after all), but we'll need to consider lots of edge cases.

> I tried implementing this back when I first added level 5, but the possibility of delaying finesses made some games result in really uncomfortable situations with heavy desync. I think it's extremely important to play (and expect others to play) into finesses immediately, especially if the clues were rank clues. Otherwise, the receiver has no idea what is occurring and will be extremely limited in their actions, and the other players will struggle to understand their clues.
>
> For example, suppose the following hypo: image
>
> Donald thinks Cathy might be finessed for him, but she gives a rank clue to him on her turn instead of playing into it. From Donald's perspective, this could be p4 play, or a b4 Triple Finesse. On Donald's turn, he is locked because he cannot play anything, so he must give a stall clue.
>
> From Alice and Bob's perspective, Donald has no reason to stall. Cathy just demonstrated that Donald has g1 by giving a p4 play clue on her turn, instead of playing g1 finesse. (How could Alice/Bob know that they have b1/b3 on finesse position?)
>
> From Cathy's perspective, Donald also has no reason to stall. Bob clued a direct play on g1 to Donald, so he has a playable, even though she knows Donald could symmetrically think her own clue was a b4 finesse. (How could Cathy know that she has g1 on her finesse position?)
>
> Donald cannot differentiate this situation with the other where Cathy is actually finessed and delays to give a b4 Triple Finesse, and neither can anyone else. It's not even clear that Bob and Cathy gave bad clues on their turns: Bob wants to save g3, and Cathy wants to get the purple suit finished.
>
> This example is kind of contrived, but I don't think it takes a lot of factors to produce something along the same lines. If Cathy is simply expected to play into the g1 finesse on her turn, we never have this problem.

Right, I understand, the information can be very asymmetrical in these situations. I suppose the other players would have to use the symmetric connections to determine Donald may think there could be a finesse through their hands?

> We just lose some potential efficiency from missed finesse opportunities, some of which will be removed at level 11 anyways since Guide Principle overrides Urgency Principle when a bluff could be occurring. Unless you have ideas on how to tackle situations like these (or have an argument for why these don't occur or should just fail), I'm not sure it's worth the added confusion/complexity.

For some reason deferring bluffs isn't mentioned until level 15, but yeah, with bluffs turned on I would expect the bot to pretty much never defer playing into a potential bluff (except maybe for critical saves, which should be easier for other players to recognize). Any card you could be queueing might duplicate your bluffed card, and in normal human games the recipient doesn't always notice that a bluff is possible and that they might expect you to play your draw slot.

> I would guess that human games where such situations occur typically result in a bomb (with possible resynchronization after), but trying to understand bombs is even more of a rabbit hole that I don't want to get into. 😓

Right, the combination of it not allowing deferring a finesse as well as not understanding bombs often leads to multiple bombs in human games.

> As an aside: card.saved only denotes whether a card has been prevented from being discarded (i.e. it's finessed, clued or chop moved). I currently don't track whether a clue is interpreted as a save or not, though maybe card.chop_when_first_clued is pretty close.

Yeah, what I meant about moving it into the interpreting a clue code is that it's there where the code determines whether it should be read as a save clue. I was thinking of saving something on the state about the clue that update-turn would be able to read rather than need to redetermine whether the conditions were met.

flackr commented 1 month ago

Wow, that is quite the example! I guess Cathy would also need to consider that, if Donald may see a possible finesse through her hand, a clue should only be given if the default discard is safe (or if she has something more important to do, like a critical save). Of course that reduces clue opportunities, but hopefully it's rare enough.

flackr commented 1 month ago

So I think ideally what we want is when Bob tells Donald about the 2 green cards, Cathy should check whether it's possible for Donald to think that Cathy has a connecting card. Then Cathy could write the same notes as all of the other players would/should, g1,g2 and add a waiting_connection on herself (note that other players wouldn't have the same note if Cathy's finesse wasn't g1, but this is okay).

Then this waiting connection would be taken into account when determining whether giving the 4 clue would be safe - which it turns out is not safe - because in the hypothetical game Donald would still be waiting for the connection.
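A rough sketch of how such a check might look, assuming a simplified model in which each waiting connection names the card it blocks. Every name here (`isLoaded`, `clueIsSafe`, `blockedCard`, `chopIsCritical`) is hypothetical and not taken from the hanabi-bot code:

```javascript
// Hypothetical sketch: names and data shapes are illustrative only.

// A player is "loaded" if at least one of their known playables is not
// blocked behind an unresolved waiting connection.
function isLoaded(player, waitingConnections) {
    return player.knownPlayables.some(card =>
        !waitingConnections.some(wc => wc.blockedCard === card && !wc.resolved));
}

// Giving a clue (instead of playing) is treated as safe if the next player
// either has something to do or can discard without losing a critical card.
function clueIsSafe(nextPlayer, waitingConnections) {
    return isLoaded(nextPlayer, waitingConnections) || !nextPlayer.chopIsCritical;
}

// In the hypothetical game, Donald's only playable waits on Cathy's possible g1.
const donald = { knownPlayables: ['donald-g2'], chopIsCritical: true };
const pending = [{ blockedCard: 'donald-g2', resolved: false }];
console.log(clueIsSafe(donald, pending)); // false
```

The point of the sketch is only that the waiting connection Cathy writes on herself feeds into the safety check; how connections are actually stored is left open.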

flackr commented 1 month ago

At the limit, for every card in the connection that could match our finesse position, the player holding it could be waiting to see if we play, provided they come after us in play order since the card became playable.

Right now the code is written such that if we find the finesse in cards we can see, we don't even consider that we could have the card as well until they don't play. I feel like this in theory would need to change so that we consider that even if we can see the card we assume we could also have the corresponding card (if we would have a play opportunity before that player) and I guess adjust their inferred information accordingly. This would allow us to know for every player whether they are actually loaded. Note that when considering giving a clue we'd then have to consider if the player needing to be loaded can tell from the clue that it is direct or a finesse.

Alternately, it would be nice if we only had to do extra work when considering giving a clue. I guess the loaded check could assume that, for any card we think is clued playable, the player could think they are waiting on us if the possibilities allow for the next card, and our corresponding finesse slot / prompt card at the time of that clue could match and is not otherwise clued.

WillFlame14 commented 1 month ago

> So I think ideally what we want is when Bob tells Donald about the 2 green cards, Cathy should check whether it's possible for Donald to think that Cathy has a connecting card. Then Cathy could write the same notes as all of the other players would/should, g1,g2 and add a waiting_connection on herself (note that other players wouldn't have the same note if Cathy's finesse wasn't g1, but this is okay).

Right, these would be like "fake" symmetric connections where you may need to blind play more cards than should be necessary from your point of view. I actually removed fake waiting connections just recently, since I realized I never used them and had a bunch of useless !fake checks, but they seem to be quite close to what you're describing.

While we can prevent bot Cathy from giving such clues, we still need a way to interpret the situation if human Cathy gives it. If everyone takes the conservative finesse approach mentioned above, everyone can be synced by understanding that Donald could be locked through possible symmetric finesses. This may prevent Donald from giving play clues to chop (they may look like Locked Hand Saves), but that seems ok in this scenario. At least, this is one option, another is making simplifying assumptions such as Donald being allowed to assume he is not locked by playing either g1 or p4.

If we take the conservative approach, however, it seems likely that it will be confusing to humans: normal symmetric connections are fine, but I'm not sure anyone regularly plays by considering fake symmetric connections on themselves. There might appear to be very clear play clues that the bot avoids (e.g. from Alice's pov, Cathy's 4 clue looks quite normal), or the bots might stall very frequently due to "symmetrically-locked" hands that aren't actually locked. I guess it might not be too useful to speculate on what might happen though, until we actually see it.

flackr commented 1 month ago

> If we take the conservative approach, however, it seems likely that it will be confusing to humans: normal symmetric connections are fine, but I'm not sure anyone regularly plays by considering fake symmetric connections on themselves.

I think this kind of mistake happens in human games due to people not considering this. If the bot discards the critical 5 as a result I think that is the human's fault for not thinking about this and they'll realize it when watching the replay.

Note that most of the time you won't have the fake connection and things will go fine, but it is always a risk.

I think the bot should be able to detect cases where the recipient can determine it's a direct play clue being given and still give those, and of course usually there won't be a critical card on chop. I'm hoping/expecting the combination of these things to be sufficiently rare that avoiding clues is rare.

flackr commented 1 month ago

I've added a version of this idea, where for every focus possibility / connection we track whether we may have the card as well. This state is cleared after we play or discard. I need to look into why some of the existing tests are still failing and why it failed to account for the b4 connection by number, but I'm hoping this is headed in a good direction.
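A minimal sketch of this kind of per-card tracking, assuming the flag lives on the card in our own finesse position. The field `maybeFinessed` and the helper names are hypothetical and are not the names used in the PR:

```javascript
// Hypothetical sketch: `maybeFinessed` and these helpers are illustrative names.

// Finesse position is taken here to be the first unclued card in the hand.
function findFinessePosition(hand) {
    return hand.find(card => !card.clued);
}

// For each finesse component of a connection, note if our own finesse
// position could also hold that identity.
function markPossibleSelfConnections(hand, connections) {
    const finesseCard = findFinessePosition(hand);
    if (finesseCard === undefined) {
        return;
    }
    for (const conn of connections) {
        if (conn.type === 'finesse' && finesseCard.possible.includes(conn.identity)) {
            finesseCard.maybeFinessed = true;
        }
    }
}

// Cleared after we play or discard, as described above.
function clearPossibleSelfConnections(hand) {
    for (const card of hand) {
        card.maybeFinessed = false;
    }
}
```

Keeping the state on the card means nothing else needs to be cleaned up when the card leaves the hand.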

flackr commented 1 month ago

So this is now passing all tests, though notably I had to modify a couple. I'd be curious to get your thoughts. I'll also try playing with it for a bit to see how things go.

Two things that aren't done yet:

  1. The bot will never defer playing into a finesse yet. I think ideally it would at least save critical cards and possibly give high value finesses (TBD what is high value enough).
  2. It should only consider it a valid delay if nobody else could have given the clue. This would allow the 5 save in test/h-group/level-5/clandestine-finesses.js to be treated as not delaying the connection.

Edit: Both things are now done!!

flackr commented 1 month ago

Now I'm tracking when interpreting the clue whether it qualifies as an "important" clue, meaning that it's a save or finesse that could not have been given by a later player. This should reduce the number of cases where the later player defers playing.
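As a rough illustration, the "important" verdict could be computed once during interpretation and stored for a later turn update to read. This is only a sketch under stated assumptions: the names (`isImportantClue`, `recordClue`, `couldGive`, `state.lastClue`) are hypothetical, and "a later player" is simplified to mean anyone who acts after the giver and before the clue target:

```javascript
// Hypothetical sketch: none of these names come from the hanabi-bot source.

// Players who take a turn after `giver` and before `target`.
function playersBetween(giver, target, numPlayers) {
    const between = [];
    for (let i = (giver + 1) % numPlayers; i !== target; i = (i + 1) % numPlayers) {
        between.push(i);
    }
    return between;
}

// Called once while interpreting the clue; `couldGive(player, interp)` is an
// assumed callback answering "could this player have given the same clue?".
function isImportantClue(interp, numPlayers, couldGive) {
    if (interp.kind !== 'save' && interp.kind !== 'finesse') {
        return false;
    }
    const laterGivers = playersBetween(interp.giver, interp.target, numPlayers);
    return !laterGivers.some(player => couldGive(player, interp));
}

// Store the verdict so a later turn update can read it instead of recomputing.
function recordClue(state, interp, numPlayers, couldGive) {
    state.lastClue = { ...interp, important: isImportantClue(interp, numPlayers, couldGive) };
    return state;
}
```

With this shape, a save given to the very next player is always important, because no one acts between the giver and the target.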

flackr commented 1 month ago

Well that wasn't as hard as I thought. It's now allowed to give critical clues. Note there's currently a disconnect between the actions it will take while finessed and what it recognizes as a finesse delay. I guess ideally we would call the same underlying function to determine whether the action was among those urgent enough to be done ahead of playing into the finesse.

flackr commented 1 month ago

For now I've made it so that it only delays finesses for saves, which should be recognized on the other end as a critical save. I do think that we should try to have a function which recognizes whether an action taken by someone else appears urgent, if you don't already have one.

WillFlame14 commented 1 month ago

This looks like it's coming together nicely, I hope your games with it haven't been too bad! This is a different approach to what I had in mind (creating waiting connections with a special property that we would have to check, rather than writing on the cards directly) but this seems quite clean, actually.

I think we'll have to work this in together with Early Game (e.g. I would strongly lean away from allowing defers when the reacting player gives an Early Save in Early Game), but that should come around once I implement Early Game behaviour in the first place. 😛

flackr commented 1 month ago

Thanks for reviewing, greatly appreciate your comments and apologies for some of the issues from moving code around. I've had to figure out where best to implement things as I've worked through this issue. I still want to eventually have a function which detects urgent actions directly from an action so that we can support other urgent actions, but already this seems to be working pretty well.

> This looks like it's coming together nicely, I hope your games with it haven't been too bad!

It's much better. I had so many cases where humans didn't know that the bot didn't understand this, and it's so tempting to give valuable finesses, that would inevitably lead to bot faults. :)

> This is a different approach to what I had in mind (creating waiting connections with a special property that we would have to check, rather than writing on the cards directly) but this seems quite clean, actually.

Yeah I was originally thinking this too, but I realized that a flag on the card is sufficient and so much simpler to manage.

flackr commented 1 month ago

In terms of ensuring overall correctness, I think the one part of this that could still use a bit of work is a common framework for identifying recognizable urgent actions. I'm concerned that the current logic may be asymmetrical in some situations, e.g. there are likely some situations or some actions that find_urgent_actions finds that interpret_clue won't recognize as urgent or vice versa.

WillFlame14 commented 1 month ago

Yes, a longstanding issue with the bot is that a lot of its features are not symmetric, which can lead to some confusing situations. The current "urgent action" processing also uses a pretty rough heuristic (around line 139 of urgent-actions.js). We assume that a save is high priority if:

This means that save clues to the next player are always high priority (unless Early Game related). High priority saves (index < actionPrioritySize) take priority over giving high value clues and playing cards, but I previously considered playing into a finesse to be even higher priority than this. That's why both the high-priority algorithm and clue-safe algorithm take being finessed into account.

If you have a better idea of how to tell whether a save is urgent or not, we could definitely try that. Otherwise, I guess you could pull the small algorithm out to check if a high priority save was needed to be given to that player.

flackr commented 1 month ago

I think a little asymmetry in this feature is okay. As long as every finesse deferring save given by take-action.js is recognized, it's okay if on the receiving end it's a bit more lenient to allow human players to be less stringent than the bot when deferring finesses.
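The one-directional requirement can be written down as a property: everything the giving side is willing to do must be accepted by the receiving side, while the reverse need not hold. The two predicates below are hypothetical stand-ins, not the real logic in take-action.js or the clue interpreter, and the action fields are assumptions made for the sketch:

```javascript
// Hypothetical sketch: both predicates are illustrative stand-ins.

// Strict side: what the bot itself would do while finessed.
function botMayDefer(action) {
    return action.type === 'save' && action.targetIsNextPlayer;
}

// Lenient side: what the bot accepts as a valid deferral from others.
function recognizedAsDeferral(action) {
    return action.type === 'save' ||
        (action.type === 'finesse' && action.noLaterGiver);
}

// Enumerate every combination of the assumed fields and check that the
// strict predicate never accepts something the lenient one rejects.
function findAsymmetry() {
    for (const type of ['save', 'finesse', 'play']) {
        for (const targetIsNextPlayer of [true, false]) {
            for (const noLaterGiver of [true, false]) {
                const action = { type, targetIsNextPlayer, noLaterGiver };
                if (botMayDefer(action) && !recognizedAsDeferral(action)) {
                    return action;
                }
            }
        }
    }
    return null;
}
```

A check of this shape could sit in a test so that tightening the receiving side later does not silently break the bot's own deferrals.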

flackr commented 1 month ago

I think this may be at a point where it works well enough, and we can always iterate on the edge cases. Right now it only permits a finessed player to defer by saving the next player or giving a finesse that no one else could have given.

WillFlame14 commented 1 month ago

Hmm, including fake symmetric connections does seem to be throwing things off. In seed 68 (played at level 7), a dilemma forms from Donald's perspective: image

On turn 6, the 3 clue to Cathy is symmetrically ambiguous. If we (Donald) have b2 in slot 3, then it could be b3 as a Prompt on our slot 3, or g3 direct. When Cathy clues 4 to us on turn 8,

Should we play slot 3 or not? Is this Cathy's fault for giving an ambiguous clue (so now we have to model what others could be trying to symmetrically interpret clues to us...)? There might be a line of logic proving that Cathy cannot be cluing b4, but I can't seem to think of it right now.

flackr commented 1 month ago

> In seed 68

How do you play specific seeds?

When the clue is first given, I think either g4 or b4 are possibilities.

> Should we play slot 3 or not?

I don't think so. I think as long as Cathy didn't know what her card was yet, per Occam's razor the simpler explanation is that our first 2 connects to the 4. Cathy can't know whether we were going to play the other 2 or not, and should be able to recognize that a b4 would be an unresolvable ambiguity (i.e. we need to play a different card for different interpretations).

flackr commented 1 month ago

If the ambiguity had an earlier connection from another player then the ambiguity is resolvable, e.g. similar to the case with a possible save or finesse play.

WillFlame14 commented 1 month ago

The self-play command lets you play specific seeds (e.g. npm run self-play -- games=1 players=4 level=7 seed=68) that you can then view using replay (e.g. npm run replay -- file=seeds/68.json level=7).

> When the clue is first given, I think either g4 or b4 are possibilities.

Doesn't this conflict with saying that we should discount the blue interpretation due to being more complicated? If we allow a b4 possibility, then we might need to play slot 3. If we don't plan on playing slot 3, then we shouldn't think it can be b4. (We already know slot 1 is g2 because Bob played into the Reverse Finesse.)

> Cathy can't know whether we were going to play the other 2 or not, and should be able to recognize that a b4 would be an unresolvable ambiguity (i.e. we need to play a different card for different interpretations)

Right, Cathy should know that cluing b4 would be ambiguous, but Cathy doesn't think so in the current version of the code because she doesn't model what Donald thinks of clues to her. When interpreting clues to us, we don't look for symmetric connections that other people think we could be interpreting, so we can't tell that Donald thinks that we could think our 3 is blue.

Essentially, it isn't common knowledge yet that Cathy's 3 is green. It's only common knowledge that it's green or blue: everyone knows it's green, but Donald doesn't know that Cathy knows it's green. If we allow delayed information, then Cathy can resolve this ambiguity by playing g3 on her turn (demonstrating her private knowledge) or not.

But your point about Occam's Razor is right: both in the case where Cathy knows g3 and she doesn't, g2 (known, Donald) -> g3 (self-finesse, Donald or known, Cathy) -> g4 [1 blind play] is simpler than b2 (prompt, Donald) -> b3 (finesse, Alice) -> b4 [1 blind play, 1 prompt] since it has fewer Prompts. So I guess this is more revealing of a potential common knowledge problem that's probably beyond the scope of this PR, and not really related to fake connections in general.
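The comparison of the two chains can be sketched as a small comparator. This assumes the ordering "fewer blind plays first, then fewer prompts"; the thread only relies on the prompt tie-break for this case, and the connection shape (`type`, `card`) is a hypothetical simplification:

```javascript
// Hypothetical sketch: the chain representation is illustrative only.

// Count the blind plays (finesses) and prompts a chain requires.
function complexity(chain) {
    let blindPlays = 0;
    let prompts = 0;
    for (const conn of chain) {
        if (conn.type === 'finesse') {
            blindPlays++;
        } else if (conn.type === 'prompt') {
            prompts++;
        }
    }
    return { blindPlays, prompts };
}

// Negative if `a` is simpler than `b`, positive if more complex, 0 if tied.
function compareChains(a, b) {
    const ca = complexity(a);
    const cb = complexity(b);
    return (ca.blindPlays - cb.blindPlays) || (ca.prompts - cb.prompts);
}

// The two interpretations of the 4 clue discussed above.
const greenChain = [{ type: 'known', card: 'g2' }, { type: 'finesse', card: 'g3' }];
const blueChain = [{ type: 'prompt', card: 'b2' }, { type: 'finesse', card: 'b3' }];
console.log(compareChains(greenChain, blueChain) < 0); // true: green is simpler
```

Both chains cost one blind play, so the extra prompt in the blue chain is what makes it lose.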

flackr commented 1 month ago

> Doesn't this conflict with saying that we should discount the blue interpretation due to being more complicated? If we allow a b4 possibility, then we might need to play slot 3. If we don't plan on playing slot 3, then we shouldn't think it can be b4. (We already know slot 1 is g2 because Bob played into the Reverse Finesse.)

Sorry, my point was that the b3 finesse can be done if our slot 1 is the b2. If we play it, then either Alice plays b3 (if Cathy's 3 isn't blue) or Alice doesn't play and Cathy knows to play b3. Either way, it doesn't rely on Donald figuring out what to play based on uncertain information.

However, given our slot 1 is g2, then as you've pointed out Donald can't know whether to play slot 3 before playing slot 2 or not.

WillFlame14 commented 1 month ago

This seems relatively complete then, thanks for your help! This fixes #127.