WillFlame14 / hanabi-bot

A bot that plays on the hanab.live interface.
GNU General Public License v3.0

Save clues targeting finesse positions for players with more valuable cards on chop #312

Closed flackr closed 1 day ago

flackr commented 1 week ago

If the focus of a clue is the finesse position, it's worth considering deferring the clue to someone else who may be able to see the card that follows it. See #214 for some discussion on this idea.

WillFlame14 commented 1 week ago

The evaluation seems a bit out of place, needing to modify determine_focus() and ClueAction despite not being really related to either of them. I think it would be better done in clue_finder(), for example, since we only care about this when finding clues given by us. Is there a reason it needs to be in clue interpretation?

Also, I think this should be weighted toward good chop discards rather than just letting other people finesse: aside from dark variants, it's much more likely to lose the game from a bottom-deck risk compared to not having enough efficiency. I'd much rather have the bot give 1 for 1s for the entire game (which is efficient enough to win) than discard trying to allow a finesse and losing some 3 or 4 as a bdr. My half-formed idea for this is to only allow early discarding if [the average value of your chop < the chop of a player that would discard if you gave this clue], but this is also affected by context like others discarding early or stealing clues.
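A minimal sketch of that comparison, assuming some numeric estimate of how costly each card would be to lose. The function name, the value scale, and the inputs are all hypothetical and not part of hanabi-bot:

```python
from statistics import mean


def should_discard_early(own_chop_candidates, other_chop_value):
    """Return True if discarding our chop looks cheaper than the alternative.

    own_chop_candidates: estimated values (hypothetical scale, higher means
        more costly to lose) for each identity our chop card could still be.
    other_chop_value: estimated value of the visible chop card of the player
        who would discard next if we gave the clue ourselves.
    """
    if not own_chop_candidates:
        # No information about our chop: be conservative and do not discard.
        return False
    return mean(own_chop_candidates) < other_chop_value


# Our chop is probably low-value; their chop is a card worth protecting.
print(should_discard_early([0.0, 1.0, 2.0], 3.0))  # True
# Our chop could easily be important; their chop is trash.
print(should_discard_early([2.0, 4.0], 0.0))  # False
```

This ignores the context mentioned above (others discarding early or stealing clues), which would have to adjust either side of the comparison.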

flackr commented 1 week ago

> I think it would be better done in clue_finder(), for example, since we only care about this when finding clues given by us. Is there a reason it needs to be in clue interpretation?

AFAICT, we don't keep the clue-specific connections once we return from interpret_clue. However, I realize now it would be much cleaner to return the information I need from interpret_clue. I think I could then also remove the important flag I am adding there and determine the importance by returning the connections.

flackr commented 1 week ago

Regarding valuation, I agree that we should also consider cards on chop, though I wouldn't discount the value of clue efficiency. Since any strategy is "correct" to convention, perhaps we could evaluate using self play to figure out what works best? I've had so many games where the bots give many inefficient clues resulting in us being down to 0 or 1 clues and losing indirectly because of that. It may also be a good idea to switch strategies based on pace and number of clues.

flackr commented 1 week ago

We could check whether we can see a possible bluff that could be given, and consider that a reason to delay. Also, if one person saw the direct play and didn't finesse it, then it means that the finesse likely isn't there (or touches our draw slot and they were deferring to see if someone could do even better).

WillFlame14 commented 1 week ago

> AFAICT, we don't keep the clue-specific connections once we return from interpret_clue. However, I realize now it would be much cleaner to return the information I need from interpret_clue.

Do we need the connections? It seems like they're only used to figure out who can also give this clue. In get_result() we compute the players who received a new playable from the clue; isn't it just the complement of them? At most, I think this will additionally exclude players who have a known component as part of the clue, but that can be checked by seeing if the inferences on the card pre-hypo already matched the identity and allowing them to be givers too.
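As a rough illustration of the complement idea, here is a sketch using plain player indices. `possible_givers` and both input sets are hypothetical stand-ins for whatever get_result() and the pre-hypo inferences actually expose:

```python
def possible_givers(players, new_playable_holders, already_known_holders):
    """Return the players who could give the clue, in the order given.

    players: all player indices at the table.
    new_playable_holders: players who would receive a new playable from the
        clue; they cannot see their own card, so they cannot give it.
    already_known_holders: holders whose card inferences already matched the
        true identity before the hypothetical clue; the clue tells them
        nothing new, so they are allowed to be givers as well.
    """
    blocked = set(new_playable_holders) - set(already_known_holders)
    return [p for p in players if p not in blocked]


# Players 1 and 2 each get a new playable, so only 0 and 3 can give the clue.
print(possible_givers([0, 1, 2, 3], {1, 2}, set()))  # [0, 3]
# Player 1's connecting card was already known, so player 1 can give it too.
print(possible_givers([0, 1, 2, 3], {1, 2}, {1}))  # [0, 1, 3]
```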

If you want to have particular logic for finesses (e.g. a player couldn't give it if a card could move out of position before their turn), any set of connections that contains at least one finesse will be added as part of a waiting connection, so you can retrieve them from there.

> Regarding valuation, I agree that we should also consider cards on chop, though I wouldn't discount the value of clue efficiency. Since any strategy is "correct" to convention, perhaps we could evaluate using self play to figure out what works best? I've had so many games where the bots give many inefficient clues resulting in us being down to 0 or 1 clues and losing indirectly because of that. It may also be a good idea to switch strategies based on pace and number of clues.

Sure, I think self-play performance can be a good metric, but I also think the bots currently have very poor look-ahead (i.e. they're too greedy), which is a confounding factor. Being at low clues and unable to navigate situations properly is more a symptom of not letting the right players discard than of not giving enough finesses. Looking at recent will-bot games in No Variant/Rainbow, almost all of them have the future required efficiency around 0.7 or 0.8 by the midgame (even less by the endgame), which means that efficiency isn't the issue. It seems from your history that you've been playing a lot of Black, which is a dark variant and requires much more aggressive discarding.

Any change that increases discarding might improve performance slightly due to sustaining a safer clue count; I just don't think it's the finesse stealing that's causing the problem.

> We could check whether we can see a possible bluff that could be given, and consider that a reason to delay.

This is a reason to not clue, but I don't think it's necessarily a good reason to discard (especially if the player who would give the bluff has trash on chop, for example). Playing any card is already higher priority than giving any clue that isn't a finesse, except for save-like clues such as TCCMs. So this shouldn't be causing any tempo issues; it only affects who performs the next discard.

flackr commented 1 week ago

I recognize the concerns; thank you for patiently explaining the rationale here. I don't like the idea of always assuming that our own chop is valuable and shouldn't be discarded, but perhaps your earlier idea helps solve that:

> My half-formed idea for this is to only allow early discarding if [the average value of your chop < the chop of a player that would discard if you gave this clue], but this is also affected by context like others discarding early or stealing clues.

For a particular clue, we can determine who could give it (or could have given it before us). If someone before us could have given the clue and didn't, and we don't see a valuable chop card after us, then we can assume the clue is ours to give. I think your half-formed idea does this implicitly since, as you say, we won't see a chop after us more valuable than the average of our own. This seems like a relatively straightforward thing to calculate.
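One way this calculation could look, as a sketch only. `should_defer_clue`, the chop-value mapping, and the value scale are hypothetical and not taken from the bot:

```python
def should_defer_clue(our_index, num_players, givers, chop_values, own_chop_estimate):
    """Return True if a later player should be left to give the clue.

    givers: set of player indices believed able to give the clue.
    chop_values: mapping of player index to the estimated value of their
        visible chop card (hypothetical scale, higher means more valuable).
    own_chop_estimate: average estimated value of our own, unseen chop.
    """
    # Walk the other players in turn order, starting with the one after us.
    for offset in range(1, num_players):
        player = (our_index + offset) % num_players
        if player in givers and chop_values.get(player, 0.0) > own_chop_estimate:
            # They can give the clue and have more to lose by discarding.
            return True
    # Nobody after us is better placed, so the clue is ours to give.
    return False


chops = {0: 0.5, 2: 4.0, 3: 3.0}
# Player 3 can give the clue and has a more valuable chop than ours.
print(should_defer_clue(1, 4, {0, 1, 3}, chops, 1.5))  # True
# Our own chop is estimated to be the most valuable among the givers.
print(should_defer_clue(1, 4, {0, 1, 3}, chops, 3.5))  # False
```

Player 2 has the most valuable chop in this example but is not a possible giver, so it does not affect the result.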

One other thing I don't like about the current solution is that it avoids giving the direct clue by lowering the value below the minimum clue threshold, but what it should probably do is ensure that it saves a clue for the designated player to give it, and only give a different clue if it doesn't interfere with that clue.
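A sketch of what reserving a clue might mean in terms of clue tokens alone. The function and its parameters are hypothetical, and it deliberately ignores the other kind of interference, where a different clue touches the same cards:

```python
def may_give_other_clue(clue_tokens, reserved_clues=1, tokens_spent_before_giver=0):
    """Return True if spending one token now still leaves the reserved clue(s).

    clue_tokens: clue tokens currently available to the team.
    reserved_clues: tokens that must remain for the designated giver.
    tokens_spent_before_giver: tokens we expect players acting between us and
        the designated giver to spend (an assumption supplied by the caller).
    """
    remaining = clue_tokens - 1 - tokens_spent_before_giver
    return remaining >= reserved_clues


print(may_give_other_clue(2))  # True: one token is left for the designated giver
print(may_give_other_clue(1))  # False: our clue would use the last token
print(may_give_other_clue(3, tokens_spent_before_giver=1))  # True
```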

flackr commented 1 week ago

I do feel like discarding should still be a perfectly fine option if we are looking at cards on chop, since someone has to be discarding. There are likely subtle cues we could listen to for determining the value of our chop (e.g. with this implemented, any time we are clued past our chop, that chop is probably not as valuable).

flackr commented 6 days ago

Okay, this is updated to save a finesse slot clue if there is a player who we believe can give the clue and has a higher value card.

WillFlame14 commented 1 day ago

Looks good, thanks!