alan-turing-institute / reg-peg

REG Hack Week 2024 Cribbage Agent Project
MIT License

Board Position #4

Open edaub opened 1 month ago

edaub commented 1 month ago

Skilled human players adjust their strategy based on the relative position of both players on the board. A player who is ahead will tend to play more conservatively, while a player who is behind will take risks. These risks can be offensive (risking points in order to gain more points) or defensive (accepting less scoring of their own to limit the opponent's points), depending on which they feel gives them the best chance to regain the advantage.

A player has an advantage if, on average, the outcomes of future hands would get them to the end of the game before their opponent. One can therefore imagine simulating forward in time from a given position to determine who is most likely to win, and then making strategic decisions that might alter the relative positions. One way to model this is to have a probabilistic model of a typical hand outcome for each role (pone and dealer), which can be run forward in time to compute win probabilities. This is essentially RL value function estimation: the value of a given state (board position) is estimated from the points a particular policy is likely to produce, so it could be done using a number of possible methods.
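A minimal Monte Carlo sketch of the forward simulation described above, assuming each deal's points can be drawn independently from a simple per-role distribution. The rounded normal distributions, their standard deviations, and all function names here are hypothetical (not from the reg-peg codebase); only the 10/16 means come from the heuristic later in this issue. Within a deal the pone's points are added before the dealer's, and pegging is folded into each player's total, so pegging out mid-play is not modelled.

```python
import random

TARGET = 121  # a standard cribbage game ends at 121 points

# Hypothetical per-deal scoring models as (mean, standard deviation).
# The means follow the 10/16 heuristic; the standard deviations are assumed.
PONE_MODEL = (10.0, 5.0)
DEALER_MODEL = (16.0, 6.0)


def sample_points(model: tuple[float, float], rng: random.Random) -> int:
    """Draw one deal's points from a rounded normal distribution, clipped at zero."""
    mean, sd = model
    return max(0, round(rng.gauss(mean, sd)))


def simulate_game(score_a: int, score_b: int, a_deals: bool, rng: random.Random) -> bool:
    """Play hypothetical deals forward from a position; return True if player A wins.

    Within a deal the pone's points are added first (the pone counts first),
    then the dealer's; the deal then passes to the other player.
    """
    while True:
        pone_pts = sample_points(PONE_MODEL, rng)
        dealer_pts = sample_points(DEALER_MODEL, rng)
        if a_deals:
            score_b += pone_pts
            if score_b >= TARGET:
                return False
            score_a += dealer_pts
            if score_a >= TARGET:
                return True
        else:
            score_a += pone_pts
            if score_a >= TARGET:
                return True
            score_b += dealer_pts
            if score_b >= TARGET:
                return False
        a_deals = not a_deals


def win_probability(score_a: int, score_b: int, a_deals: bool,
                    n_games: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of player A's win probability from a board position."""
    rng = random.Random(seed)
    wins = sum(simulate_game(score_a, score_b, a_deals, rng) for _ in range(n_games))
    return wins / n_games
```

For example, `win_probability(90, 100, a_deals=True)` estimates A's chances when A trails by 10 but holds the deal. An agent could compare this estimate before and after a candidate discard or play to decide whether a riskier option is worth taking. Replacing the normal distributions with empirical score histograms (or a learned model) would not change the simulation loop.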

Human players tend to use a heuristic for this value function estimation. Knowing the average scoring from historical tournament play, they can work backwards from the final score (121) to estimate approximately how many hands remain, and thus who has the advantage. Typical values are 10 points for the pone (hand + pegging) and 16 points for the dealer (nobs + pegging + hand + crib). The heuristic assumes that a player aims to go out on the pone's count, since the pone counts their hand before the dealer counts their hand and crib. Using this heuristic, one can work backwards to find guide holes that one would like to have reached before particular hands. When a particularly high-scoring or low-scoring hand occurs, the advantage may shift quickly and require a change in strategy.
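A minimal sketch of this backward-working heuristic, assuming fixed averages of 10 points per pone deal and 16 per dealer deal with roles alternating every deal. The function names are hypothetical illustrations, not part of the reg-peg codebase. `guide_holes` subtracts the averages from 121 going back in time. `projected_leader` runs the same averages forward and breaks ties in favour of whoever is pone on the finishing deal, since the pone counts first. Like the heuristic itself, it ignores the possibility of pegging out before the show.

```python
TARGET = 121     # a standard cribbage game ends at 121 points
AVG_PONE = 10    # average pone points per deal (hand + pegging)
AVG_DEALER = 16  # average dealer points per deal (nobs + pegging + hand + crib)


def guide_holes(finish_as_pone: bool = True) -> list[int]:
    """Work backwards from 121 using the average scores.

    Returns the score a player would like to hold at the start of each of their
    remaining deals, latest deal first, so that average scoring carries them to
    121 on the final deal. Roles alternate pone/dealer going back in time.
    """
    holes = []
    need = TARGET
    as_pone = finish_as_pone
    while True:
        need -= AVG_PONE if as_pone else AVG_DEALER
        if need <= 0:
            break
        holes.append(need)
        as_pone = not as_pone
    return holes


def deals_to_finish(score: int, deals_now: bool) -> int:
    """Deals (including the current one) until average scoring reaches 121."""
    deals = 0
    while score < TARGET:
        score += AVG_DEALER if deals_now else AVG_PONE
        deals_now = not deals_now
        deals += 1
    return deals


def projected_leader(score_a: int, score_b: int, a_deals: bool) -> str:
    """Return "A" or "B": whoever the averages say reaches 121 first."""
    da = deals_to_finish(score_a, a_deals)
    db = deals_to_finish(score_b, not a_deals)
    if da != db:
        return "A" if da < db else "B"
    # Both finish on the same deal: the pone on that deal counts first and wins.
    # Deal 1 is the current deal, so A deals on odd-numbered deals iff a_deals.
    a_deals_final = a_deals if da % 2 == 1 else not a_deals
    return "B" if a_deals_final else "A"
```

For instance, `guide_holes()` gives `[111, 95, 85, 69, 59, 43, 33, 17, 7]`: a player who wants to go out as pone would like to be on 111 before their last pone hand, 95 before the dealer hand preceding it, and so on. A big or small hand that moves a player well off these holes is exactly the kind of event that would flip `projected_leader` and call for a change in strategy.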