google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Backgammon doubling cube and matches #67

Closed jamesdfrost closed 4 years ago

jamesdfrost commented 5 years ago

I'd like to take a look at implementing the doubling cube and match logic for Backgammon.

The doubling cube is critical to backgammon strategy, and previous backgammon value networks such as TD-Gammon have not implemented this within the neural network; it has traditionally been bolted on via a separate doubling algorithm. I think this has the potential to be an area where this framework could offer significant improvements over previous work.

The way I would see it working would be:

  • Players with an available double would have a move choice of offering the double or rolling the dice.
  • If a double is offered, the opponent would have the move choices of accepting the double or resigning.
  • If the double is accepted, the doubling cube would then be owned by the opposing player.

For this to work, the backgammon State will need to include:

  • Match Player 0 score
  • Match Player 1 score
  • Match SetPoints (the number of points a player needs to get to)
  • Match isCrawford (is the Crawford rule applicable?)
  • Doubling cube value
  • Doubling cube owner (0/1 for players, -1 for no owner)

Would also need to have a way of passing the following options to backgammon.cc:

  • UseDoublingCube
  • MatchSetPoints (1 for single game)
  • UseCrawfordRule

Would be interested in views on this - probably won't be looking at this for a few weeks (assuming you're happy to support this), but keen to get initial thoughts down.

elkhrt commented 5 years ago

I think this is a really interesting area, and would be excited to see work in this direction.

I'd recommend starting with money games, which are conceptually much simpler, but have similar dynamics.
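For the money-game case, the cube rules proposed above (offer or roll, accept or resign, ownership passing to the player who accepts) could be sketched roughly as follows. This is a standalone Python illustration and not OpenSpiel code: `CubeState`, `CubeAction`, `legal_cube_actions`, and `apply_cube_action` are hypothetical names, and the assumption that a resignation pays out the cube value from before the offered double is the usual backgammon convention rather than something stated in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class CubeAction(Enum):
    ROLL = "roll"
    DOUBLE = "double"
    ACCEPT = "accept"
    RESIGN = "resign"


NO_OWNER = -1  # cube is centred, so either player may double


@dataclass(frozen=True)
class CubeState:
    value: int = 1                      # current doubling cube value
    owner: int = NO_OWNER               # 0/1 for players, -1 for no owner
    pending_from: Optional[int] = None  # player who has offered a double


def legal_cube_actions(state: CubeState, player: int) -> List[CubeAction]:
    """Cube-related choices for `player` before the dice are rolled."""
    if state.pending_from is not None:
        # Only the opponent of the doubling player gets to respond.
        if player == state.pending_from:
            return []
        return [CubeAction.ACCEPT, CubeAction.RESIGN]
    if state.owner in (NO_OWNER, player):
        return [CubeAction.ROLL, CubeAction.DOUBLE]
    return [CubeAction.ROLL]


def apply_cube_action(
    state: CubeState, player: int, action: CubeAction
) -> Tuple[CubeState, Optional[Tuple[int, int]]]:
    """Returns the new cube state and, if the game ended, (winner, points)."""
    if action not in legal_cube_actions(state, player):
        raise ValueError(f"{action} is not legal for player {player}")
    if action is CubeAction.DOUBLE:
        return CubeState(state.value, state.owner, pending_from=player), None
    if action is CubeAction.ACCEPT:
        # The accepting player now owns the cube at twice its old value.
        return CubeState(state.value * 2, owner=player), None
    if action is CubeAction.RESIGN:
        # Assumption: the doubler wins the value shown before the double.
        return state, (state.pending_from, state.value)
    return state, None  # ROLL leaves the cube untouched
```

Keeping the cube in a small immutable structure like this makes it easy to fold into a larger game state later; match score and the Crawford flag are deliberately left out here.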

jamesdfrost commented 5 years ago

But what about the ethical implications? I mean, a lot of these AIs are young and inexperienced, and I'd be worried about them being introduced to gambling from birth - don't you remember what happened in Superman III? ;)

elkhrt commented 5 years ago

🤣 - we've already started them on poker!

The problem with tournament play is that there's no objective measure of utility until the end of the match. So an episode (for learning purposes) needs to be the whole match rather than a single game.

Whereas a cash game has a clear utility function at the end of a single game, which is much simpler.
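To make the cash-game utility concrete, here is a minimal sketch that assumes the standard scoring of 1, 2, or 3 points for a single win, gammon, or backgammon, multiplied by the cube value. The function and table names are hypothetical and are not part of OpenSpiel.

```python
from typing import List

# Standard backgammon scoring multipliers for how a game was won.
OUTCOME_MULTIPLIER = {"single": 1, "gammon": 2, "backgammon": 3}


def money_game_returns(winner: int, outcome: str, cube_value: int) -> List[float]:
    """Zero-sum returns for a two-player money game that has just ended."""
    points = float(OUTCOME_MULTIPLIER[outcome] * cube_value)
    returns = [-points, -points]
    returns[winner] = points
    return returns
```

For example, `money_game_returns(0, "gammon", 4)` gives player 0 eight points and player 1 minus eight, so the episode can end with the single game.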


jamesdfrost commented 5 years ago

OK, will let them gamble, but only before 9:00pm at the weekends! :)

Could you still not have a reward function based on the number of points won on an individual game, which is capped to the number of points which contribute to the series win, and then perhaps some form of bonus (e.g. +100) for winning the series?

So there is still an immediate reward for winning points, but also the series win becomes the most important carrot, and the bot should eventually learn not to get too risky when it doesn't need to.

I can't see a situation where you would deliberately lose a game in order to win a series; the key is making sure it doesn't risk doubles which it doesn't need to.

But I agree it is going to be more difficult than money games, although there are a few extra options / rules which would need to be implemented for these (Beaver / Jacoby).
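As a rough illustration of the reward idea above, the per-game reward could look something like the sketch below. The function name and parameters are hypothetical; only the capping and the +100 series bonus come from the description.

```python
def capped_game_reward(
    points_won: int,
    score_before: int,
    match_length: int,
    series_bonus: float = 100.0,
) -> float:
    """Reward for the game winner: capped points plus a bonus on a series win."""
    # Only points that still count towards the series target are rewarded.
    needed = max(match_length - score_before, 0)
    counted = min(points_won, needed)
    wins_series = score_before + counted >= match_length
    return float(counted) + (series_bonus if wins_series else 0.0)
```

With a 5-point series and a score of 3, winning 4 points would count as 2 points plus the bonus, so extra cube risk beyond what is needed earns nothing.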

elkhrt commented 5 years ago

Great!

Changing the eventual total reward will result in the agent learning the wrong thing, although it's possible this isn't a big effect. I wouldn't do it.

You could possibly give intermediate rewards to help learning, and then subtract them from the final reward. That means the overall incentives are unchanged, but the game is easier to learn. This could definitely help some RL techniques.
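A minimal sketch of that idea, assuming undiscounted returns (with a discount factor below 1 the totals would no longer match exactly). The function name is hypothetical.

```python
from typing import List


def shaped_rewards(intermediate: List[float], final_reward: float) -> List[float]:
    """Per-step rewards whose undiscounted sum equals `final_reward`.

    `intermediate` holds the helper rewards handed out during the match;
    the last entry of the result is the final reward minus their total.
    """
    return list(intermediate) + [final_reward - sum(intermediate)]
```

For example, with helper rewards of 0.25, 0.5, and -0.25 and a lost match worth -1.0, the last step pays -1.5, so the episode total is still -1.0.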

But reward fiddling doesn't change the fundamental problem that in a match, we don't know how much better 2-0 is than 1-0 (or 0-2), whereas in the analogous cash game we do.


lanctot commented 5 years ago

Hi @jamesdfrost, love it. Yes, let's do this! Would be fantastic to get the full game supported. Agree we should do it step by step. Let's get the doubling cube in there first (btw, I assume "money games" just means that the points get multiplied by the value on the die?). And we can move on to matches after we know that's working.

jamesdfrost commented 5 years ago

@lanctot - yes, that's correct.

Interestingly, although backgammon is often quoted as the world's oldest board game, the introduction of doubling didn't happen until the 1920s, and it led to a huge revival in the game due to the ability to really raise the stakes - before that, backgammon was in serious decline. Initially it was quite misunderstood - for example, in Vanity Fair's Backgammon to Win they said "if two absolutely perfect players engaged in a match, there would never be an accepted double."

lanctot commented 4 years ago

Hi @jamesdfrost, closing this one now as it's been open for a while and I want to use the Issues for actual outstanding issues. But would still love to have doubling cubes and matches eventually!