google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Question: are you planning to implement the card game bridge? #92

Closed - ZiggerZZ closed this 4 years ago

ZiggerZZ commented 4 years ago

Thanks for implementing the uncontested bidding for bridge. I'm curious: do you plan to implement the full game (both the bidding and the card play)?

lanctot commented 4 years ago

Hi @ZiggerZZ, we are not currently planning to, so it would be a welcome addition!

jamesdfrost commented 4 years ago

I think a full version of Bridge would be a really interesting test for RL, as it involves working out a communication strategy between two players using only the bid mechanism itself. Would a first-class RL agent come up with the same bidding techniques used by humans, or would it come up with a completely different way of communicating within the structured bidding system? Really interested to see this.

ZiggerZZ commented 4 years ago

I agree with you @jamesdfrost, and it would also be interesting to see the declarer play and the defense by an RL agent.

lanctot commented 4 years ago

I agree, we'd love to have it. It's a classic AI problem. In fact, one of our co-leads @locked-deepmind is an expert bridge player and very interested. There have also been a number of papers looking at learning approaches, and even a company in France (http://www.nukk.ai/) looking into AI for bridge.

We're just quite busy with other tasks at the moment. Why not contribute an implementation, @ZiggerZZ? This is the main reason we open-sourced this code: to get the community involved and participating in this exciting research area! I would assume you could re-use a lot from the uncontested bridge bidding game, which currently uses a double-dummy solver to simulate the play phase. I don't know how to play bridge myself, but from everything I've read about it, the play phase seems to be a fairly standard trick-taking game, so it should not be too complicated.

elkhrt commented 4 years ago

Bridge is pretty interesting! As well as the full game, there are many interesting subgames, such as the uncontested 2NT bidding game which we have in OpenSpiel. I would be delighted to have the full game implemented in OpenSpiel. Please let me know if you fancy working on this, and we can discuss approaches.

Natural bidding ('bid what you think you can make') is much more discoverable by an RL algorithm than an artificial system. The current state-of-the-art in training RL agents on bridge bidding is: https://realworld-sdm.github.io/paper/42.pdf which demonstrates this pretty well:

Example (b) - the opening bidder passes the 1H response; this implies that responder must bid something other than 1H with strong hands; this was common in the early days of bridge, but the modern style (since 1940-ish) is 'approach forcing', where 1H is unlimited and the opener must not pass.

Example (e) - responder bids 4S, guaranteeing reaching a sensible game contract but failing to bid the good slam. This suggests that more complex investigative approaches have not been learned.

ZiggerZZ commented 4 years ago

Thank you for sharing the article - very interesting. I'm indeed looking forward to trying some RL on bridge, so I could work on the implementation of the full game in OpenSpiel.

lanctot commented 4 years ago

Closing this one as it has been open for a while and I want to reserve the issue tracker for outstanding issues, but please re-open if there is any follow-up discussion on this!

ai-gamer commented 4 years ago

Hi, I noticed recent papers use DDS to evaluate the bidding phase. Do you know of an API that I can call for the playing phase? I am thinking about building an agent for bidding, but I don't want to use DDS to evaluate its performance; I would like to test it with people. @locked-deepmind

elkhrt commented 4 years ago

Hey - I'm happy to add the play phase, although I would strongly advise getting results with DDS first.

lanctot commented 4 years ago

Btw for those of you interested in Bridge, just saw this: https://arxiv.org/abs/1911.07960

elkhrt commented 4 years ago

I'm planning to do this soon.

jwllee commented 4 years ago

It's very cool to see so much action on this issue!

Wanted to ask if anyone has any results from running the "standard" algorithms, i.e. CFR/MCCFR, on the bidding games? It seems to me that it can take quite a long time (even with the MCCFR algorithms, because the current implementation traverses the whole game tree at initialization).

Even with the uncontested 2NT bidding, there are 1 + 2·sum_{b=1..26} C(26, b) = 2^27 - 1 ≈ 134M states to traverse.
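
(A quick sanity check of that count, assuming the formula above is the right one for the game:)

    # Sanity check of the state count quoted above: 1 + 2 * sum_{b=1}^{26} C(26, b).
    from math import comb

    total = 1 + 2 * sum(comb(26, b) for b in range(1, 27))
    print(total)  # 134217727 == 2**27 - 1, i.e. ~1.3e8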

ai-gamer commented 4 years ago

I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (a Nash equilibrium) in two-player zero-sum games.

elkhrt commented 4 years ago

If you're interested in trying out these sorts of algorithms, I recommend starting with tiny_bridge_2p (purely cooperative) and tiny_bridge_4p (two partnerships). These are small games, but still non-trivial, e.g. the 2p version has 28 possible hands and 64 non-terminal public states.
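
For reference, here is a minimal sketch of trying exactly that with OpenSpiel's Python CFR solver. The iteration budget is arbitrary, and since tiny_bridge_2p is cooperative rather than zero-sum (so CFR's guarantee doesn't apply, as discussed above), nash_conv is used instead of exploitability:

    # Minimal sketch: vanilla CFR on tiny_bridge_2p, tracking NashConv.
    import pyspiel
    from open_spiel.python.algorithms import cfr, exploitability

    game = pyspiel.load_game("tiny_bridge_2p")
    solver = cfr.CFRSolver(game)
    for i in range(100):  # arbitrary iteration budget
        solver.evaluate_and_update_policy()
        if (i + 1) % 20 == 0:
            conv = exploitability.nash_conv(game, solver.average_policy())
            print("Iteration", i + 1, "NashConv", conv)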

jwllee commented 4 years ago

Yeah, I guess I can do that.

@ai-gamer: I came across the Pluribus poker AI paper; they used MCCFR for the offline (blueprint) strategy and were able to achieve "super-human" results... I was just wondering what kind of results one would get with bridge. I'm pretty new at this, so it would be great if you have any pointers to alternative algorithms other than PIMC ones.

ai-gamer commented 4 years ago

https://realworld-sdm.github.io/paper/42.pdf - this is a paper training an RL agent for bidding, and it should be the current SOTA. I am actually working on reproducing the results. As for the multiplayer poker paper, I think it is an extension of the two-player agent and only shows empirical results.

elkhrt commented 4 years ago

Yes, the results in that paper are very impressive - I'm looking forward to seeing your reproduction! The OpenSpiel bridge environment will be on GitHub on Monday, which should be perfectly suited for this work.

The state representation in the paper is slightly lossy, e.g. it represents both 1NT-Dbl-2C and 1NT-Pass-Pass-Dbl-Pass-Pass-2C the same way. The OpenSpiel representation is slightly larger, but fixes this problem.

ai-gamer commented 4 years ago

Thanks for the information. @locked-deepmind I have a draft version of everything, which learns something. Maybe I can start using the open_spiel code to reproduce the results. I see that there is no A3C implementation at the moment, so maybe I can try to add one first. A stupid question: where is the implementation of A2C? I see it in the list, but I didn't find it in the code.

lanctot commented 4 years ago

> A stupid question: where is the implementation of A2C? I see it in the list, but I didn't find it in the code.

You will find it in here: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/policy_gradient.py and there are some examples in the policy_gradient_test.py.
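
For orientation, a minimal sketch of constructing the A2C variant there (the loss_str parameter selects it). Parameter names follow policy_gradient_test.py at the time; check that file for the current signature:

    # Minimal sketch: build an A2C agent from policy_gradient.py (TF1 graph mode).
    import tensorflow.compat.v1 as tf
    from open_spiel.python import rl_environment
    from open_spiel.python.algorithms import policy_gradient

    env = rl_environment.Environment("tic_tac_toe")
    info_state_size = env.observation_spec()["info_state"][0]
    num_actions = env.action_spec()["num_actions"]

    with tf.Session() as sess:
        agent = policy_gradient.PolicyGradient(
            sess,
            player_id=0,
            info_state_size=info_state_size,
            num_actions=num_actions,
            loss_str="a2c",  # selects the A2C loss
            hidden_layers_sizes=(128,))
        sess.run(tf.global_variables_initializer())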

lanctot commented 4 years ago

> I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (a Nash equilibrium) in two-player zero-sum games.

Yes, this is indeed true, but I just wanted to follow up with a few quick comments:

(And I'm secretly hoping more work is done looking at CFR algorithms outside of two-player zero-sum games :))

elkhrt commented 4 years ago

Bridge is now available - let me know how you get on!

Incidentally, there is a wrapper in open_spiel/python/bots/bluechip_bridge_wrapper.py which enables any program supporting the BlueChip network protocol to be used as an OpenSpiel bot for the uncontested bridge bidding game. It would not be too difficult to extend this for the full four-player game. I've verified that this works with WBridge5, making it easy to run head-to-head comparisons with a state-of-the-art rule-based bot.
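
For anyone who wants to try it, a minimal sketch of loading the new game and playing out a random deal (assuming the registered short name is "bridge"):

    # Minimal sketch: load the full game and play a random deal to the end.
    import random
    import pyspiel

    game = pyspiel.load_game("bridge")
    state = game.new_initial_state()
    while not state.is_terminal():
        if state.is_chance_node():
            actions, probs = zip(*state.chance_outcomes())
            state.apply_action(random.choices(actions, weights=probs)[0])
        else:
            state.apply_action(random.choice(state.legal_actions()))
    print(state.returns())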

ai-gamer commented 4 years ago

I've been playing with the full bridge environment. By slightly changing breakthrough_dqn.py, I can train a bridge agent against a random opponent. I have a few questions.

  1. I am still confused about how the C++-implemented games are connected to Python. It often happens that when I try to check the code of some function using PyCharm, it goes to pyspiel.py and the function body contains only pass.

  2. I am using TensorFlow 1.14.0, but there are always warning messages like "The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead." when I run something. Is this normal, or is there something wrong with my local environment settings?

  3. I tried to train an agent for tiny_bridge_4p, and it does not work for me. Am I missing something, or does this game not support RL training? Is there a smaller 4-player competitive bridge version that I can use to check everything faster?

elkhrt commented 4 years ago

  1. Great!
  2. I'm not personally familiar with PyCharm, but it seems you might need to switch to CLion: https://intellij-support.jetbrains.com/hc/en-us/community/posts/115000477130-How-to-handle-a-mixed-Python-C-C-project-
  3. This is normal; we will fix this at some point!
  4. Good point - tiny_bridge_4p doesn't have an observation tensor, so you can only use tabular methods, not neural nets. I'll fix that this week and let you know. (A quick way to check this from Python is sketched below.)
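
A minimal sketch of checking those tensor flags from Python (attribute names from pyspiel's GameType):

    # Minimal sketch: check which tensors a game exposes before wiring it
    # into a neural-net agent.
    import pyspiel

    game_type = pyspiel.load_game("tiny_bridge_4p").get_type()
    print(game_type.provides_observation_tensor)        # False when this thread was written
    print(game_type.provides_information_state_tensor)
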
ai-gamer commented 4 years ago

A question about the wrapper: does it mean that I can send all the information of the current game state and get WBridge5's corresponding actions (both bidding and playing)? For research purposes, DDS is already good enough to evaluate the bidding system, but I am thinking about combining a new bidding system with a decent playing system to make a platform for human players. I searched for open-source code for the playing phase and found nothing, so the wrapper seems like a perfect solution.

elkhrt commented 4 years ago

Each player has a fixed controller - as far as I know, it wouldn't be possible to have your system bid the hand and then WBridge5 play it. But I may have missed something.

Also, don't forget to check the WBridge5 license.

ZiggerZZ commented 4 years ago

Hi!

Currently, in the play phase the observation tensor includes the current player's cards, dummy's cards, the previous and current tricks, and the number of tricks taken by each side. However, dummy is told what to play by the declarer, so there are really only three agents: two defenders and the declarer, who plays both his own hand and dummy's. Therefore I propose that the observation tensor include not always dummy's cards, but rather the cards visible to the current player, i.e. dummy's cards for the defenders and the declarer, and the declarer's cards for dummy.

Example:

Now it is

ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Table:T9874.63.QT74.T3 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Table:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Table:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Table:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"

I suggest changing it to

ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Visible:52.Q92.3.KQJ8765 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Visible:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Visible:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Visible:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"

What do you think? I could make a PR if you find that it is a good idea. @lanctot

lanctot commented 4 years ago

I think @locked-deepmind is the person to ask. Ed, any thoughts?

(I actually don't even know the rules to bridge! :))

elkhrt commented 4 years ago

Thanks Zigfrid - this is definitely a bug; I appreciate you raising it!

I've gone for a slightly different fix, not changing the observation but making sure the CurrentPlayer is never dummy. So the sequence of players in your example would be: 3, 2, 1, 2, etc. (and never 0). This matches the dynamics of the actual game if you think of CurrentPlayer as indicating an agent rather than a seat at the table.

ZiggerZZ commented 4 years ago

Thanks for the quick reply!

I agree with you that it matches the dynamics of the game; however, changing the number of agents during the game (4 in the bidding and 3 in the playing phase) will make the implementation more sophisticated. The determination of the current player, current_player_ = (current_player_ + 1) % kNumPlayers;, would differ between the ApplyBiddingAction and ApplyPlayAction functions, and we would need to change checks like SPIEL_CHECK_TRUE(holder_[card] == current_player_); (for example, deepmind/bridge.cc#L544 and deepmind/bridge.cc#L449).

What I propose to do will only change 3 lines of code and keep the logic the same (ZiggerZZ/bridge.cc#L311):

    // If the player is dummy, show declarer's cards; otherwise show dummy's cards.
    const int dummy = contract_.declarer ^ 2;  // partner sits two seats (XOR 2) away
    int visible = dummy;
    if (player == dummy) visible = contract_.declarer;

I don't have much experience in designing libraries, so please tell me if I'm wrong and it's better to implement the actual dynamics of the game straightforwardly rather than looking for simplifications :)

elkhrt commented 4 years ago

We can track the next hand to play in current_player_ (as now) and translate it to the agent / player in the CurrentPlayer() call. It's also only a couple of lines.

One advantage of this is that the declarer agent can plan ahead if it plays both hands. There's no special logic needed to handle there only being three active players - we just never call the other player to ask for an action.
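
To illustrate, a hypothetical Python sketch of that seat-to-agent translation (the actual implementation is a couple of lines of C++ in bridge.cc):

    # Hypothetical sketch of the seat-to-agent translation described above:
    # dummy's seat is controlled by the declarer's agent, so CurrentPlayer()
    # never reports dummy.
    def acting_agent(seat_to_play, declarer):
        dummy = declarer ^ 2  # partner sits two seats away
        return declarer if seat_to_play == dummy else seat_to_play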