Closed ZiggerZZ closed 4 years ago
Hi @ZiggerZZ, we are not currently planning to, so it would be a welcome addition!
I think a full version of Bridge would be a really interesting test for RL, as it involves working out a communication strategy between two players using only the bid mechanism itself. Would a first-class RL agent come up with the same bidding techniques used by humans, or would it come up with a completely different way of communicating within the structured bidding system? Really interested to see this.
I agree with you @jamesdfrost, and it would also be interesting to see the declaring and the defense by an RL agent.
I agree, we'd love to have it. It's a classic AI problem. In fact, one of our co-leads @locked-deepmind is an expert bridge player and very interested. There have also been a number of papers looking at learning approaches, and even a company in France (http://www.nukk.ai/) looking into AI for bridge.
We're just quite busy with other tasks at the moment. Why not contribute an implementation, @ZiggerZZ? This is the main reason we open-sourced this code: to get the community involved and participating in this exciting research area! I would assume you could re-use a lot from the uncontested bridge bidding game, which currently uses a double-dummy solver to simulate the play phase. I don't know how to play it myself, but from everything I've read about it, the play phase seems to be a fairly standard trick-taking game, so it should not be too complicated.
Bridge is pretty interesting! As well as the full game, there are many interesting subgames, such as the uncontested 2NT bidding game which we have in OpenSpiel. I would be delighted to have the full game implemented in OpenSpiel. Please let me know if you fancy working on this, and we can discuss approaches.
Natural bidding ('bid what you think you can make') is much more discoverable by an RL algorithm than an artificial system. The current state-of-the-art in training RL agents on bridge bidding is: https://realworld-sdm.github.io/paper/42.pdf which demonstrates this pretty well:
Example (b) - the opening bidder passes the 1H response; this implies that responder must bid something other than 1H with strong hands; this was common in the early days of bridge, but the modern style (since 1940-ish) is 'approach forcing', where 1H is unlimited and the opener must not pass.
Example (e) - responder bids 4S, guaranteeing reaching a sensible game contract but failing to bid the good slam. This suggests that more complex investigative approaches have not been learned.
Thank you for sharing the article, very interesting. I'm indeed looking forward to trying some RL on bridge so I could work on the implementation of the full game in OpenSpiel.
Closing this one as it has been open for a while and I want to reserve the issues for outstanding issues, but please re-open if there is any follow-up discussion on this!
Hi, I noticed recent papers use DDS to evaluate the bidding phase. Do you know of an API that I can call for the playing phase? I am thinking about building an agent for bidding, but I don't want to use DDS to evaluate its performance; I would like to test it with people. @locked-deepmind
Hey - I'm happy to add the play phase, although I would strongly advise getting results with DDS first.
Btw for those of you interested in Bridge, just saw this: https://arxiv.org/abs/1911.07960
I'm planning to do this soon.
It's very cool to see so much action on this issue!
Wanted to ask if anyone has any results from running the "standard" algorithms, i.e. CFR/MCCFR, on the bidding games? It seems to me that it can take quite a long time (even with the MCCFR algorithms, because the current implementation traverses the whole game tree at initialization).
Even with the uncontested 2NT bidding, there are 1 + 2 * sum_{b=1..26} (26 choose b) = 2^27 - 1, i.e. ~134M states to traverse.
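As a quick sanity check on that state count, the formula can be evaluated with Python's standard library:

```python
from math import comb

# State count from the formula above: 1 + 2 * sum_{b=1..26} C(26, b).
# The sum over all non-empty subsets of 26 items is 2**26 - 1, so the
# total is 1 + 2 * (2**26 - 1) = 2**27 - 1.
total = 1 + 2 * sum(comb(26, b) for b in range(1, 27))
print(total)  # 134217727, i.e. ~1.3e8 states
```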
I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (Nash equilibrium) in two-player zero-sum games.
If you're interested in trying out these sorts of algorithms, I recommend starting with tiny_bridge_2p (purely cooperative) and tiny_bridge_4p (two partnerships). These are small games, but still non-trivial, e.g. the 2p version has 28 possible hands and 64 non-terminal public states.
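The 28-hand figure follows from the deal size: in tiny bridge each player holds 2 cards from an 8-card deck (assuming the standard tiny-bridge setup of two suits with four ranks each), so the count is just a binomial coefficient. A quick check:

```python
from math import comb

# Tiny bridge (2p) deals 2 cards per player from an 8-card deck
# (2 suits x 4 ranks), so each player can hold C(8, 2) distinct hands.
deck_size = 2 * 4
hand_size = 2
hands_per_player = comb(deck_size, hand_size)
print(hands_per_player)  # 28
```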
Yea I guess I can do that.
@ai-gamer: I came across the Pluribus poker AI paper; they used MCCFR as the offline strategy and were able to achieve "super-human" results. I was just wondering what kind of results one would get with bridge. I'm pretty new at this, so it would be great if you have any pointers to algorithms other than PIMC ones.
https://realworld-sdm.github.io/paper/42.pdf This is a paper training an RL agent for bidding; it should be the current SOTA. I am actually working on reproducing the results. As for the multiplayer poker paper, I think it is an extension of the two-player agent and only shows empirical results.
Yes, the results in that paper are very impressive - I'm looking forward to seeing your reproduction! The OpenSpiel bridge environment will be on GitHub on Monday, which should be perfectly suited for this work.
The state representation in the paper is slightly lossy, e.g. it represents both 1NT-Dbl-2C and 1NT-Pass-Pass-Dbl-Pass-Pass-2C the same way. The OpenSpiel representation is slightly larger, but fixes this problem.
Thanks for the information, @locked-deepmind. I have a draft version of everything, which learns something. Maybe I can start using the open_spiel code to reproduce the results. I find that there is no A3C implementation at the moment; maybe I can try to add one first. A stupid question: where is the implementation of A2C? I see it in the list, but I didn't find it in the code.
You will find it here: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/policy_gradient.py and there are some examples in policy_gradient_test.py.
> I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (Nash equilibrium) in two-player zero-sum games.
Yes this is indeed true, but just wanted to follow-up with a few quick comments:
(And I'm secretly hoping more work is done looking at CFR algorithms outside of two-player zero-sum games :))
Bridge is now available - let me know how you get on!
Incidentally, there is a wrapper in open_spiel/python/bots/bluechip_bridge_wrapper.py which enables any program supporting the BlueChip network protocol to be used as an OpenSpiel bot for the uncontested bridge bidding game. It would not be too difficult to extend this for the full four-player game. I've verified that this works with WBridge5, making it easy to run head-to-head comparisons with a state-of-the-art rule-based bot.
I tried playing with the full bridge environment. By slightly changing breakthrough_dqn.py, I can train a bridge agent against a random opponent. I have a few questions.
I am still confused about how the C++-implemented games are connected to Python. It often happens that when I try to check the code of some function using PyCharm, it goes to pyspiel.py and the function body is just `pass`.
I am using TensorFlow 1.14.0, but there are always warning messages like "The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead." when I run something. Is this normal, or is there something wrong with my local environment settings?
I tried to train an agent for tiny_bridge_4p, but it does not work for me. Am I missing something, or does this game not support RL training? Is there a smaller 4-player competitive bridge version that I can use to check everything faster?
A question about the wrapper: does it mean that I can send all information about the current game state and get WBridge5's corresponding actions (both bidding and playing)? For research purposes, DDS is already good enough to evaluate the bidding system, but I am thinking about combining a new bidding system with a decent playing system to make it a platform for human players. I searched for open-source code for the playing phase and found nothing, so the wrapper seems like a perfect solution.
Each player has a fixed controller - as far as I know, it wouldn't be possible to have your system bid the hand and then WBridge5 play it. But I may have missed something.
Also, don't forget to check the WBridge5 license.
Hi!
Currently, in the play phase the observation tensor includes the current player's cards, dummy's cards, the previous and current tricks, and the number of tricks taken by each side. However, dummy is told what to play by the declarer, so there are really only three agents: two defenders and the declarer, who plays both his own hand and dummy's. Therefore I propose that the observation tensor include not always dummy's cards, but rather the cards visible to the current player, i.e. dummy's cards for the defenders and the declarer, and the declarer's cards for dummy.
Example:
Currently it is:
ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Table:T9874.63.QT74.T3 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Table:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Table:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Table:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"
I suggest changing it to:
ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Visible:52.Q92.3.KQJ8765 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Visible:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Visible:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Visible:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"
What do you think? I could make a PR if you find that it is a good idea. @lanctot
I think @locked-deepmind is the person to ask. Ed, any thoughts?
(I actually don't even know the rules to bridge! :))
Thanks Zigfrid - this is definitely a bug; thanks for raising it!
I've gone for a slightly different fix, not changing the observation but making sure the CurrentPlayer is never dummy. So the sequence of players in your example would be: 3, 2, 1, 2, etc. (and never 0). This matches the dynamics of the actual game if you think of CurrentPlayer as indicating an agent rather than a seat at the table.
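That seat-to-agent translation can be sketched in a few lines (a hypothetical helper, not the actual OpenSpiel code; it assumes the dummy sits opposite the declarer, i.e. two seats away):

```python
def acting_agent(seat, declarer):
    """Return the agent controlling the hand at `seat`.

    The dummy's hand is played by the declarer, so when it is the
    dummy's turn, the declarer agent is the one asked for an action.
    """
    dummy = (declarer + 2) % 4  # dummy sits opposite the declarer
    return declarer if seat == dummy else seat

# With declarer in seat 2 (and hence dummy in seat 0), the play rotates
# through seats 3, 0, 1, 2, but the agents asked are 3, 2, 1, 2 -
# seat 0 never acts as an agent.
print([acting_agent(s, 2) for s in (3, 0, 1, 2)])  # [3, 2, 1, 2]
```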
Thanks for the quick reply!
I agree that it matches the dynamics of the game; however, changing the number of agents during the game (4 in the bidding and 3 in the playing phase) would make the implementation more complicated. The determination of the current player, `current_player_ = (current_player_ + 1) % kNumPlayers;`, would have to differ between the `ApplyBiddingAction` and `ApplyPlayAction` functions, and we would need to change some checks like `SPIEL_CHECK_TRUE(holder_[card] == current_player_);` (for example, deepmind/bridge.cc#L544 and deepmind/bridge.cc#L449).
What I propose to do will only change 3 lines of code and keep the logic the same (ZiggerZZ/bridge.cc#L311):

```cpp
// If player is dummy, then store declarer's cards, otherwise store dummy's cards.
const int dummy = contract_.declarer ^ 2;
int visible = dummy;
if (player == dummy) visible = contract_.declarer;
```
I don't have much experience designing libraries, so please tell me if I'm wrong and it's better to implement the actual dynamics of the game straightforwardly rather than looking for simplifications :)
We can track the next hand to play in `current_player_` (as now) and translate it to the agent/player in the `CurrentPlayer()` call. It's also only a couple of lines.
One advantage of this is that the declarer agent can plan ahead if it plays both hands. There's no special logic needed to handle there only being three active players - we just never call the other player to ask for an action.
Thanks for implementing uncontested bidding for bridge! I'm curious whether it's in your plans to implement the full game (both the bidding and the card play)?