google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Question about action encoding, chance nodes #147

Closed: inejc closed this issue 4 years ago

inejc commented 4 years ago

Thanks for open-sourcing this project, great initiative! I'm a bit confused about how to encode actions when implementing a somewhat unusual sequential game that consists of multiple phases. Each phase is made up of multiple neighboring states within the game tree and has its own set of distinct actions. We can think of them as different phases because actions in one phase can never be legal in any other phase.

For example, the first phase could be multiple players choosing a single card each turn, out of at most N possible cards, until no cards are left to choose from. The second phase could be the same players playing the selected cards, discarding them in turn, until all players have empty hands. The first phase thus has N distinct actions and the second phase has a different set of N distinct actions.

Do actions returned by State::LegalActions() need to be encoded uniquely across all phases, e.g. [0, N-1] in phase 1 and [N, 2N-1] in phase 2, or can I use the codes [0, N-1] for both phases? Do actions need to be encoded uniquely across arbitrary states at all, or is this only a requirement for states within the same information set?

elkhrt commented 4 years ago

Sounds intriguing! It's fine to use 0..n-1 for both phases. There is no consistency requirement between different states, but a well-chosen scheme may help neural networks learn.
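
For illustration, here is a minimal standalone sketch of the reuse-the-ids scheme (names such as `Phase`, `kNumCards`, `selected`, and `hand` are made up for this example and are not OpenSpiel identifiers). It mirrors what a `State::LegalActions()` override might compute: id i means "pick card i" in the first phase and "play card i" in the second, and the phase stored in the state disambiguates them.

```cpp
// Standalone sketch (not OpenSpiel code): the same ids 0..kNumCards-1 are
// reused in both phases; the phase stored in the state disambiguates them.
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int kNumCards = 54;  // illustrative deck size

enum class Phase { kCardSelection, kCardPlay };

// Mirrors what a State::LegalActions() override might compute.
std::vector<int64_t> LegalActions(Phase phase,
                                  const std::vector<bool>& selected,
                                  const std::vector<bool>& hand) {
  std::vector<int64_t> actions;
  for (int card = 0; card < kNumCards; ++card) {
    if (phase == Phase::kCardSelection) {
      if (!selected[card]) actions.push_back(card);  // id `card` = pick card
    } else {
      if (hand[card]) actions.push_back(card);       // same id = play card
    }
  }
  return actions;
}

int main() {
  std::vector<bool> selected(kNumCards, false), hand(kNumCards, false);
  hand[0] = hand[7] = true;  // two cards left to play
  std::cout << LegalActions(Phase::kCardSelection, selected, hand).size()
            << " selection actions, "
            << LegalActions(Phase::kCardPlay, selected, hand).size()
            << " play actions\n";  // prints "54 selection actions, 2 play actions"
}
```

If you reuse ids this way, it may also help to expose the current phase in the observation or information-state tensor, so a network can tell the two meanings of the same id apart.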

inejc commented 4 years ago

Thanks for the quick reply. I have another, unrelated question regarding support for

Stochasticity (via explicit chance nodes mostly, even though implicit stochasticity is partially supported)

from intro.md (https://github.com/deepmind/open_spiel/blob/master/docs/intro.md).

What exactly does partial support mean? In other words, are there any advantages to implementing chance nodes as explicit-stochastic rather than sampled-stochastic?

elkhrt commented 4 years ago

Many of our algorithms assume that taking the same action from the same state results in the same next state.

If you're writing your own algorithms, feel free to use whichever works best for you. But if you want to use the existing implementations of e.g. exploitability, you'll need to have explicit stochastic nodes.
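
To make the distinction concrete, here is a standalone sketch (not OpenSpiel code; `SubtreeValue` is a placeholder) of what an explicit chance node provides: the full list of (action, probability) outcomes, analogous to what a `ChanceOutcomes()` override returns in an explicit-stochastic game. With that list available, a tree-walking algorithm such as exploitability can take an exact expectation over the branches instead of relying on a single sampled draw.

```cpp
// Standalone sketch (not OpenSpiel code). An explicit chance node exposes the
// full outcome distribution, so a tree walk can sum over every branch exactly.
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using Action = int64_t;
using ActionsAndProbs = std::vector<std::pair<Action, double>>;

// Analogous to a ChanceOutcomes() override: e.g. dealing one of three
// remaining cards uniformly at random.
ActionsAndProbs ChanceOutcomes() {
  return {{0, 1.0 / 3}, {1, 1.0 / 3}, {2, 1.0 / 3}};
}

// Placeholder for the recursively computed value of the subtree reached by
// applying `outcome` at the chance node.
double SubtreeValue(Action outcome) { return static_cast<double>(outcome); }

int main() {
  // Exact expectation over all chance branches: the kind of traversal that
  // exploitability or vanilla CFR performs, and which a sampled-only game
  // cannot support because the distribution is never exposed.
  double value = 0.0;
  for (const auto& outcome : ChanceOutcomes()) {
    value += outcome.second * SubtreeValue(outcome.first);
  }
  std::cout << "Expected value at the chance node: " << value << "\n";  // 1
}
```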

inejc commented 4 years ago

That makes sense, thank you very much.

lanctot commented 4 years ago

Just to add to what @elkhrt said: if you plan to use RL/search algorithms in particular (or any algorithm that only needs a sampled outcome at a chance node, rather than the full distribution), then kSampledStochastic should not cause any problems.
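
As a contrast with the explicit-chance sketch above, here is a standalone sketch (not OpenSpiel code; `SampleChanceOutcome` and the example distribution are made up) of all that a rollout-based algorithm consumes at a chance node: one sampled outcome per visit. An algorithm that only ever does this never needs the full distribution, which is why sampled stochasticity is enough for it.

```cpp
// Standalone sketch (not OpenSpiel code). Rollout-based algorithms (RL, MCTS)
// only consume one sampled outcome per visit to a chance node, so a game that
// merely samples internally (kSampledStochastic) is sufficient for them.
#include <cstdint>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

using Action = int64_t;
using ActionsAndProbs = std::vector<std::pair<Action, double>>;

// Draws one chance outcome, the way a simulator steps through a chance node
// during a rollout or an RL episode.
Action SampleChanceOutcome(const ActionsAndProbs& outcomes, std::mt19937& rng) {
  std::vector<double> probs;
  probs.reserve(outcomes.size());
  for (const auto& outcome : outcomes) probs.push_back(outcome.second);
  std::discrete_distribution<int> index(probs.begin(), probs.end());
  return outcomes[index(rng)].first;
}

int main() {
  std::mt19937 rng(1234);  // fixed seed for reproducibility
  const ActionsAndProbs deal = {{0, 0.5}, {1, 0.25}, {2, 0.25}};
  std::cout << "Sampled chance outcome: " << SampleChanceOutcome(deal, rng)
            << "\n";
}
```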

I noticed that you're linking OpenSpiel from your own repos about a card game called Tarok. That's really neat! I'd love to know more. My first guess is that any algorithm you plan to run on a game that large will not require explicit traversals of the chance distributions, but correct me if I'm wrong. You can also find some example games in the code base that use kSampledStochastic.

I also know of some research work and groups that have worked, or are working, on Bridge and Skat, which may interest you (and @elkhrt recently added a supervised learning example for learning to predict WBridge5's bidding).

Anyway please let us know if you have any more questions!

inejc commented 4 years ago

We're building a Tarok agent as a hobby project. The game is somewhat more complex than Bridge; see this paper (subsection 2.2) for a complexity analysis of the three-player version, which we want to tackle first. If you are interested in a detailed description of the rules, they are available here.

We're in the process of finding and reading the relevant papers, but the rough idea is to use some variant of CFR to compute a blueprint strategy on an abstracted game and then improve that strategy with real-time search (the main resource being this multiplayer poker agent). From what I understand at this point, I don't think we need an explicit distribution over actions at chance nodes.

Additional resources are more than welcome; I wasn't even familiar with Skat, thanks! Solving the bidding phase in a supervised manner seems interesting too (although we'd have to see whether any appropriate datasets are available).

Thank you for your help. I have another question, but I'll open a separate issue so that it's more easily searchable.