Implementing NFSP for a custom simultaneous game with non-homogeneous agents.

google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Apache License 2.0


Implementing NFSP for a custom simultaneous game with non-homogeneous agents. #816

Closed hhlei closed 2 years ago

hhlei commented 2 years ago

I'm a little confused about how to implement NFSP for simultaneous games. In the examples, the action_probabilities method of NFSPPolicies relies on state.current_player(). But in simultaneous games (in my game at least), the current player is always kSimultaneousPlayerId. How exactly do the right observations and rewards get routed to the right NFSP policy? Do I have to define special behavior in the game to make sure that things happen in the right order?

lanctot commented 2 years ago

The best way to do this is to use the turn-based simultaneous game wrapper. There should be some examples (maybe in nfsp_test or game_transforms). If not, I am quite sure cfr_test has some examples of how to use it.
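To make the idea behind such a wrapper concrete, here is a minimal toy sketch in plain Python. It is not OpenSpiel's implementation: MatchingPennies, TurnBasedWrapper, and play are hypothetical names invented for illustration, and SIMULTANEOUS is only a stand-in for the simultaneous player id. In the Python API the real wrapper should be reachable through pyspiel.load_game_as_turn_based or pyspiel.convert_to_turn_based, but please check the current docs before relying on that.

```python
# Toy illustration only: none of these classes are OpenSpiel APIs.
SIMULTANEOUS = -2  # stand-in for the simultaneous player id


class MatchingPennies:
    """Hypothetical one-shot simultaneous game for two players."""

    num_players = 2

    def __init__(self):
        self.joint_action = None

    def current_player(self):
        # Every decision node is a simultaneous node.
        return SIMULTANEOUS

    def is_terminal(self):
        return self.joint_action is not None

    def apply_actions(self, actions):
        self.joint_action = list(actions)

    def returns(self):
        a, b = self.joint_action
        return [1.0, -1.0] if a == b else [-1.0, 1.0]


class TurnBasedWrapper:
    """Presents a simultaneous node as a sequence of single-player turns."""

    def __init__(self, state):
        self.state = state
        self.pending = []  # actions collected so far for the current node

    def current_player(self):
        # Players are asked in order 0, 1, ...; earlier choices stay hidden.
        return len(self.pending)

    def is_terminal(self):
        return self.state.is_terminal()

    def apply_action(self, action):
        self.pending.append(action)
        if len(self.pending) == self.state.num_players:
            # Everyone has chosen: forward the joint action to the real game.
            self.state.apply_actions(self.pending)
            self.pending = []

    def returns(self):
        return self.state.returns()


def play(policies):
    """Runs one episode; `policies` maps a player id to a callable returning an action."""
    state = TurnBasedWrapper(MatchingPennies())
    order = []
    while not state.is_terminal():
        player = state.current_player()
        order.append(player)
        state.apply_action(policies[player]())
    return order, state.returns()
```

With a wrapper like this, code that asks for the current player (as the NFSP examples do) always sees a concrete player id, so each policy is queried on its own turn.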

The observation and info state functions have players as arguments (the observing player). You can check goofspiel, oshi-zumo, or Markov soccer for examples. These let you observe the state as any player even if the current player is the simultaneous player id.
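As a rough sketch of what "observing as any player" buys you, the toy code below queries one agent per player with that player's own observation and then applies the joint action. Everything here is hypothetical and only mirrors the shape of the idea: ToySimState, GreedyAgent, RandomAgent, and joint_step are made-up names, the observation is a plain dict rather than a tensor, and SIMULTANEOUS is a stand-in for the simultaneous player id.

```python
import random

SIMULTANEOUS = -2  # stand-in for the simultaneous player id


class ToySimState:
    """Hypothetical simultaneous state where each player holds a private value."""

    def __init__(self, private_values):
        self.private_values = list(private_values)
        self.history = []  # one joint action (tuple) per round

    def current_player(self):
        return SIMULTANEOUS

    def observation(self, player):
        # The observing player is an argument, so this works at any node,
        # including nodes where current_player() is the simultaneous id.
        return {
            "me": player,
            "private": self.private_values[player],
            "round": len(self.history),
        }

    def apply_actions(self, actions):
        self.history.append(tuple(actions))


class GreedyAgent:
    """Plays its own private value as the action."""

    def step(self, obs):
        return obs["private"]


class RandomAgent:
    """Ignores the observation and plays uniformly at random."""

    def __init__(self, num_actions, seed):
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def step(self, obs):
        return self.rng.randrange(self.num_actions)


def joint_step(state, agents):
    """Queries each agent with its own observation, then applies the joint action."""
    assert state.current_player() == SIMULTANEOUS
    actions = [agent.step(state.observation(p)) for p, agent in enumerate(agents)]
    state.apply_actions(actions)
    return actions
```

Because the agents are indexed by player id and each one only ever sees state.observation(p) for its own p, they do not need to be the same type, which is the non-homogeneous case from the question.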

Edit: our NFSP implementations assume turn-based games, hence the wrapper above, but it should be possible to adapt a tailored NFSP to simultaneous games if you prefer.

Edit2: generalizing the implementations to support simultaneous-move games would be even better. There are a few steps involved, but it might make a nice contribution!

hhlei commented 2 years ago

Thanks for the help! Quick question: in Goofspiel, the game class has member variables default_observer_, info_state_observer_, etc. that are labeled as "old observation api". What is the newer way to do it?

lanctot commented 2 years ago

Hi @hhlei,

Do you mean this comment? I think that comment is referring just to the first one "default_observer". That observer is used to implement the "old" observation API.

OpenSpiel supports a "new" (it's almost already 2 years old! haha) observation API. Examples of this API can be seen in Leduc and Goofspiel. It uses the observer objects to more cleanly fill up the observation and information state tensors.

Not all games use the observation API, because many games have not been converted to use it. Some new games also still use the "old" format of just filling the tensor manually (see e.g. "bargaining", which I added recently). There is no requirement to use the observation API, so I'm not sure referring to them as "old" versus "new" makes much sense at this point. I'd probably use the terminology: uses the observation API (e.g. Leduc, Goofspiel) versus does not use the observation API (e.g. bargaining).

lanctot commented 2 years ago

Hi @hhlei. Any update on this? If not I am inclined to close it.