google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Catan Support #192

Closed JBokMan closed 4 years ago

JBokMan commented 4 years ago

Hey there, my name is Julian Bokelmann and I am a computer science student at Heinrich-Heine-Universität in Duesseldorf, Germany. I want to integrate The Settlers of Catan (Catan for short) into OpenSpiel as part of my bachelor thesis. I plan to take an existing implementation of Catan, first port it to C++, and then integrate it into OpenSpiel. Once I have a working game in OpenSpiel, I want to test and evaluate different reinforcement learning algorithms.

lanctot commented 4 years ago

Hi Julian, that sounds like a great plan for a thesis! I would love to see those results; Catan is a great game for AI research. Please let us know if you have questions about the API.

If you want to contribute the implementation, we will have to look into the legalities; e.g. we might require permission from the designers and/or publishers. I will look into this.

Either way, that would not stop you from implementing it and using the learning algorithms to learn to play it!

JBokMan commented 4 years ago

Thank you for your support. If any questions come up that I cannot solve myself, I will get in touch.

JBokMan commented 4 years ago

I am currently trying to understand the API. So far, I have figured out that the main way my game communicates its current state is through the ObservationTensor or InformationStateTensor. The ObservationTensor is set in a function of the State class through a TensorView object. I looked at the tic-tac-toe and breakthrough examples and at the TensorView class, but I do not quite understand how to set my variables there. Is it right that I have to write my whole board state into the TensorView object? If yes, how would that look for multiple 2D arrays?

lanctot commented 4 years ago

> I am currently trying to understand the API. So far, I have figured out that the main way my game communicates its current state is through the ObservationTensor or InformationStateTensor. The ObservationTensor is set in a function of the State class through a TensorView object. I looked at the tic-tac-toe and breakthrough examples and at the TensorView class, but I do not quite understand how to set my variables there. Is it right that I have to write my whole board state into the TensorView object? If yes, how would that look for multiple 2D arrays?

No, ObservationTensor and InformationStateTensor are mainly intended to encode the state in a format that is amenable to RL (i.e. neural network inputs). The actual state should be stored in the state class, i.e. CatanState.

For example, in Tic-Tac-Toe and Breakthrough, these variables are found here:

A good first step to see how they are used in those classes is to look at the implementation of TicTacToeState::ToString() and BreakthroughState::ToString().

You can skip InformationStateString and InformationStateTensor in Catan and implement just ObservationString and ObservationTensor. If it's unclear from the docs how these differ, I recommend taking a look at Sec 3.1 of the paper: https://arxiv.org/abs/1908.09453 . Also, you don't need to implement these functions until the very end (when you're ready to try some RL techniques), although ObservationString might be useful for debugging; using the state's ToString method is a good way to do that too.
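For illustration, here is a rough, hedged sketch of what this could look like for a Catan-style board. All the names (CatanState, kNumHexes, hex_resource_, and so on) are invented for this example, not the actual implementation, and depending on the OpenSpiel version ObservationTensor takes either absl::Span<float> or std::vector<float>*:

```cpp
// Sketch only: class and member names are hypothetical, and the remaining
// required overrides (constructor, LegalActions, DoApplyAction, Returns,
// Clone, ...) are elided.
#include <array>
#include <string>

#include "absl/strings/str_cat.h"
#include "open_spiel/spiel.h"
#include "open_spiel/utils/tensor_view.h"

constexpr int kNumHexes = 19;      // hexes on the standard Catan board
constexpr int kNumResources = 6;   // five resources plus the desert

class CatanState : public open_spiel::State {
 public:
  std::string ToString() const override {
    // Human-readable dump of the board; handy for debugging.
    std::string str;
    for (int h = 0; h < kNumHexes; ++h) {
      absl::StrAppend(&str, "hex ", h, ": resource ", hex_resource_[h], "\n");
    }
    return str;
  }

  void ObservationTensor(open_spiel::Player player,
                         absl::Span<float> values) const override {
    // One plane per resource type; a 1 marks which resource each hex has.
    open_spiel::TensorView<2> view(values, {kNumResources, kNumHexes},
                                   /*reset=*/true);
    for (int h = 0; h < kNumHexes; ++h) {
      view[{hex_resource_[h], h}] = 1.0;
    }
  }

 private:
  // The real game state lives in ordinary members (or your own classes).
  std::array<int, kNumHexes> hex_resource_{};
};
```

Each additional 2D array (settlements per player, roads, and so on) just becomes another set of planes, either appended to the first dimension of the view or written through a higher-rank TensorView<3>.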

By the way, I gave a tutorial recently which walks through some of the API and gives an overview that may be useful to you when learning it. The code examples are entirely on the Python side, but the functions used map directly to their C++ equivalents (just named differently, i.e. using Python naming conventions). You can find it here: http://mlanctot.info/open_spiel-tutorial-kuleuven-mar11-2020.pdf

JBokMan commented 4 years ago

Thank you for your fast response. The thing is, by now I already have a complete, working implementation of Catan in C++, but it is object-oriented, with around 10 classes. Do I have to completely rewrite it so it fits into OpenSpiel? Can the State class also hold objects of other classes for the state, or do they have to be primitive data types? For example, I have a class for the hex tiles and store them in a vector; each hex knows its position, resource, and number.

elkhrt commented 4 years ago

No need to rewrite anything - just add a Game and a State class which conform to the OpenSpiel API and hold instances of your classes. See for example Hanabi, which delegates all functionality to the Hanabi Learning Environment implementation: https://github.com/deepmind/open_spiel/blob/master/open_spiel/games/hanabi.h Chess, Go, Oware, and Gin Rummy all use a similar pattern to a greater or lesser extent.
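As a hedged sketch of that delegation pattern (catan::Engine, its methods, and CatanState are invented names here, standing in for your existing classes):

```cpp
// Sketch only: the wrapper owns the existing engine and forwards calls to it.
// Remaining required overrides (constructor, ToString, Returns, Clone, ...)
// are elided.
#include <vector>

#include "open_spiel/spiel.h"

namespace catan {
class Engine {  // stand-in for an existing, self-contained Catan implementation
 public:
  std::vector<int> LegalMoves() const;
  void PlayMove(int move);
  bool GameOver() const;
  int CurrentPlayer() const;
  // ... hex tiles, roads, settlements, etc. live inside this class ...
};
}  // namespace catan

class CatanState : public open_spiel::State {
 public:
  std::vector<open_spiel::Action> LegalActions() const override {
    // Translate the engine's move list into OpenSpiel actions.
    std::vector<open_spiel::Action> actions;
    for (int move : engine_.LegalMoves()) actions.push_back(move);
    return actions;
  }

  bool IsTerminal() const override { return engine_.GameOver(); }

  open_spiel::Player CurrentPlayer() const override {
    return engine_.CurrentPlayer();
  }

 protected:
  void DoApplyAction(open_spiel::Action action) override {
    engine_.PlayMove(static_cast<int>(action));  // the engine owns the rules
  }

 private:
  catan::Engine engine_;  // the pre-existing object-oriented implementation
};
```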


JBokMan commented 4 years ago

I am now able to run the learning algorithms. I implemented the difference in victory points after an action as the reward function. For the observation tensor function, I return the state of my game encoded as an integer vector. Unfortunately, the algorithms do not seem to learn very much. I tried the tabular_qlearner and dqn algorithms. The DQN algorithm has some hyperparameters, and I tried a few different configurations; however, the reward I am getting grows worse over time, hits 0 after a while, and stays there. The tabular_qlearner seems to be getting better, but it brings my PC to a standstill after 16000 iterations; I guess I fill up my memory somehow.

lanctot commented 4 years ago

Hi Julian,

This is great that you have Catan up and running already!

> I am now able to run the learning algorithms. I implemented the difference in victory points after an action as the reward function.

Sounds good. Also consistent with what we did in the Hanabi Learning Environment, and seems like a generally sensible choice.
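As a hedged sketch of how that could be wired into the OpenSpiel State API (VictoryPoints() and the member names are hypothetical helpers, not existing OpenSpiel functions; both vectors would be sized to num_players_ in the constructor):

```cpp
// Sketch only: reward each player with the change in their victory points
// caused by the last action.
#include <vector>

#include "open_spiel/spiel.h"

class CatanState : public open_spiel::State {
 public:
  std::vector<double> Rewards() const override { return last_rewards_; }
  std::vector<double> Returns() const override { return cumulative_returns_; }

 protected:
  void DoApplyAction(open_spiel::Action action) override {
    std::vector<double> before = VictoryPoints();  // hypothetical helper
    // ... apply the action to the underlying game here ...
    std::vector<double> after = VictoryPoints();
    for (int p = 0; p < num_players_; ++p) {
      last_rewards_[p] = after[p] - before[p];
      cumulative_returns_[p] += last_rewards_[p];
    }
  }

 private:
  std::vector<double> VictoryPoints() const;  // reads VP from the game state
  std::vector<double> last_rewards_;
  std::vector<double> cumulative_returns_;
};
```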

> For the observation tensor function, I return the state of my game encoded as an integer vector.

I don't think this is very common. I'm used to bitwise inputs, even for encoding integer values (or even float values!). We do this a lot in various places in OpenSpiel. If a value can take on e.g. {0, ..., 100}, we would use one bit per possible value, i.e. a so-called "one-hot encoding" (only one bit is 1 and the rest are 0). See, again, the Hanabi Learning Environment as a good example (the "canonical encoding" here: https://github.com/deepmind/hanabi-learning-environment/blob/master/hanabi_learning_environment/hanabi_lib/canonical_encoders.cc)

I'm not saying it's a bad choice, just that it's not common, so I'm not sure what effect it would have on learning (as opposed to a bitwise encoding).
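To make the one-hot idea concrete, a small helper along these lines could write an integer into a slice of the observation tensor (the function and its parameters are invented for this example):

```cpp
#include <algorithm>

#include "absl/types/span.h"

// One-hot encode `value` (clamped to [0, max_value]) into max_value + 1
// entries of `values`, starting at `offset`. Illustration only.
void WriteOneHot(int value, int max_value, int offset,
                 absl::Span<float> values) {
  value = std::clamp(value, 0, max_value);
  for (int i = 0; i <= max_value; ++i) {
    values[offset + i] = (i == value) ? 1.0f : 0.0f;
  }
}
```

For example, a resource count that ranges over 0..19 would occupy 20 entries of the tensor, with exactly one of them set to 1.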

> Unfortunately, the algorithms do not seem to learn very much. I tried the tabular_qlearner and dqn algorithms. The DQN algorithm has some hyperparameters, and I tried a few different configurations; however, the reward I am getting grows worse over time, hits 0 after a while, and stays there.

Ok, I'm not too surprised: you tried vanilla DQN with a not-ideal input encoding. Also, I'm guessing small networks? Catan is large, multiplayer, and has imperfect information; this will definitely take some effort.

Here is some advice:

  1. Try a non-raw observation, i.e. features. By this I mean your observation tensor could encode some high-level information about the game. This was more popular before the deep learning era, but it can mean a lot less waiting for the network to learn something unless you have a lot of resources available to you (see the sketch after this list).

  2. Larger and more complex network architectures.

  3. Some algorithmic enhancement, e.g. AlphaZero or PSRO. If you stick with DQN, I expect you'll need some of the DQN enhancements (prioritized replay, n-step Q-learning, dueling DQN, double DQN, distributional losses)... none of which we currently have implemented. (Yet!)
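As a rough sketch of what point 1 could look like in practice (every feature and scaling constant below is an illustrative choice, not a recommendation):

```cpp
#include <vector>

// Hypothetical hand-crafted features for one player, scaled to roughly [0, 1]
// so the network does not have to learn the ranges itself.
std::vector<float> PlayerFeatures(int victory_points, int num_settlements,
                                  int num_cities, int longest_road_length,
                                  int num_resource_cards) {
  return {
      victory_points / 10.0f,       // 10 VP wins the game
      num_settlements / 5.0f,       // at most 5 settlements per player
      num_cities / 4.0f,            // at most 4 cities per player
      longest_road_length / 15.0f,  // at most 15 road pieces per player
      num_resource_cards / 19.0f,   // loose upper bound on hand size
  };
}
```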

> The tabular_qlearner seems to be getting better, but it brings my PC to a standstill after 16000 iterations; I guess I fill up my memory somehow.

There's no hope for tabular Q-learning in this game; it is simply too large. You'll run out of memory, so you'll almost certainly need function approximation.

JBokMan commented 4 years ago

> I don't think this is very common. I'm used to bitwise inputs, even for encoding integer values (or even float values!). We do this a lot in various places in OpenSpiel. If a value can take on e.g. {0, ..., 100}, we would use one bit per possible value, i.e. a so-called "one-hot encoding" (only one bit is 1 and the rest are 0). See, again, the Hanabi Learning Environment as a good example (the "canonical encoding" here: https://github.com/deepmind/hanabi-learning-environment/blob/master/hanabi_learning_environment/hanabi_lib/canonical_encoders.cc)

I looked into that but did not quite understand it; I guess I have to look harder.

> Ok, I'm not too surprised: you tried vanilla DQN with a not-ideal input encoding. Also, I'm guessing small networks? Catan is large, multiplayer, and has imperfect information; this will definitely take some effort.

The Catan implementation has all the functionality, but I have not exposed every feature through the OpenSpiel functions yet; the biggest missing ones are development cards, the robber, trading with harbors, and trading with other players. As the development cards are missing, I still treat the game as perfect information. I implemented a turn limit, so that after x turns the game simply ends, to speed up training. Furthermore, I use a fixed board with fixed starting positions. I continued with DQN and found some hyperparameters that seem to work, and now I am tuning them to get a good result.

Thank you very much for your tips; I will definitely look into the bitwise encoding and the recommended algorithms.

lanctot commented 4 years ago

> I looked into that but did not quite understand it; I guess I have to look harder.

Maybe Hanabi was not the best example to point you to first. Take a look at OshiZumoState::ObservationTensor and GoofspielState::ObservationTensor in our code base for simpler examples.

> Thank you very much for your tips; I will definitely look into the bitwise encoding and the recommended algorithms.

It's always a good idea to start small (on simpler sub-problems) when trying to get things to work. Great to hear about the progress!

JBokMan commented 4 years ago

I implemented a one-hot encoder, and the results with the DQN algorithm now seem much more stable. I looked at the AlphaZero and PSRO algorithms. The AlphaZero implementation seems to work only with two-player deterministic games. Strictly speaking, Catan is for three to four players, but two players would not be a problem. However, it also needs to be deterministic, and I have chance nodes for the dice rolls. I also looked into the PSRO algorithm but have not dived too deep into it yet. The generalized PSRO seems to want all my states and fills my memory in seconds; if there is a hyperparameter that changes this, I have not found it yet. And the version 2 algorithm seems to need the information_state_tensor, which I have not implemented yet.

elkhrt commented 4 years ago

Great! Extending the AlphaZero implementation to handle chance nodes in one- or two-player games is not a major change. Worth having a go if you're interested! Obviously, start testing with a smaller game like pig.
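To make that concrete, the core change in a search or rollout loop is to resolve chance nodes by sampling from their outcome distribution before expanding; here is a rough sketch against the standard OpenSpiel C++ State API (the helper itself is invented for illustration):

```cpp
#include <random>
#include <utility>
#include <vector>

#include "open_spiel/spiel.h"

// Advance a state through any chance nodes (e.g. dice rolls) by sampling an
// outcome, so the search only ever expands decision nodes. Sketch only.
void ResolveChanceNodes(open_spiel::State* state, std::mt19937* rng) {
  while (!state->IsTerminal() && state->IsChanceNode()) {
    // ChanceOutcomes() returns (action, probability) pairs.
    const auto outcomes = state->ChanceOutcomes();
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const double sample = dist(*rng);
    double cumulative = 0.0;
    open_spiel::Action chosen = outcomes.back().first;
    for (const auto& [action, prob] : outcomes) {
      cumulative += prob;
      if (sample < cumulative) { chosen = action; break; }
    }
    state->ApplyAction(chosen);
  }
}
```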


JBokMan commented 4 years ago

I will give it a try if I find the time. Meanwhile, I finished the missing features, except for players trading with each other; I am leaving that out for now since it adds a lot of actions. I also submitted a pull request, deepmind/open_spiel#233, so you can look over my implementation and make suggestions/flag errors.

lanctot commented 4 years ago

Perfect, thanks for the PR.

There was a major bug in DQN found last weekend (fixed now) that could have affected your initial results. Basically, the legal moves mask was one step off; the fix is here: https://github.com/deepmind/open_spiel/pull/229

I recommend pulling from master and rerunning.

lanctot commented 4 years ago

Dear @Inujal, I have had some discussions with our legal team. Would you be able to meet to discuss this at some point soon? And are you in CET?

JBokMan commented 4 years ago

Hey, yes, that should be possible. I am currently writing exams, but I am sure I will find some time in between. I live in Germany, so yes, I am in CET.

lanctot commented 4 years ago

No rush; I am 8 hours behind you, so the most convenient time would be around 5 or 6pm CET, unless you want to talk at night, e.g. 8 or 9pm.

Just let me know which time is most convenient for you.

JBokMan commented 4 years ago

I could do 6 pm CET on Wednesday and Thursday.

lanctot commented 4 years ago

Cannot do Wednesday. Will follow up by email, thanks.

lanctot commented 4 years ago

Let's keep in touch by email and voice calls on this point, thanks. Closing for now.