QueensGambit / CrazyAra

A Deep Learning UCI-Chess Variant Engine written in C++ & Python :parrot:
https://lichess.org/@/CrazyAra
GNU General Public License v3.0

Training the engine to play custom variants #29

Open bkitej opened 4 years ago

bkitej commented 4 years ago

Is there a process for training this engine to play custom variants not included in the Python-chess module, or in Stockfish's supported variants? I'm looking to write a variant where the only rule modification is one can capture one's own pieces, which doesn't exist in either yet. Assuming that Board is programmed correctly, could I train an engine around it?

QueensGambit commented 4 years ago

Hi @bkitej, does your variant of interest have an official name? If I understand your question correctly, you are interested in a variant which follows the same rules as classical chess, with the extension that you can also capture your own pieces, except the king.

As you stated correctly, the python MCTS version of CrazyAra uses python-chess for move generation and the C++ MCTS version is based on Multi-Variant-Stockfish. Python-chess is also used to prepare the training data in the case of supervised learning.

I don't know your variant, but I assume that the games will mostly play out as regular chess games, and only in a few special cases will it be better to capture one's own piece.

The neural network input representation and output representation would be the same as for regular chess.

You could register your own variant in python-chess or in Stockfish accordingly and modify the respective move-generation:
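To give an idea of what such a modification could look like on top of python-chess, here is a minimal, illustrative sketch (the class name and details are assumptions; legality filtering, push()/pop() bookkeeping and the mating rules would still need review):

```python
import chess

class FriendlyFireBoard(chess.Board):
    """Illustrative sketch: classical chess plus pseudo-legal self-captures
    (capturing the own king stays forbidden)."""

    def generate_pseudo_legal_moves(self, from_mask=chess.BB_ALL, to_mask=chess.BB_ALL):
        # All regular pseudo-legal moves first.
        yield from super().generate_pseudo_legal_moves(from_mask, to_mask)

        # Additionally allow moves onto squares occupied by own pieces (not the king).
        own_targets = self.occupied_co[self.turn] & ~self.kings & to_mask
        for from_square in chess.scan_reversed(self.occupied_co[self.turn] & from_mask):
            for to_square in chess.scan_reversed(self.attacks_mask(from_square) & own_targets):
                yield chess.Move(from_square, to_square)
```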

I assume there aren't any games available for your variant, so supervised learning isn't possible. Alternatively, you could use a network trained on classical chess games and later adjust the policy distribution to make sure that self-capture moves are explored:

Of course you could theoretically learn your variant from scratch, or continue learning from a given supervised network via selfplay. This, however, is rather computationally intensive and would require an implementation in C++.

There is also Fairy-Stockfish which supports a few more shogi-variants at the cost of slightly lower performance. In principle this could also be used as a move-generation back-end instead of Multi-Variant-Stockfish for the MCTS C++ version.

I suggest that you first create your variant in python-chess and if this works correctly, you can either implement it in C++ and extend Multi-Variant-Stockfish or add it to one of CrazyAra's MCTS versions.

The current neural network weights in the official releases only support crazyhouse so far. Lately, I added a network input and output representation which supports all chess variants on lichess.org. In 2020, I might add a lighter representation for classical chess and chess960 only and publish some network weights.

bkitej commented 4 years ago

Thank you for your very quick response! For a little background, I'm an undergraduate student in statistics, and this is for a project-based class. I have a few classes in ML under my belt, almost exclusively in Python, my language of choice.

does your variant of interest have an official name?

I've seen it once referred to as "friendly fire chess" online, but otherwise it doesn't seem to be an established variant at all.

If I understand your question correctly, you are interested in a variant which follows the same rules as classical chess, with the extension that you can also capture your own pieces, except the king.

Correct!

I don't know your variant, but I assume that the games will mostly play out as regular chess games, and only in a few special cases will it be better to capture one's own piece.

The focus of my project is to discover these emergent behaviors. My own hypothesis is that games will become more materialistic, as checkmates can be escaped in more situations, at the cost of a weaker late game. How often it diverges from classical chess is another point of interest.

You could register your own variant in python-chess or in Stockfish accordingly and modify the respective move-generation

This is likely the best direction to take. I've noticed there are a few Stockfish projects on GitHub; is there a pure Python implementation you would recommend?

Of course you could theoretically learn your variant from scratch, or continue learning from a given supervised network via selfplay. This, however, is rather computationally intensive and would require an implementation in C++.

I'm an undergraduate senior with virtually no knowledge of C++, but I do have access to my school's high-performance computing cluster... Does that brute-force approach sound like a viable alternative to writing it in C++? I don't mind if the move generation itself takes up to a minute or so.

I suggest that you first create your variant in python-chess and if this works correctly, you can either implement it in C++ and extend Multi-Variant-Stockfish or add it to one of CrazyAra's MCTS versions.

Can you tell me more about CrazyAra's MCTS versions? As in, if I'm able to successfully program my variant into python-chess, is that a relatively straightforward process? Naturally, I'd like to avoid C++ as much as possible...

QueensGambit commented 4 years ago

Naturally, I'd like to avoid C++ as much as possible...

Understandable. An implementation in C++ will very likely exceed the scope of your project.

Running reinforcement learning is often the first idea when starting such a project, but it is usually dropped once you learn and understand how much computational hardware is needed.

Python-only implementations are usually only practicable for simple games with small board sizes and a single piece type, e.g. Othello, Gobang, Tic-Tac-Toe, Connect4, Reversi.

The simplest Python-only MCTS implementation for classical chess is probably

Just to put things into perspective, here are a few statistics on how the number of MCTS simulations per second (NPS) developed in the case of CrazyAra. The NPS is a measure of how many games you are able to generate when using a fixed number of nodes per move (e.g. 800): Our first MCTS python version, CrazyAra 0.1.0 (which isn't available), was mainly based on https://github.com/Zeta36/chess-alpha-zero. Here, CrazyAra achieved 25-35 NPS on a GTX1080ti using an AlphaZero-like network with 7 blocks.

Chess-alpha-zero also uses python-chess for move-generation. However, the main purpose of python-chess is not to be used in chess engines. The default 3-fold-repetition check is an example of this: https://github.com/niklasf/python-chess/issues/338 The MCTS of chess-alpha-zero also uses the simpler Keras framework at the cost of performance, as well as plain Python list iterations for implementing the MCTS formulas.

Neither of these is ideal, so we implemented the Python MCTS version from scratch in vectorized form using numpy and with more direct access to the neural network via MXNet/Gluon. Over the course of development, additional techniques were added, such as a transposition table, which can double the NPS in some cases, and a time management system.
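For illustration, the node-selection step of such a vectorized MCTS might look roughly like the following generic PUCT sketch (assumed array names, not CrazyAra's actual code):

```python
import numpy as np

def select_child(q_values, visits, priors, cpuct=2.5):
    """Generic PUCT child selection over all children of one node.
    q_values, visits and priors are equally sized 1-D numpy arrays."""
    # Exploration term favours moves with high prior and few visits.
    u_values = cpuct * priors * np.sqrt(visits.sum() + 1) / (1.0 + visits)
    return int(np.argmax(q_values + u_values))
```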

For the most recent python MCTS version (0.5.1), the speed is about 250-300 NPS using the RISEv1 architecture, 7 residual blocks + 12 bottleneck residual blocks with SE.

The same network achieved 1200-1500 NPS in the first MCTS C++ release (0.6.0) and 3500-4000 NPS when using the more efficient RISEv2 mobile architecture.

In the current C++ development version, CrazyAra achieves 8000-10k NPS, which generates about 20 crazyhouse games per minute on a GTX1080ti. For this, CrazyAra uses a larger batch-size and a TensorRT back-end.

20 games per minute might seem like a lot, but you should still have access to multiple GPUs in this case and preferably start learning from a given supervised network first. Finding the right RL hyper-parameters is non-trivial and often requires a lot of fine-tuning.

[...] but I do have access to my school's high-performance computing cluster...

Nice to hear, but if your computing cluster only includes CPUs, it might not be of much use for training and running NN inference at a large scale. Moreover, if you are planning to use Python only, you might only be able to utilize the GPUs at 1-10%, which is rather inefficient.

Reinforcement learning methods are only supported in the C++ MCTS CrazyAra version. Adapting CrazyAra to play "friendly fire chess" will be much easier when supervised networks for classical chess are available. Therefore, I suggest that you try out chess-alpha-zero instead because it supports classical chess and you can train your network first using supervised learning:

For an exemplary AlphaBeta search you can take a look at Sunfish:

bkitej commented 4 years ago

Thanks again for the quick response!

However, the main purpose of python-chess is not to be used in chess-engines.

Oh okay, good to know. Generating the ruleset and gamestate is my main (and... first) bottleneck. It looks like Sunfish has its own board model. Maybe borrowing parts of this will be easier than adapting python-chess's opaque bitboard implementation.

Re: my school's computing cluster, it's my understanding that it's a GPU cluster engineered specifically for the age of neural nets. Even inefficient GPU usage will outperform my humble MacBook Pro... plus, the demands of this first implementation aren't too strenuous.

I'll try out chess-alpha-zero at your recommendation. I don't mind having this issue closed out in the meantime. Thank you!

bkitej commented 4 years ago

Again, sincere thanks for your help.

Would you mind if I continued to ask further questions over email, as I discover them in the course of this project? I worry that having these conversations on the public GitHub repo might be irrelevant to the project as a whole, and thus rather impolite. I'm a beginner when it comes to the GitHub community and its etiquette, and I'm wary about being unnecessarily imposing...

DeFilippis commented 4 years ago

I'm actually enjoying reading this conversation, and it may be of help to me, as I'm interested in analogous projects. I vote for keeping the generically useful questions here.

QueensGambit commented 4 years ago

Answering these questions via GitHub is fine by me. It seems to be interesting for others as well (e.g. @DeFilippis ) and this way I don't have to repeat myself.

PyChess is a great library when writing an engine for educational purposes because it comes along with many valuable tools and visualisation techniques. It is also quite optimized within the python framework. The only major slowdown was the 3-fold-repetition check which took about 1/3 of the total runtime at one point. If you like you can also integrate CrazyAra's can_claim_threefold_repetition() into chess-alpha-zero.
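The general idea behind a faster repetition check is to update a counter of position hashes incrementally instead of replaying the move stack on every call. A rough sketch (not CrazyAra's actual implementation) using python-chess's Zobrist hashing; undoing moves would also need to decrement the counter:

```python
import chess
import chess.polyglot
from collections import Counter

class RepetitionTracker:
    """Counts how often each position (identified by its Zobrist hash) has occurred."""

    def __init__(self, board: chess.Board):
        self.counts = Counter({chess.polyglot.zobrist_hash(board): 1})

    def push(self, board: chess.Board, move: chess.Move):
        board.push(move)
        self.counts[chess.polyglot.zobrist_hash(board)] += 1

    def can_claim_threefold_repetition(self, board: chess.Board) -> bool:
        return self.counts[chess.polyglot.zobrist_hash(board)] >= 3
```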

A fellow student also added the Bughouse variant to PyChess, which might help you register your own variant:

bkitej commented 4 years ago

I'd like to revisit a couple points that have been mentioned so far.

  1. Using AlphaZero for supervised learning
  2. Using a network trained on classical chess games and later adjusting the policy distribution
  3. You mentioned you used python-chess to prepare the training data, but then programmed and trained the engine with C++... can you tell me more about that process?

A further note on the scope of this variant:

FFChess has a strictly larger range of legal moves at almost all stages of the game. For example, in classical chess, there are 20 possible first moves; in FFChess there are 40 by my count.

Self-play is still the ideal though. Under the constraints of my current technical ability and time, I'm considering a shift to studying the effects of FFChess on famous endgame positions only. Any thoughts on how that might affect the feasibility of this Python-only project?

QueensGambit commented 4 years ago

How would training a network using classical chess training data give me anything other than a classical chess engine? Would this be a matter of adjusting the existing weights?

Of course you're right that training on classical games will result in a network which only plays classical chess. However, since FFChess is very similar to classical chess, you could use this network as a starting point for either training on FFChess data or starting selfplay. This concept is called transfer learning and will very likely reduce the computational cost.
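In practice, transfer learning here means loading the classical-chess weights and continuing training on FFChess samples with a reduced learning rate. A sketch with MXNet/Gluon (the file names and hyper-parameters are placeholders, not CrazyAra's actual release files):

```python
import mxnet as mx
from mxnet import gluon

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

# Load a pretrained classical-chess network (placeholder file names).
net = gluon.nn.SymbolBlock.imports("model-symbol.json", ["data"],
                                   "model-0000.params", ctx=ctx)

# Fine-tune with a small learning rate so the pretrained knowledge is adapted
# to FFChess data rather than overwritten from scratch.
trainer = gluon.Trainer(net.collect_params(), "sgd",
                        {"learning_rate": 1e-4, "momentum": 0.9})
```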

what differences might I expect to exist between a network trained on a classical engine, and one trained with reinforcement learning

Training a neural network by supervised learning and by reinforcement learning involves several different concepts and challenges.

In supervised learning you can only learn from sparse labels: For the policy target you use a one-hot-encoded vector to replicate what move was actually played in a given position. By providing different target moves with varying frequencies the NN learns to find the best fit for all given targets and hopefully generalizes well to unseen samples. The policy is usually optimized using a cross-entropy loss.

The value target is also binary (or ternary, if a draw is possible). You assign a position as winning, drawn or losing depending on the final Monte-Carlo return (game result), regardless of the game progression. The value output is clipped into a numerical range of e.g. [-1,+1] using a Tanh activation, or a Sigmoid if you want to use a range of [0,1] instead. The value loss is formalized as a typical mean squared error.
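Concretely, the two supervised targets above boil down to something like the following numpy sketch for a single sample (real training batches these and adds regularization):

```python
import numpy as np

def supervised_losses(policy_logits, value_pred, played_move_idx, game_result):
    """policy_logits: raw NN outputs over all moves, value_pred: Tanh output in [-1, 1],
    played_move_idx: index of the move that was played, game_result: -1, 0 or +1."""
    # Cross-entropy with a one-hot target = negative log-probability of the played move.
    probs = np.exp(policy_logits - policy_logits.max())
    probs /= probs.sum()
    policy_loss = -np.log(probs[played_move_idx] + 1e-12)

    # Mean squared error between the predicted value and the Monte-Carlo return.
    value_loss = (value_pred - game_result) ** 2
    return policy_loss, value_loss
```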

In supervised learning you usually have a broad, diverse set of players in your data set, which reduces the chance of over-fitting and of missing key concepts of your variant.

In reinforcement learning you have to deal with a new dimension of complexity: mainly finding the right trade-off between exploration and exploitation.

You're always at risk of zeroing out important lines. This happened to some extent to AlphaZero, which almost never played the King's Indian Defence or Sicilian Defence (https://arxiv.org/pdf/1712.01815.pdf, p. 6). Generating your own samples as the new optimization targets can also have harmful recursive effects or lead to forgetting general knowledge. For example, if you observe the winning progression against the previous network during training, you might get fooled and end up in rock-paper-scissors scenarios, where the new network always learns to exploit the previous one without actually increasing overall playing strength. One prominent environment for this is StarCraft II, where almost any strategy can be exploited if you know it beforehand. As a consequence, DeepMind introduced a league system for comparing their agents.

In reinforcement learning your policy target is the final MCTS visit-count distribution, to which a temperature scaling is applied and from which the next move to play is sampled. The cross-entropy loss is therefore calculated using the full distribution. This is an important aspect because some DL-frameworks only support sparse cross-entropy by default. Moreover, you also gather the Q-values as an additional feature. Abrams et al. proposed to use a mix between Q-value and game result as the new value target (https://medium.com/oracledevs/lessons-from-alphazero-part-4-improving-the-training-target-6efba2e71628). This is similar to bootstrapping and requires an additional mixing hyper-parameter. Adding bootstrapping often accelerates progress, but comes at the risk of not fully utilizing your Q-value range if the Q-value ratio is too high. Besides that, Wu proposed to integrate auxiliary targets into the value loss (https://arxiv.org/pdf/1902.10565.pdf).
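Put together, the reinforcement-learning targets for one position could be computed roughly like this (illustrative only; the temperature and Q-value ratio are assumed hyper-parameters):

```python
import numpy as np

def rl_targets(visit_counts, root_q, game_result, temperature=1.0, q_value_ratio=0.5):
    """visit_counts: MCTS visits per candidate move, root_q: search Q-value of the root,
    game_result: final outcome in {-1, 0, +1} from the side to move's perspective."""
    # Policy target: the full, temperature-scaled visit distribution (not one-hot).
    scaled = visit_counts.astype(np.float64) ** (1.0 / temperature)
    policy_target = scaled / scaled.sum()

    # Value target: mix between the Monte-Carlo return and the Q-value (bootstrapping).
    value_target = (1.0 - q_value_ratio) * game_result + q_value_ratio * root_q
    return policy_target, value_target
```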

Using a network trained on classical chess games and later adjusting the policy distribution:

My assumption is that the value evaluation of FFChess is quite similar to classical chess, and the default policy distribution is also a good starting point. Therefore, without having to retrain a network using self-play or supervised learning, you can first manually re-adjust the policy output of the NN with some heuristics. As a simple baseline you could assign every self-capture move a fixed value based on the current number of legal moves and renormalize afterwards. This can be used to get an idea of how the variant plays out and give some first results on feasibility.
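As a concrete example of such a heuristic, one could raise every self-capture move to a minimum prior and renormalize (the parameter value, the move list layout and the FFChess board class are assumptions for illustration):

```python
import numpy as np

def boost_self_captures(policy, legal_moves, board, min_prior=0.02):
    """policy: NN move priors aligned with legal_moves; board: an FFChess position."""
    policy = policy.copy()
    for i, move in enumerate(legal_moves):
        target = board.piece_at(move.to_square)
        if target is not None and target.color == board.turn:
            # Guarantee some exploration of self-capture moves.
            policy[i] = max(policy[i], min_prior)
    return policy / policy.sum()
```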

For instance, CrazyAra has a boolean UCI option "Enhance_Checks" which manually increases the probability of checking moves which are below a given threshold, to ensure that short forced mating sequences are detected quickly even when the raw policy has a blind spot for them (https://arxiv.org/pdf/1908.06660.pdf, 9.2.7, p. 23).

You mentioned you used python-chess to prepare the training data, but then programmed and trained the engine with C++... can you tell me more about that process?

At first this project was written entirely in Python. The full preprocessing pipeline of converting pgn-files into the plane representation makes use of the python-chess and numpy libraries. For compression we use the zarr library. The supervised training of the neural network is also written in Python. If you want to use self-play, the most critical aspect in terms of performance is the runtime of your MCTS. First we implemented the MCTS in Python only, using numpy, python-chess and MXNet. Later the MCTS was rewritten in C++, which greatly improved speed but took more time to implement. The C++ binary has additional non-UCI-standard commands: selfplay <number> generates <number> games and exports the training samples after each game, and the arena <number> compare command runs a tournament between the current model and a model contender in order to suggest whether the current network should be replaced and to track progress.
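For the preprocessing step, converting a position into an input-plane tensor is conceptually simple. A stripped-down sketch (12 piece planes plus a side-to-move plane; CrazyAra's actual input representation contains more planes, e.g. for castling rights and repetitions):

```python
import chess
import numpy as np

def board_to_planes(board: chess.Board) -> np.ndarray:
    """Minimal 13x8x8 plane encoding of a python-chess board."""
    planes = np.zeros((13, 8, 8), dtype=np.float32)
    for square, piece in board.piece_map().items():
        # Channels 0-5: white pieces, 6-11: black pieces (pawn..king).
        channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[channel, chess.square_rank(square), chess.square_file(square)] = 1.0
    # Channel 12: side to move.
    planes[12, :, :] = 1.0 if board.turn == chess.WHITE else 0.0
    return planes
```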

Self-play is still the ideal though. Under the constraints of my current technical ability and time, I'm considering a shift to studying the effects of FFChess on famous endgame positions only. Any thoughts on how that might affect the feasibility of this Python-only project?

Endgame positions are typically not a strong point for either classical A/B or neural network engines. They usually struggle to detect fortress positions, and even today some endgame positions that are obvious to human expert players are hard to solve with A/B engines. Hence, almost any modern chess engine makes use of tablebases. Analysing endgame positions in FFChess could be an interesting research question. As an alternative, you could compare tactical position benchmarks when analysed in classical chess and in FFChess.

QueensGambit commented 3 years ago

Hi @bkitej, I am not sure if this is still relevant to you, but there are two major back-ends (https://github.com/QueensGambit/CrazyAra/pull/59, https://github.com/QueensGambit/CrazyAra/pull/93) supported in CrazyAra now.

So you could define your friendly fire chess variant in one of them, or implement your own environment from scratch using a blank State class.

There is also a model for standard chess available now which was trained on human standard chess games.