Georg-S / AlphaZero

An implementation of the Google Deepmind AlphaZero algorithm and some games to test it.
BSD 3-Clause "New" or "Revised" License

Have you been able to create a somewhat decent chess engine? #4

Closed: zjeffer closed this issue 1 year ago

zjeffer commented 2 years ago

Hi, I've also created a chess engine based on AlphaZero (https://github.com/zjeffer/chess-deep-rl-cpp), written in C++ after my Python chess engine (https://github.com/zjeffer/chess-deep-rl) was too slow to generate data. The C++ version didn't manage to train very well either, even after many days of letting it play games against itself to build the dataset.

I'm now creating a connect-4 engine with the same algorithm, because that should train much quicker, but I haven't really gotten very far with that either.

I was wondering what your findings are when trying to create a chess engine. Is it learning at all? Do you see any strategy or is it still playing very random? Do you have any loss graphs you could share? How about the other games you implemented?

I'm also interested in your master's thesis. Have you published it yet? I'd love to read your research.

Georg-S commented 1 year ago

Hi, as far as my results go: for Tic-Tac-Toe, the algorithm reaches perfect play after 1-3 training iterations. For Connect Four (which takes a lot longer to train than Tic-Tac-Toe), AlphaZero yields pretty good results as well; it wins most of its games against my Minimax AI with a search depth of 6 (if I remember correctly). However, I couldn't see the algorithm improving at Chess. This is most likely just an issue with the amount of training and the number of Monte Carlo tree searches.

I want to retest the Chess training once I have a better GPU (right now I still have a GTX 970).

Regarding my thesis: it is written in German, so it probably won't help you, I guess?

zjeffer commented 1 year ago

How many games do you train on per iteration for Tic-Tac-Toe? And how many games do you estimate you needed to play for Connect Four? With my engine, I have already created 4000 games, and when training on them, the loss doesn't seem to drop much (especially the policy loss).

> Regarding my thesis: it is written in German, so it probably won't help you, I guess?

I'm from Belgium, Dutch is kind of similar to German so I might be able to read some of it :)

Here's my thesis, mostly about the Python version of my engine: https://github.com/zjeffer/howest-thesis

Georg-S commented 1 year ago

For Tic-Tac-Toe I use 1000 games per iteration and 50 MCTS searches; for Connect Four I use 1000 games per iteration and 100 MCTS searches. Unfortunately, my thesis is not up to date with the source code. For example, my approach to multithreading is missing. Most importantly, though, the results evaluated at the end of my thesis are far worse than what the current version achieves. I had a small oversight with a pretty big impact ... I forgot to set the neural net to "eval" mode before using it for evaluation.
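
In case anyone else runs into the same oversight, here is a minimal sketch of what the fix looks like with LibTorch (the function and network type below are illustrative, not taken from this repo): put the module into eval mode and disable gradient tracking before running inference, then switch back to train mode before the next optimisation step.

#include <torch/torch.h>

// Minimal sketch (assumed LibTorch usage; names are illustrative):
// put the network into eval mode and disable autograd before using it
// for self-play / evaluation.
torch::Tensor evaluatePosition(torch::nn::Sequential& net, const torch::Tensor& input)
{
    net->eval();               // use batch-norm running stats, disable dropout
    torch::NoGradGuard noGrad; // no gradient bookkeeping during inference
    return net->forward(input);
}

// ... and before the next training step:
// net->train();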

zjeffer commented 1 year ago

Thanks for the info.

I'm currently having a problem where, sometimes after training, the new model assigns very low scores to certain moves (especially at the start of the game), causing them to never be picked (I guess because of overfitting). For instance, with Connect Four, 7 moves are possible, but MCTS might visit moves 1 and 3 fifty times each and leave the other moves with 0 visits. The same thing happened with my chess engine, which is why I pivoted to Connect Four. Did you come across this problem as well?

I've tried messing with the batch size (values from 128 up to 1024) and the learning rate (from 0.1 down to 0.0001), and sometimes this helps. I'm probably going to implement Dirichlet noise as well...
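
For reference, the AlphaZero paper adds Dirichlet noise only to the prior probabilities of the root node: P(s,a) = (1 - epsilon) * p(a) + epsilon * eta(a), with eta ~ Dir(alpha), epsilon = 0.25, and alpha depending on the game (0.3 for chess). A small sketch of how that could look, assuming the root priors are stored in a plain vector (names are illustrative, not from either repo):

#include <random>
#include <vector>

// Sketch: mix Dirichlet noise into the root priors, as described in the
// AlphaZero paper. "priors" is assumed to hold the network's policy output
// for the legal root moves; names are illustrative.
void addDirichletNoise(std::vector<double>& priors, double alpha = 0.3, double epsilon = 0.25)
{
    static std::mt19937 rng{std::random_device{}()};
    std::gamma_distribution<double> gamma(alpha, 1.0);

    // Sample a Dirichlet(alpha) vector by normalising independent Gamma samples.
    std::vector<double> noise(priors.size());
    double sum = 0.0;
    for (double& n : noise)
    {
        n = gamma(rng);
        sum += n;
    }
    for (size_t i = 0; i < priors.size(); ++i)
        priors[i] = (1.0 - epsilon) * priors[i] + epsilon * (noise[i] / sum);
}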

Georg-S commented 1 year ago

I have not had this issue.

But I can think of two issues that could cause this problem:

zjeffer commented 1 year ago

I'll look into these issues, thanks!

zjeffer commented 1 year ago

I think I figured it out: after training, the network does indeed return very low probabilities for most moves (0.003 for example), resulting in this situation:

[screenshot of the search output]

The Q values are so low that they print as 0, and the upper confidence bound doesn't compensate enough for such a low Q value, so the move is never, or only very rarely, played.
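
That matches how the standard AlphaZero PUCT selection rule behaves: the exploration bonus is scaled by the prior, so a tiny prior suppresses the move almost completely. A small illustrative sketch (not zjeffer's exact code):

#include <cmath>

// Sketch of the AlphaZero PUCT selection score (illustrative only):
// score = Q(s,a) + c_puct * P(s,a) * sqrt(N_parent) / (1 + N(s,a))
double puctScore(double q, double prior, int parentVisits, int childVisits, double cPuct = 1.5)
{
    return q + cPuct * prior * std::sqrt(static_cast<double>(parentVisits)) / (1.0 + childVisits);
}

For example, with q = 0, prior = 0.003, 100 parent visits and 0 child visits, the score is only 1.5 * 0.003 * 10 = 0.045, so the move loses the selection to almost any alternative, which is exactly the behaviour described above.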

zjeffer commented 1 year ago

OH MY GOD I'M BLIND

https://github.com/zjeffer/connect4-deep-rl/blob/main/src/game.cpp#L150

void Game::updateMemoryWithWinner(ePlayer winner)
{
    // update memory with winner
    for (MemoryElement& element: m_Memory)
    {
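        // BUG (explained below): this stores the raw ePlayer value (0..2)
        // instead of a value target in the range -1 to +1.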
        element.winner = static_cast<uint8_t>(winner);
    }
}

When saving the data, I set the winner to a value between 0 and 2, instead of -1 to +1. Because of this, after training all value outputs are 1 :/

1000's of useless games created lol
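
For anyone else hitting this: the usual convention is to store the game outcome from the perspective of the player to move as -1 / 0 / +1, so it matches the tanh value head. A hedged sketch of the mapping (the ePlayer enumerators used here are assumptions; the real enum may differ):

// Sketch only: maps the game result to a value-head target in {-1, 0, +1}.
enum class ePlayer { NONE = 0, YELLOW = 1, RED = 2 }; // assumed; the actual enum may differ

int winnerToValueTarget(ePlayer winner, ePlayer currentPlayer)
{
    if (winner == ePlayer::NONE)
        return 0;                              // draw
    return (winner == currentPlayer) ? 1 : -1; // +1 if this player won, -1 otherwise
}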