2Bear / othello-zero

An implementation of the AlphaGo Zero and AlphaZero algorithms for playing Othello.
MIT License

Can't win edax level 1 #2

Open ixanezis opened 4 years ago

ixanezis commented 4 years ago

I haven't dug into the details, but following the README, I ran othello-zero v117 against Edax level 1, and othello-zero lost miserably. It didn't even capture a single corner.

  A B C D E F G H
1 ●─●─●─●─●─●─●─●
2 ●─●─●─●─●─●─●─●
3 ●─●─●─●─●─●─●─●
4 ●─●─●─●─○─○─○─○
5 ●─○─●─●─●─●─●─○
6 ●─●─●─●─●─●─●─●
7 ●─●─●─●─●─●─●─●
8 ●─●─●─●─●─●─●─●

Same with level 2. It did, however, beat level 0 and, somehow, level 3, but not levels 1, 2, or 4+.

Given these results, I'm unfortunately not convinced by the statement that "othello-zero is close to Edax Lv.5".

I also wonder whether you have tested othello-zero against Edax on a variety of openings, not just the default one.

2Bear commented 4 years ago

● Black is othello, ○ White is Edax

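  1. A newer othello-zero version is not always stronger than an older one. (This is the biggest difference between AlphaGo Zero and AlphaZero: AlphaGo Zero only promotes a new network that beats the current best one, while AlphaZero keeps training a single network without that check.) In fact, V067 is much stronger than V117. I will upload all checkpoints from V001 to V117 if you need them.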

  2. "Is stronger" doesn't mean "always wins", but "wins most of the time". For some reason, though, as long as the othello-zero version and the Edax level stay the same, you get exactly the same game every time. For example, if you let othello-zero V067 play Edax level 4 a hundred times, all hundred games are identical. I have made a new commit that enables the "randomness" feature of Edax, so you can get a different game every time and compare othello-zero and Edax more fairly.

  3. Thank you for trying this old and abandoned project. I made it just for fun, not for academic purposes. Some conclusions may be too hasty, but I believe the main algorithm works.
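
The point in item 2 can be sketched in a few lines: when both engines are deterministic, repeating a game adds no information, so a strength estimate needs games that actually differ. The sketch below is illustrative only. `estimate_win_rate`, `deterministic_game` and `randomized_game` are hypothetical names that do not come from this repository, and the two game functions are stand-ins for real othello-zero vs. Edax matches (the 60% win probability is made up).

```python
import math
import random
from typing import Callable, Tuple

# Hypothetical signature: plays one game from the opening selected by `seed`
# and returns +1 (win), 0 (draw) or -1 (loss) from othello-zero's side.
GameFn = Callable[[int], int]


def estimate_win_rate(play_game: GameFn, n_games: int, seed: int = 0) -> Tuple[float, float]:
    """Play n_games with different opening seeds; return (score, standard error)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_games):
        result = play_game(rng.randrange(2**31))
        points.append((result + 1) / 2)  # win = 1, draw = 0.5, loss = 0
    mean = sum(points) / n_games
    variance = sum((p - mean) ** 2 for p in points) / max(n_games - 1, 1)
    return mean, math.sqrt(variance / n_games)


def deterministic_game(_seed: int) -> int:
    # Stand-in for two deterministic engines: the seed is ignored, so every
    # game is identical and 100 games tell you no more than one game.
    return 1


def randomized_game(seed: int) -> int:
    # Stand-in for a match with randomized play; the 0.6 is an assumption
    # chosen only to make the example produce mixed results.
    return 1 if random.Random(seed).random() < 0.6 else -1


if __name__ == "__main__":
    print(estimate_win_rate(deterministic_game, 100))
    print(estimate_win_rate(randomized_game, 100))
```

A standard error of zero in the deterministic case is a hint that the games are copies of one another rather than evidence of a perfectly stable result.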

Best.

ixanezis commented 4 years ago

> ● Black is othello, ○ White is Edax

Sorry, I forgot to mention that the colors look inverted on my dark terminal, so I swapped them in your code. Otherwise black looked white and vice versa, which was hurting my brain. So Edax is ● here (and it looks white in my terminal), and it is running at level 1.

> In fact, V067 is much stronger than V117.

OK, I'll check it out.

> Thank you for trying this old and abandoned project. I made it just for fun, not for academic purposes. Some conclusions may be too hasty, but I believe the main algorithm works.

Thank you for a nice and easy introduction to AlphaZero. I'm just a bit disappointed that it's a lot weaker than I had hoped, especially given that Edax reaches levels 21-23 in no time...

BTW, I've made a few small improvements (at least I think they are): https://github.com/ixanezis/othello-zero/blob/master/othello.py#L286 Instead of EdaxGame and HumanGame, which were almost identical, I wrote a more generic play_game(player) function that accepts an EdaxPlayer, a HumanPlayer, or anything else, such as my own very simple AlphaBetaPlayer, which lives in https://github.com/ixanezis/othello-alpha-beta and, by the way, also beats v117 from depth 3 upward (searching to that depth takes 0.03s). I can open a PR against your repo with these improvements if you think it's worthwhile. I also added a Dockerfile, so the code can easily be built and run inside a container.
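
To illustrate the idea of a generic game loop that accepts any player object, here is a minimal, self-contained sketch. It is not the code from either linked repository: the `Player` protocol, the two-argument `play_game(black, white)` signature, `RandomPlayer`, and the disc-difference evaluation inside `AlphaBetaPlayer` are all assumptions made for this example.

```python
import random
from typing import List, Protocol, Tuple

Board = List[List[int]]  # 0 = empty, 1 = black, -1 = white
Move = Tuple[int, int]   # (row, column), both 0-based
DIRS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def new_board() -> Board:
    board = [[0] * 8 for _ in range(8)]
    board[3][3] = board[4][4] = -1
    board[3][4] = board[4][3] = 1
    return board

def flips(board: Board, r: int, c: int, color: int) -> List[Move]:
    """Discs that `color` would flip by playing at (r, c); empty if illegal."""
    if board[r][c] != 0:
        return []
    flipped = []
    for dr, dc in DIRS:
        line, rr, cc = [], r + dr, c + dc
        while 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == -color:
            line.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        if line and 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == color:
            flipped.extend(line)
    return flipped

def legal_moves(board: Board, color: int) -> List[Move]:
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, color)]

def play(board: Board, move: Move, color: int) -> Board:
    new = [row[:] for row in board]
    for r, c in flips(board, move[0], move[1], color) + [move]:
        new[r][c] = color
    return new

def disc_diff(board: Board, color: int) -> int:
    return color * sum(map(sum, board))

class Player(Protocol):
    def choose(self, board: Board, color: int) -> Move: ...

class RandomPlayer:
    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def choose(self, board: Board, color: int) -> Move:
        return self.rng.choice(legal_moves(board, color))

class AlphaBetaPlayer:
    """Fixed-depth negamax with alpha-beta pruning; evaluation = disc difference."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth

    def choose(self, board: Board, color: int) -> Move:
        best, best_val = None, -65  # -65 is below any possible disc difference
        for move in legal_moves(board, color):
            child = play(board, move, color)
            val = -self._search(child, -color, self.depth - 1, -65, -best_val)
            if val > best_val:
                best, best_val = move, val
        return best

    def _search(self, board: Board, color: int, depth: int, alpha: int, beta: int) -> int:
        moves = legal_moves(board, color)
        if not moves:
            if not legal_moves(board, -color):
                return disc_diff(board, color)  # neither side can move: game over
            return -self._search(board, -color, depth, -beta, -alpha)  # forced pass
        if depth == 0:
            return disc_diff(board, color)
        for move in moves:
            child = play(board, move, color)
            alpha = max(alpha, -self._search(child, -color, depth - 1, -beta, -alpha))
            if alpha >= beta:
                break  # the opponent already has a better alternative elsewhere
        return alpha

def play_game(black: Player, white: Player) -> int:
    """Play one game; return the final disc difference from black's side."""
    board, color, players = new_board(), 1, {1: black, -1: white}
    while True:
        if not legal_moves(board, color):
            color = -color  # current side must pass
            if not legal_moves(board, color):
                return disc_diff(board, 1)
        board = play(board, players[color].choose(board, color), color)
        color = -color
```

Because `play_game` only relies on a `choose(board, color)` method, an Edax wrapper, a human prompt, or a neural-network player could be plugged in without changing the loop. For example, `play_game(AlphaBetaPlayer(depth=3), RandomPlayer(seed=1))` returns a positive number when black wins.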

2Bear commented 4 years ago

The new commit that enables the "randomness" feature of Edax is online, and so is a new release containing all checkpoints.

> I can open a PR against your repo with these improvements if you think it's worthwhile.

Of course. Thank you so much.

> I'm just a bit disappointed that it's a lot weaker than I had hoped.

I totally agree with you. I don't think othello-zero V117 or V067 is an invincible beast. On the contrary, othello-zero is a little baby. AlphaZero trained itself to play chess with over 40 million self-play games. For othello-zero, the number is only 0.2 million (V117), which is about 200 times fewer. Completing the whole training on my personal computer would take me about five years.

If I had enough computing power, I would change some of the config and rewrite part of the training loop to make othello-zero's algorithm more robust. V117 should have been stronger than V067.
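
One known way to keep a later checkpoint from being weaker than an earlier one is the evaluation gate from AlphaGo Zero, where a candidate network replaces the current best only if it wins more than 55% of its evaluation games against it. The sketch below shows that rule in isolation. It is not the planned rewrite of this project's training loop: `should_promote`, `gated_history`, the `evaluate` callback, and the checkpoint names in the usage note are all hypothetical.

```python
from typing import Callable, List


def should_promote(wins: int, games: int, threshold: float = 0.55) -> bool:
    """True if the candidate's win rate against the current best exceeds the threshold."""
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games > threshold


def gated_history(
    candidates: List[str],
    evaluate: Callable[[str, str, int], int],
    games: int = 400,
) -> List[str]:
    """Return the best checkpoint after each training step.

    `candidates` are checkpoint names in training order. `evaluate(candidate,
    best, games)` is an assumed callback that plays `games` evaluation games
    and returns how many the candidate won.
    """
    best = candidates[0]
    history = [best]
    for candidate in candidates[1:]:
        wins = evaluate(candidate, best, games)
        if should_promote(wins, games):
            best = candidate  # the candidate becomes the new self-play generator
        history.append(best)
    return history
```

With made-up strengths such as `{"A": 1, "B": 3, "C": 2}` and an `evaluate` stub that awards every game to the stronger checkpoint, `gated_history(["A", "B", "C"], evaluate)` keeps "B" as the best network after "C" fails the gate. The cost is extra evaluation games at every step, which matters on a single personal computer.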