glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Learning from syzygy #609

Open ASilver opened 6 years ago

ASilver commented 6 years ago

I know there has been talk of using syzygy, but the context of it was never clear, so I apologize if this is a rehash. First I want to be clear this is NOT about using the TBs in the search. Ever. I completely understand that there is a reluctance to allow any supervised training, since it might create biases that impinge on Leela's natural learning, but this is different:

Use plain 6-piece syzygy once such a position is reached, and play all the remaining moves from it. Syzygy, and tablebases in general, are not subject to opinion: they are perfect play in all instances. The point is that once a 6-piece endgame is reached, the rest of the moves are perfect, allowing Leela to learn from them! This will let her not only understand how to win or draw those positions, but also recognize, as a GM would, which endgames to steer for or avoid.
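For concreteness, here is a minimal sketch of what "finish the game with perfect moves" could look like, using python-chess's syzygy module. The `./syzygy` path is an assumption, the DTZ-greedy move choice glosses over 50-move-rule subtleties, and none of this is the project's actual pipeline:

```python
import chess
import chess.syzygy

def tb_best_move(board, tb):
    """Pick a tablebase-perfect move with a simple DTZ-greedy rule
    (real DTZ handling around the 50-move rule is subtler than this)."""
    best_move, best_key = None, None
    for move in board.legal_moves:
        board.push(move)
        wdl = -tb.probe_wdl(board)   # flip to the mover's perspective
        dtz = -tb.probe_dtz(board)
        board.pop()
        # Higher WDL first; among wins, fewer plies to conversion is better,
        # among losses, more plies (stalling) is better -- both via -dtz.
        key = (wdl, -dtz)
        if best_key is None or key > best_key:
            best_move, best_key = move, key
    return best_move

def finish_with_tablebase(board):
    """Once a self-play game reaches <= 6 men, play out the rest
    'perfectly' and collect the moves as extra training examples."""
    examples = []
    with chess.syzygy.open_tablebase("./syzygy") as tb:  # path is an assumption
        while not board.is_game_over() and chess.popcount(board.occupied) <= 6:
            move = tb_best_move(board, tb)
            examples.append((board.fen(), move.uci()))
            board.push(move)
    return examples
```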

john45678 commented 6 years ago

A very interesting idea, but could this be done in a separate (forked) edition of LC? That would keep the main version totally 'zero' all the way.

Ishinoshita commented 6 years ago

@ASilver "The point is to reach a 6-piece endgame, when one is reached, and let the rest of the moves be perfect, allowing Leela to learn from them! " (1) This is no more Zero spirit, you would spill the beans in hope of moving faster to a stronger network. I get your point in considering endgame tables as perfect thus non-human knowledge, that humans just discovered, as they discovered prime numbers, etc..., but still, this is no more 'self-discovered' knowledge.

(2) Endgame tables would propose a single solution, not a probability distribution over moves (in effect a one-hot vector). The network would have to learn zero probability for other winning moves that need more plies to reach the win. This might conflict with the rest of its training methodology, where it is trained only toward winning, not toward winning in the minimum number of plies.
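To make (2) concrete, here is a toy sketch (again assuming python-chess's syzygy probing) contrasting the one-hot target a tablebase teacher would hand out with a target that spreads probability over every move preserving the best WDL outcome:

```python
import chess
import chess.syzygy

def policy_targets(board, tb):
    """Contrast a one-hot TB target with one spread over all winning moves."""
    moves = list(board.legal_moves)
    keys = []
    for move in moves:
        board.push(move)
        wdl, dtz = -tb.probe_wdl(board), -tb.probe_dtz(board)
        board.pop()
        keys.append((wdl, -dtz))          # higher = better, faster wins first
    best_wdl = max(k[0] for k in keys)
    i_best = max(range(len(moves)), key=lambda i: keys[i])
    # (a) what a TB teacher gives: all mass on one fastest win
    one_hot = [1.0 if i == i_best else 0.0 for i in range(len(moves))]
    # (b) mass shared over every move that preserves the best WDL outcome,
    #     matching "trained toward winning", not "winning fastest"
    n_best = sum(1 for k in keys if k[0] == best_wdl)
    spread = [1.0 / n_best if k[0] == best_wdl else 0.0 for k in keys]
    return moves, one_hot, spread
```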

I'm not really convinced (2) is such a real issue. I'm more concerned about seeing the project depart from (1). Once LCZ has reached something close to a final state (asymptotic Elo, at a maximum reasonable NN size, i.e. negative return on strength per unit of time when NN size is increased further, at a maximum number of visits, etc.), then cloning the project and toying with it in many directions will fully make sense.

Ipmanchess commented 6 years ago

You can also see it this way: if LCZero learns so well that she gets ahead of these 6-7-man syzygy tablebases, then the 6-7-man syzygy won't see the problem coming, since LCZero would, for example, effectively be playing with knowledge of 10-man solutions.

ASilver commented 6 years ago

@Ishinoshita Aside from some structural changes, the biggest change in AlphaGo Zero was the tabula rasa, not because of some philosophical idea, but because it broke away from human biases. The version that beat Lee Sedol was still strongly influenced by human knowledge, and thus human biases, having not only started with a large base of human master games to learn from, but also using that net as a reference. AlphaGo Zero surpassed it because it got rid of this imperfect bias and could truly learn on its own. Endgame tablebases suffer no such human biases, as they are perfect knowledge. The experiment of learning from zero has already been proven, and even the DeepMind team admitted there were plenty of avenues to improve, such as the search algorithm and others. I don't see why learning from perfect knowledge would be an issue. I would have an issue with 'needing' the tablebases to play endgames well at all, but in this case we are simply giving it perfect knowledge to learn from. It is still up to it to learn.

tranhungnghiep commented 6 years ago

It would require great care and attention to design a good strategy for learning with tablebases.

Some obvious approaches come with many caveats. Directly supervised learning from the TB would have drawbacks, including unbalanced network exploration (and thus understanding and strength) in the endgame versus other parts of the game: although the TB is perfect play, memorizing it exactly is bad. Another option, playing until a position in the TB is reached, is also bad, because Leela is told the position is a win or a loss but doesn't understand why. Even letting Leela self-play after reaching the position is bad, because there is a gap between knowing and understanding. I discussed some of these in more detail in #196.
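As an illustration of the first caveat, "directly supervised learning from TB" for the value head might look like the toy sketch below: uniformly random tablebase positions labeled with their WDL. The helper names are hypothetical, and such positions look nothing like the endgames Leela actually reaches in self-play, which is exactly the distribution problem:

```python
import random
import chess
import chess.syzygy

NON_KINGS = [chess.Piece(pt, color)
             for pt in (chess.QUEEN, chess.ROOK, chess.BISHOP,
                        chess.KNIGHT, chess.PAWN)
             for color in (chess.WHITE, chess.BLACK)]

def random_endgame(n_extra=4):
    """A random legal position: two kings plus n_extra random pieces."""
    while True:
        board = chess.Board(None)                       # empty board
        squares = random.sample(chess.SQUARES, n_extra + 2)
        board.set_piece_at(squares[0], chess.Piece(chess.KING, chess.WHITE))
        board.set_piece_at(squares[1], chess.Piece(chess.KING, chess.BLACK))
        for sq in squares[2:]:
            board.set_piece_at(sq, random.choice(NON_KINGS))
        board.turn = random.choice([chess.WHITE, chess.BLACK])
        if board.is_valid():                            # rejects illegal setups
            return board

def labelled_batch(tb, size=1024):
    """(fen, value) pairs, with value in {-1, 0, +1} from the TB's WDL."""
    batch = []
    while len(batch) < size:
        board = random_endgame()
        wdl = tb.probe_wdl(board)      # -2..+2, side-to-move perspective
        batch.append((board.fen(), max(-1, min(1, wdl))))
    return batch
```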

ASilver commented 6 years ago

It could not possibly remember it all, since that would simply mean reproducing the whole tablebase, which in this case takes up 160GB. If the engine can somehow derive a perfect set of rules or methods to play 6-piece endgames perfectly within 90MB... well, time to publish a paper!

In any case, I don't see how giving it a set of perfect moves is any different from feeding it any other supervised training games, except that in this case we are certain the moves it is studying carry no prejudicial biases.

tranhungnghiep commented 6 years ago

The bias is not only human-induced bias; it is also in the TB itself. The distribution of TB positions is likely different from that of self-play endgames (endgame patterns) and from other parts of the game (whole-game patterns). As you said, the TB is too large for the NN, and it covers just a tiny part of the game, so smooth transfer learning is very important here; I won't repeat my comments from the other issue. Other people may explain this better than me, like @Ishinoshita's point about the distribution over moves.

dubslow commented 6 years ago

LeelaZero (Chess or Go or otherwise) will never be trained with a tablebase, since this violates the Zero principle. (Tablebases aren't "human" knowledge, but Zero means no outside knowledge whatsoever, human or not. We are aiming to have everything learned as an emergent phenomenon of the training, with no guidance.)

chara1ampos commented 6 years ago

I fully agree with @ASilver; this would be a great idea. I have proposed something similar in the Discord. One could, e.g., have Lc0 play against herself with an identical network, one side with tablebases and one without. The version that does not use tablebases would gradually acquire better endgame knowledge this way. I am conscious of the potential non-smoothness issues pointed out by @tranhungnghiep, but we won't know unless we try.
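A minimal sketch of such a match loop, reusing the `tb_best_move` helper sketched earlier in this thread. `net_move(board) -> chess.Move` is a hypothetical stand-in for a real Lc0 search call, not a project API:

```python
import chess
import chess.syzygy

def play_game(net_move, tb, tb_side=chess.WHITE):
    """One game of the proposed setup: the same network plays both sides,
    but only tb_side switches to tablebase-perfect play once <= 6 men
    remain on the board."""
    board = chess.Board()
    while not board.is_game_over():
        if board.turn == tb_side and chess.popcount(board.occupied) <= 6:
            move = tb_best_move(board, tb)   # TB-armed side
        else:
            move = net_move(board)           # plain network side
        board.push(move)
    return board.result()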

oscardssmith commented 6 years ago

Another option would be to intermix training with tables and training against tables, i.e. spend 80% of the time on self-play learning that uses tables to speed up searches, and the other 20% of the time using tables as ground truth, testing Leela's ability to find the best moves.
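The "ground truth" 20% could be scored roughly like this sketch, where `net_move` is again a hypothetical stand-in for a network search call and `fens` is any held-out set of tablebase positions:

```python
import chess
import chess.syzygy

def tb_accuracy(net_move, tb, fens):
    """Fraction of TB positions where the net's chosen move preserves the
    tablebase outcome (a win stays a win, a draw stays a draw)."""
    preserved = 0
    for fen in fens:
        board = chess.Board(fen)
        before = tb.probe_wdl(board)       # side-to-move perspective
        board.push(net_move(board))
        after = -tb.probe_wdl(board)       # flip back to the mover's view
        preserved += (after >= before)     # outcome not thrown away
    return preserved / len(fens)
```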