glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess ** A chess adaption of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Implement optional TB support #196

Closed: killerducky closed this issue 6 years ago

killerducky commented 6 years ago

I propose we add optional TB support in LZChess. It would never be used for self-play games, and would default to off for match games. This way LZ still has to learn from zero how to play all endgames, and in matches those who want to keep a pure Zero approach can. But those who want max performance in match games can turn it on.

Here is a mockup of what UCTSearch::play_simulation would be like:

        if (drawn || !MoveList<LEGAL>(cur).size()) {
            // Terminal by the rules: draw/stalemate scores 0, checkmate scores
            // -1 or +1 depending on which side is to move.
            float score = (drawn || !cur.checkers()) ? 0.0f : (color == Color::WHITE ? -1.0f : 1.0f);
            result = SearchResult::from_score(score);
        } else if (cfg_use_tb && tb_hit(bh)) {             // <<< These two lines are new.
            result = SearchResult::from_tb(tb_score(bh));  // <<< These two lines are new.
        } else if (m_nodes < MAX_TREE_SIZE) {
            float eval;
            auto success = node->create_children(m_nodes, bh, eval);
            if (success) {
                result = SearchResult::from_eval(eval);
            }
        }

This treats tb_hit() and tb_score() as rules of the game.
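
For concreteness, here is a rough sketch of what the two new helpers might look like. Everything in it is an assumption: the WDL enum, the probe_wdl() wrapper (which would sit on top of whatever Syzygy probing library gets integrated), and the piece-count/castling helpers. It only maps a side-to-move WDL probe onto the same +1/0/-1 white-perspective score the surrounding code already uses.

    // Hypothetical sketch only -- the WDL enum, probe_wdl() wrapper, and the
    // cfg_/Position helpers below are assumptions, not existing code.
    enum class WDL { Loss, Draw, Win };   // result from the side to move's view

    bool tb_hit(const BoardHistory& bh) {
        const Position& pos = bh.cur();                   // assumed accessor
        // Only probe inside the available table range; Syzygy tables also
        // do not cover positions with castling rights.
        return pos.count_all_pieces() <= cfg_tb_pieces    // assumed helpers
            && !pos.can_castle();
    }

    float tb_score(const BoardHistory& bh) {
        const Position& pos = bh.cur();
        WDL wdl = probe_wdl(pos);                         // assumed probe wrapper
        if (wdl == WDL::Draw) return 0.0f;
        // Convert side-to-move WDL into the white-perspective score used by
        // SearchResult::from_score(): +1.0 White wins, -1.0 Black wins.
        bool stm_wins = (wdl == WDL::Win);
        bool white_to_move = (pos.side_to_move() == Color::WHITE);
        return (stm_wins == white_to_move) ? 1.0f : -1.0f;
    }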

jjoshua2 commented 6 years ago

It needs a "use syzygy" checkbox (that can be default off) plus SyzygyPath and piece-count options, just like SF has. Although I think we should distribute the 3-4 piece TBs with it, since they are smaller than our weights.
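
On the configuration side, a minimal sketch of how such switches could be exposed, following the cfg_* convention already visible in the mockup above; the names and defaults here are illustrative only and would ultimately be wired to UCI options along the lines of Stockfish's SyzygyPath.

    // Config sketch -- names and defaults are illustrative assumptions.
    #include <string>

    extern bool        cfg_use_tb;     // the "use syzygy" switch, default off
    extern std::string cfg_tb_path;    // directory containing the table files
    extern int         cfg_tb_pieces;  // largest piece count to probe (e.g. 4 if bundled)

    bool        cfg_use_tb    = false;
    std::string cfg_tb_path   = "";
    int         cfg_tb_pieces = 4;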

davidsoncolin commented 6 years ago

What about a slightly different idea: train a zero-style network solely on endgame positions to see whether the tablebase can be replicated as an algorithm? Or perhaps there is a way to use the tablebase itself as training data?

jjoshua2 commented 6 years ago

@davidsoncolin It would be possible to use Syzygy DTM tables to reward the shortest-DTM move as the best move, instead of the move found by the playouts (or DTZ tables for that matter, though less elegant). But that would be a fair bit of work. It would be a lot easier to just allow resigning, and maybe use TBs to help with that decision.
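
For illustration, a sketch of how a DTM probe could be turned into a policy target. The probe and the Position/MoveList usage below are assumptions; note also that Syzygy itself ships WDL/DTZ tables, so a DTM-style probe would actually need a different table format (e.g. Gaviota/Nalimov-style DTM).

    // Sketch only -- probe_dtm_for_mover() (plies until the side that just
    // moved delivers mate, a large value if that side is not winning) and the
    // Position/MoveList API usage below are assumptions.
    #include <limits>

    Move best_tb_move(const Position& pos) {
        Move best = MOVE_NONE;                        // assumed sentinel
        int  best_dtm = std::numeric_limits<int>::max();
        for (Move m : MoveList<LEGAL>(pos)) {
            Position next = pos;                      // assumed copyable position
            next.do_move(m);                          // assumed make-move call
            int dtm = probe_dtm_for_mover(next);      // assumed DTM probe
            if (dtm < best_dtm) { best_dtm = dtm; best = m; }
        }
        // 'best' would then be written out as a one-hot policy target instead
        // of the visit-count distribution produced by the playouts.
        return best;
    }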

evalon32 commented 6 years ago

Does "max performance" refer to speed or quality of play here? This will certainly help with the former, but I think it can also affect game outcomes.

jjoshua2 commented 6 years ago

Match games have a TC, so any speed increase also translates into a quality-of-play increase; primarily it's an Elo increase regardless, but that's normally what people aim for in match games anyway.

evalon32 commented 6 years ago

Ah ok, I was thinking of match games as in http://lczero.org/matches.

killerducky commented 6 years ago

@evalon32 for official promotion matches I propose we leave TB off, to make sure the NN is learning the entire game all the way to checkmate.

BTW I reckon there will be some people who want to use TBs all the time, including matches and self-play. Personally I wouldn't have a problem going even that far, but I think optional TB is a good compromise for those who want LZ to learn the game all the way to checkmate on its own. I also think it's fun to see it struggle and finally learn some of the basic checkmates! :)

ddugovic commented 6 years ago

I propose that syzygy WDL tables be bundled w/Leela, offering rudimentary knowledge about which endgames are won/drawn/lost, but not how to win them.

HenryHeffan commented 6 years ago

What if Leela was trained explicitly on endgames from tablebases?

I think this does not break the spirit of Leela, as tablebases are not really human knowledge: they follow, and are generated directly from, the rules of chess. Could Leela be trained in a supervised-learning fashion to give an output of 1, 0, or 0.5 for won, lost, and drawn endgames respectively? If so, would this violate the idea of LZ?
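
If such targets were generated, the mapping itself is trivial. Here is a minimal sketch, reusing the assumed WDL enum and probe_wdl() wrapper from the earlier snippet:

    // Minimal sketch: turn a tablebase probe into the 1 / 0.5 / 0 value target
    // (from the side to move's view) that supervised training would fit.
    // probe_wdl() and WDL are the same assumed helpers as in the earlier sketch.
    float tb_value_target(const Position& pos) {
        switch (probe_wdl(pos)) {
            case WDL::Win:  return 1.0f;   // side to move wins
            case WDL::Draw: return 0.5f;
            case WDL::Loss: return 0.0f;   // side to move loses
        }
        return 0.5f;                       // unreachable; silences compiler warnings
    }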

tranhungnghiep commented 6 years ago

Training on the tablebase would come with a caveat: the network would memorize the tablebase exactly, instead of learning to foresee the value of any position in the game without deep tree searching. Training with the tablebase is doable and maybe useful to speed up training, but it requires a careful plan for smooth transfer learning.

Self-play currently does this: it explores the endgame and reconstructs its own tablebase, encoding it in its weights, while at the same time exploring and encoding other parts of the game, so the learning is smooth across all parts of the game.

Videodr0me commented 6 years ago

Would such a scheme be a viable idea (to avoid deploying TBs to all clients generating training games)?

1. Play training games, but not to conclusion once 6 pieces are left on the board.
2. The training pipeline adjudicates all these games with 6-piece TBs before learning begins.
3. For match play Leela would need access to 6-piece TBs, because she would never learn how to win these endings on her own.

While (3) might seem off-putting, the plus sides look tempting:

1. More training games in less time (shorter games).
2. The quality of training will be higher, as the adjudication is "perfect" and learning always goes in the correct direction.
3. The net has more capacity to learn important structures, as it does not have to bother with detailed 6-piece ending knowledge.

Any thoughts?
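
A rough sketch of step (2) of this scheme, with every type and helper assumed: the pipeline would stop a self-play game once it reaches the table range, adjudicate it with a WDL probe, and write the verdict back onto every stored position as if the game had been played out to that result.

    // Sketch only -- Game, TrainingPosition and tb_score_white_pov() are assumed
    // names; tb_score_white_pov() would return +1/0/-1 from White's view, the
    // same convention the search mockup above uses.
    void adjudicate_and_label(Game& game) {
        const Position& last = game.final_position();
        if (last.count_all_pieces() > 6) return;      // not in TB range; keep the
                                                      // game's own result
        float result = tb_score_white_pov(last);      // adjudicated outcome
        for (TrainingPosition& tp : game.positions()) {
            tp.value_target = result;                 // overwrite the value target
        }                                             // for every stored position
    }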

Ipmanchess commented 6 years ago

Real game play: engines don't recognize the wrong-colored bishop when an opponent's pawn can promote; you need the bishop of the right color to stop that pawn and keep the draw. You can get this situation, for example, with 20 pieces still on the board. Engines go 30-40 moves deep, see that after exchanging pieces they are a bishop up, and end up with an eval of +2.55; when they get there it is of course too late to see it's a draw, because they kept the wrong bishop. Engines don't see that, and there are many situations like it. That's where Syzygy TBs are very useful: to see in time that exchanging all these pieces leads to a draw instead of thinking you are winning, and the engine can also steer into, or away from, these kinds of positions.

Tablebases are also very helpful for preventing losses on time in sudden-death games. I always played 5-minute games, and by the time they reach these endgames the result is usually already decided; with TBs I no longer had time forfeits or engine time-management problems.

I know you want LCZero to learn from zero, but with a fork LC0 could see much further in advance, when heading into the endgame, what result it will get; then maybe it can change its game plan into one that leads to a win, or to a draw where it would otherwise have lost without TBs! Tablebases are perfect; there is no other solution that leads to a better result, so there is nothing wrong with learning from something that is already perfectly solved.

Which goal is LCZero for: 1. just a fantastic project to see how a NN works for chess and can be used for other applications, or 2. to become the strongest chess engine in the world?

Ipman.

tranhungnghiep commented 6 years ago

I think the point is that it is not easy to integrate TBs into training. Chess complexity is much higher than the NN's capacity, which has to cover the whole game. Putting too much emphasis on TBs could make Leela memorize the TB exactly, at the cost of strength in the middlegame for example, because the same set of weights is responsible for detecting all the "good" strategies/patterns for the whole game.

That being said, I think TBs could be used for fine-tuning in a later training phase, when Leela is already very strong, so as not to affect the whole NN too much. This is a typical transfer-learning strategy: by supervised learning from "correct" endgame moves we hope to perfect its 6-piece endgames, though of course it cannot be completely perfect, for the same capacity reason.

Bernize commented 6 years ago

Hello, I think using any helpers during training or in matches is against the nature of this project. If not, you could train against the latest Stockfish to get better sooner. I thought this project was about self-learning. And the endgame is part of the game, just like the opening and middle game. I hope you understand what I'm trying to express. Horrible English...

jjoshua2 commented 6 years ago

Currently EGTB is disabled for self-play and gating matches. Right now it is only for people who want it and enable it in personal matches. The problem with trying to learn from SF is that it is a human-made eval that is not perfect, so using it would introduce systematic mistakes. Tablebases have no human knowledge or heuristics; they are made purely from the rules of the game. They can easily be made for any game where the number of pieces or free squares diminishes over time, using retrograde analysis of all possible endings. So there is no possibility of systematic errors being introduced, or even of game-domain-specific knowledge beyond the rules.
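
To make the "purely from the rules" point concrete, here is a toy, game-agnostic sketch of retrograde analysis over an abstract position graph. It is not the real Syzygy generator (which adds DTZ, bitbase encoding, and compression), but the labels come from the same kind of backward induction.

    // Toy illustration of retrograde analysis on an abstract game graph -- not
    // the real Syzygy generator. Every label is derived only from the terminal
    // positions and the move graph, i.e. from the rules, with no evaluation
    // heuristics anywhere.
    #include <vector>

    enum class Label { Unknown, Win, Loss, Draw };  // from the side to move's view

    // moves[p] lists the positions reachable from p; terminal_loss[p] is true
    // when the side to move at p has already lost (e.g. is checkmated).
    std::vector<Label> retrograde(const std::vector<std::vector<int>>& moves,
                                  const std::vector<bool>& terminal_loss) {
        const int n = static_cast<int>(moves.size());
        std::vector<Label> label(n, Label::Unknown);
        for (int p = 0; p < n; ++p)
            if (terminal_loss[p]) label[p] = Label::Loss;

        // Propagate until a fixed point: a position is a Win if some move leads
        // to a Loss (for the opponent), a Loss if every move leads to a Win.
        bool changed = true;
        while (changed) {
            changed = false;
            for (int p = 0; p < n; ++p) {
                if (label[p] != Label::Unknown) continue;
                bool all_win = !moves[p].empty();
                for (int q : moves[p]) {
                    if (label[q] == Label::Loss) { label[p] = Label::Win; changed = true; break; }
                    if (label[q] != Label::Win) all_win = false;
                }
                if (label[p] == Label::Unknown && all_win) { label[p] = Label::Loss; changed = true; }
            }
        }
        // Whatever can never be forced into a loss is a draw.
        for (int p = 0; p < n; ++p)
            if (label[p] == Label::Unknown) label[p] = Label::Draw;
        return label;
    }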


Bernize commented 6 years ago

O.K., you are right. I just missed the option to play against a human. And also O.K. with Stockfish. But I think people who want to play against Leela do so because they want to play against pure AI, so most people will have EGTBs or opening books disabled. But O.K., if you mean it only as an option, not as an improvement... Aren't EGTBs made by brute force? And wasn't there some fault in some engine at TCEC or somewhere else, where an engine relying on them played on thinking it was winning (based on the TB) and lost? Another question is whether Leela will be error-free. With more CPU and GPU power it will get better and better, but probably there is some limit; doesn't the progress show that? Math isn't my cup of tea, but probably someone has calculated some values from the progress graph. Sorry for misunderstanding the purpose of having the option to use EGTBs.

ddugovic commented 6 years ago

The trade-off is increased code complexity versus increased training time (re-calculating tablebase positions which have been calculated already). It's possible to create FUD on any topic.

killerducky commented 6 years ago

v0.8 has TB support now.