LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

Extracting parts of Lc0's search into classes would help future development #1734

Open Naphthalin opened 2 years ago

Naphthalin commented 2 years ago

As of now, Lc0's search still uses AlphaZero's original PUCT, with a few modifications in WDL + draw score, some certainty propagation, and the general batching strategy. This works decently well, but part of the lack of progress in search can be attributed to how difficult it is not only to test search changes (especially if they affect the scaling behaviour), but to implement them in the first place. I would like to change the latter part with this proposal.

Currently, an experiment in the search code involves:

This effectively renders casual experiments infeasible, and in addition requires a major effort to keep branches mergeable with master, which increases the difficulty of contributing to Leela's search.

I have identified the following aspects of Lc0's search code which IMO would profit from being extracted into their own classes, similar to how timemgr is handled. Experiments in the respective fields would then only require creating the associated class file and registering the new class in the respective file, without touching any of the above files or creating merge complications.

  1. Selection of move to play. Currently, we're using two strategies: most visits, and with temperature. There are other strategies of interest, like sampling according to the NN policy, which are yet to be implemented.
  2. Selection of move to explore. Currently, this uses PUCT, with the addition of drawscore, and some batching logic in multigather and collision scaling. Here, a lot of alternatives are possible (and explored in the literature).
  3. Result propagation. Currently, this follows the general MCTS principle of averaging all outcomes, and calculating these averages in an incremental way. This would need some adaptation for the MCGS implementation at least, and in general when moving away from pure averaging.
  4. Node value recalculation. This doesn't yet exist, but when moving away from averaging (which most MCTS variants, PUCT and MCGS included, rely on) and instead calculating the value of a node from the values of its child nodes, e.g. as RENTS or my own #963 do, this would need to be added, and multiple strategies are possible.

I will create a post for each of the 4 points, (a) collecting the necessary functions for such classes, (b) spelling them out at least in pseudo-code (or copying from the code where available), and (c) listing all parts of the code which need replacement.
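As a rough illustration of the shape this could take (all names here, MoveSelector, MoveSelectorFactory, RootStats, are placeholders and not existing lc0 code), each aspect would get an abstract interface plus a small registry, so that a new strategy lives in one file and one Register() call:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Placeholder for the per-move information a strategy might need.
struct RootStats {
  int visits = 0;
  float q = 0.0f;       // Averaged value.
  float policy = 0.0f;  // Raw NN policy prior.
};

// Abstract strategy: a new experiment implements this in its own file.
class MoveSelector {
 public:
  virtual ~MoveSelector() = default;
  // Returns the index of the move to play at root.
  virtual int Select(const std::vector<RootStats>& moves) const = 0;
};

// Name -> constructor registry; search code only ever sees the base class.
class MoveSelectorFactory {
 public:
  using Creator = std::function<std::unique_ptr<MoveSelector>()>;
  static MoveSelectorFactory& Get() {
    static MoveSelectorFactory instance;
    return instance;
  }
  void Register(const std::string& name, Creator creator) {
    creators_[name] = std::move(creator);
  }
  std::unique_ptr<MoveSelector> Create(const std::string& name) const {
    auto it = creators_.find(name);
    if (it == creators_.end())
      throw std::runtime_error("unknown selector: " + name);
    return it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};
```

The same interface-plus-registry pattern would apply to all four points above; only the virtual functions differ.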

Naphthalin commented 2 years ago

1. Selection of move to play.

Current strategies:

Additional strategies:

The class(es) would need to employ the following functions:

This could be done either by looping over the root's child nodes, or by passing an array with all relevant information. A few parameters would need to be handed over from search:
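Whatever the exact parameter list, the two current strategies are simple enough to sketch (illustrative code only, reusing the placeholder RootStats shape from the sketch above):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct RootStats {  // Placeholder, as in the earlier sketch.
  int visits = 0;
  float policy = 0.0f;
};

// Most visits: the current default for match play.
std::size_t SelectMostVisits(const std::vector<RootStats>& moves) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < moves.size(); ++i)
    if (moves[i].visits > moves[best].visits) best = i;
  return best;
}

// Temperature: sample proportionally to visits^(1/T). T -> 0 recovers
// most-visits; T = 1 samples proportionally to the visit counts.
std::size_t SelectWithTemperature(const std::vector<RootStats>& moves,
                                  double temperature, std::mt19937& rng) {
  std::vector<double> weights;
  weights.reserve(moves.size());
  for (const auto& m : moves)
    weights.push_back(std::pow(static_cast<double>(m.visits),
                               1.0 / temperature));
  std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
  return dist(rng);
}
```

Sampling according to the NN policy would be the same discrete_distribution, just fed with m.policy instead of the visit counts.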

Naphthalin commented 2 years ago

2. Selection of move to explore.

This is probably the hardest of them, but also the most interesting one for future development.
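For orientation, the rule to be replaced is AlphaZero's PUCT with lc0's draw score folded into Q, i.e. score = Q + cpuct * P * sqrt(N_parent) / (1 + N_child). A simplified sketch, ignoring FPU, multigather and collision scaling (struct and parameter names are made up):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct ChildStats {  // Placeholder per-edge statistics.
  int n = 0;         // Visit count N(s,a).
  float wl = 0.0f;   // Expected win-minus-loss.
  float d = 0.0f;    // Expected draw probability.
  float p = 0.0f;    // NN policy prior P(s,a).
};

std::size_t SelectChildPuct(const std::vector<ChildStats>& children,
                            int parent_n, float cpuct, float draw_score) {
  const float sqrt_parent = std::sqrt(static_cast<float>(parent_n));
  std::size_t best = 0;
  float best_score = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < children.size(); ++i) {
    const ChildStats& c = children[i];
    // Draw score folds into the value term: Q = WL + draw_score * D.
    const float q = c.wl + draw_score * c.d;
    // AlphaZero exploration term: U = cpuct * P * sqrt(N_parent) / (1 + N).
    const float u = cpuct * c.p * sqrt_parent / (1.0f + c.n);
    if (q + u > best_score) {
      best_score = q + u;
      best = i;
    }
  }
  return best;
}
```

Any alternative from the literature would then just be another implementation of the same selection interface.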

Current strategies:

Additional strategies:

General remarks:

The class(es) would need to employ the following functions:

Naphthalin commented 2 years ago

3. Result propagation.

Current strategy:

Other strategies:

The class(es) would need to employ the following functions:
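For reference, the incremental averaging described in the opening post amounts to the standard running-mean update along the leaf-to-root path; roughly (node fields illustrative, the real code also tracks M and uses atomics):

```cpp
#include <vector>

struct Node {       // Placeholder; the real node tracks more fields.
  int n = 0;
  float wl = 0.0f;  // Running mean of win-minus-loss, own perspective.
  float d = 0.0f;   // Running mean of draw probability.
};

// Back a fresh evaluation (v, draw) up a leaf-to-root path using the
// standard incremental mean: avg += (x - avg) / n.
void BackupAveraging(const std::vector<Node*>& path_leaf_to_root,
                     float v, float draw) {
  for (Node* node : path_leaf_to_root) {
    node->n += 1;
    node->wl += (v - node->wl) / node->n;
    node->d += (draw - node->d) / node->n;
    v = -v;  // Perspective flips at every ply.
  }
}
```

An MCGS version would have to cope with transpositions, i.e. a node reachable along several paths, which is where purely path-based incremental updates break down.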

Naphthalin commented 2 years ago

4. Node value recalculation.

This doesn't yet exist, with the small exception of reverting twofold draw visits to restore balance to the tree, though there are known side effects in fast TCs (#1466, fix attempt in #1513).

A node recalculation would be necessary in all strategies which can't be fully represented as incremental updates. In general, it means calculating Q/D/M of a node from its children, which effectively allows pruning of branches which turned out bad, and hence also allows exploration strategies which don't converge on mostly exploring the best moves. #963 and the mentioned betamcts-rents branch explore a version of "soft minimax recalculation", which interpolates between pure averaging at low node counts and pure minimax in the limit of infinite nodes, and allows nearly arbitrary forced exploration, as happens in forward/backward analysis.

However, a big caveat is that this can lead to non-incremental changes in Q, which interacts badly with PUCT, as the latter relies on incremental updates for the PUCT score to (semi-)reliably spread visits across moves. This means that neither a recalculation loop nor a replacement for PUCT is likely to yield an improvement without the other; they need to be tested together.
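As a generic illustration of the "averaging at low visits, minimax in the limit" idea (the actual interpolation in #963 is defined there; the blend weight and its schedule below are invented purely for illustration):

```cpp
#include <algorithm>
#include <vector>

struct ChildVal {   // Placeholder child summary.
  int n = 0;
  float q = 0.0f;   // Child's value from the child's own perspective.
};

// Recompute a node's Q from its children: blend the visit-weighted average
// with the best child's (sign-flipped) value. beta = 0 is pure averaging,
// beta = 1 pure minimax; a schedule like beta = n / (n + k) moves smoothly
// from one to the other as the subtree grows.
float RecalculateQ(const std::vector<ChildVal>& children, float beta) {
  long long total_n = 0;
  float weighted_sum = 0.0f;
  float best = -1.0f;
  for (const ChildVal& c : children) {
    if (c.n == 0) continue;
    const float from_parent = -c.q;  // Child's value is the opponent's view.
    total_n += c.n;
    weighted_sum += c.n * from_parent;
    best = std::max(best, from_parent);
  }
  if (total_n == 0) return 0.0f;
  const float average = weighted_sum / total_n;
  return (1.0f - beta) * average + beta * best;
}
```

The caveat above stands regardless of the exact formula: once Q can jump non-incrementally, the selection rule has to be adapted alongside it.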

The class(es) would need to employ the following functions:

danegraphics commented 2 years ago

There are definitely a large number of different paradigms that would be nice to explore and test, and making the code more modular in this way would make doing so much easier.

More specifically, being able to experiment more easily with the search, value calculation, and the integration of the two would help massively.

Currently, by the nature of MCTS, the search is somewhat bound to the assumption that any move worth more exploration is also more worth playing, and vice versa, even though this isn't necessarily true, especially in chess. So the ability to experiment more easily with functions whose aims are akin to those of quiescence, pruning, minimax, and more, without needing to overwrite code for existing strategies in the process, is definitely something we should aim for.

I very much support this proposal.