PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter 18: difference from original paper #16

Closed ryuoujisinta closed 5 years ago

ryuoujisinta commented 5 years ago

When I read the original paper by D. Silver et al., I came across two differences between the paper and your code.

  1. The original paper uses L2 regularization, but you don't (train.py, line 64).
  2. Your code retains all subtrees during an episode, but the original paper says that only the subtree under the selected action is retained.

If you have reasons for these choices, please let us know.

Shmuma commented 5 years ago

Hi!

  1. The original paper uses L2 regularization, but you don't (train.py, line 64).

The main reason is that I missed this point in the paper :). So, it could be a good experiment to check the effect of L2 regularization on the final result.
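For anyone who wants to try that experiment: in PyTorch, the paper's L2 penalty (the `c * ||theta||^2` term in the AlphaGo Zero loss) can be approximated by passing `weight_decay` to the optimizer. A minimal sketch, assuming an SGD optimizer; the tiny `nn.Linear` here is just a stand-in for the book's network, and `c = 1e-4` is the value reported in the paper:

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for the book's network; the real model comes from the chapter's code.
net = nn.Linear(4, 2)

# weight_decay adds an L2 penalty on the parameters at each update step,
# approximating the paper's c * ||theta||^2 regularization term (c = 1e-4).
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
```

One subtlety of this approach: `weight_decay` applies the penalty to *all* parameters, including biases, whereas a hand-written loss term lets you exclude them; for this experiment the difference is likely negligible.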

  2. Your code retains all subtrees during an episode, but the original paper says that only the subtree under the selected action is retained.

Hm, that's interesting. I remember this part of the paper (it's at the bottom of page 26), and from my perspective it contradicts itself, as it says "The search tree is reused at subsequent time-step", but then "while the remainder of the tree is discarded".

From my perspective, discarding parts of the tree (even the parts that haven't been used in this search) is harmful to the MCTS statistics, so I decided to keep all the nodes and prune the tree only when a new best policy is elected. Probably I'm wrong (or just didn't understand this part of the paper).
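For comparison, the paper-style behaviour is easy to sketch. This is not the book's implementation (which keeps MCTS statistics in dictionaries keyed by state); it is a hypothetical node-based tree, just to make the two policies concrete:

```python
class Node:
    """Minimal illustrative MCTS node: per-action children plus a visit counter."""
    def __init__(self):
        self.children = {}   # action -> Node
        self.visit_count = 0


def advance_root(root, action):
    """Paper-style tree reuse (bottom of page 26): after playing `action`,
    keep only the subtree rooted at that child and discard everything else.
    If the action was never expanded, start from a fresh root."""
    child = root.children.get(action)
    return child if child is not None else Node()
```

Under this policy, statistics for the siblings of the played action are thrown away after every move; the book's approach instead keeps them for the rest of the episode, which is exactly the trade-off discussed above.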