Open MlSAKA-MlKOTO opened 3 weeks ago
Thank you for your insights. When it comes to newly expanded nodes, we believe that performing a rollout or estimating the value of the current node, as well as estimating the values of its child nodes, are all acceptable approaches. The choice mainly depends on which is more convenient to implement. You can quickly modify our code to do what you describe; we have already implemented this in our internal version. However, we have not observed any interesting phenomena from it.
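For readers following along, the two options can be sketched roughly as below. This is only an illustration under stated assumptions: `Node`, `expand`, `value_fn`, `next_states`, and the `eager` flag are hypothetical names, not taken from the repository or the internal version mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    # Hypothetical node type; the real code may store different fields.
    state: str
    value: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def expand(
    node: Node,
    next_states: Callable[[str], List[str]],
    value_fn: Callable[[str], float],
    eager: bool,
) -> List[Node]:
    """Expand `node`, optionally evaluating each new child right away."""
    # The node being expanded gets a value estimate in both variants.
    if node.value is None:
        node.value = value_fn(node.state)
    for s in next_states(node.state):
        child = Node(state=s)
        if eager:
            # Variant discussed in the question: evaluate children on creation.
            child.value = value_fn(s)
        node.children.append(child)
    return node.children


# Toy stand-ins for a real policy and value model (assumptions).
toy_next_states = lambda s: [s + "a", s + "b"]
toy_value_fn = lambda s: len(s) / 10

lazy_children = expand(Node("r"), toy_next_states, toy_value_fn, eager=False)
eager_children = expand(Node("r"), toy_next_states, toy_value_fn, eager=True)
```

With `eager=False`, each child's `value` stays `None` until that child is itself expanded; with `eager=True`, every child carries an estimate as soon as it exists, at the cost of one extra `value_fn` call per child.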
Hi, thank you for your brilliant work!
I have a quick question about the value update in your method. Could you clarify why you don’t compute the value of each newly expanded child node immediately after it is created? In your code, it appears that a node’s value is only updated when that node expands new children. This seems a bit counterintuitive, as it means that newly expanded nodes are selected based solely on their priors, since they don’t yet have a computed value.
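To make the concern concrete, here is a minimal sketch assuming a PUCT-style selection rule. The `Node` class, `puct_score`, `select_child`, `c_puct`, and the choice of 0.0 as the value of an unvisited node are all assumptions for illustration and may not match the actual code.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type for illustration only.
    prior: float
    visit_count: int = 0
    value_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def q(self) -> float:
        # Mean value; an unvisited node falls back to 0.0 (an assumption).
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    # Exploration bonus scaled by the child's prior.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + u


def select_child(parent: Node) -> Node:
    return max(parent.children, key=lambda ch: puct_score(parent, ch))


parent = Node(prior=1.0, visit_count=1)
parent.children = [Node(prior=0.2), Node(prior=0.7), Node(prior=0.1)]

# No child has a value yet, so the ranking follows the priors alone.
picked_without_values = select_child(parent)

# Once one child has a value estimate, the ranking can change.
parent.children[0].visit_count = 1
parent.children[0].value_sum = 0.9
picked_with_value = select_child(parent)
```

In this toy setup the first selection picks the child with the highest prior, while the second picks the low-prior child once it has a high value estimate, which is the difference the question is about.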