Understanding AlphaGo Zero in Plain Terms

CharlesLiuyx commented 6 years ago

https://charlesliuyx.github.io/2017/10/18/%E6%B7%B1%E5%85%A5%E6%B5%85%E5%87%BA%E7%9C%8B%E6%87%82AlphaGo%E5%85%83/

CharlesLiuyx commented 6 years ago

This is the comment section. Readers are welcome to post criticism, corrections, and discussion!

1131602418 commented 6 years ago

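I learned a lot from your article! I do have one question, though. In "Summary and Thoughts" you write "a reinforcement learning algorithm that uses deep neural network training as policy improvement and Monte Carlo tree search as policy evaluation", but later, in "How does policy iteration work in AlphaGo Zero?", you write "the policy evaluation step, i.e. the winner of each game simulated by MCTS ... the policy improvement step, i.e. the better policy π returned by MCTS". This leaves me unsure what exactly policy evaluation and policy improvement each refer to. I hope you can find time to clear this up. Thanks!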

lynshao commented 6 years ago

Hi, how do you understand the way the Dirichlet noise is added? The original paper says "Additional exploration is achieved by adding Dirichlet noise to the prior probabilities in the root node s_0." Does this mean that every time MCTS is run, Dirichlet noise is added to the prior probability of each edge at the root node?

CharlesLiuyx commented 6 years ago

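@1131602418 Both are parts of a reinforcement learning algorithm. Policy evaluation scores the current policy (how good is it?). Policy improvement modifies the current policy by some method to obtain a better one (for example, gradient descent on an objective function). Alternating between the two is the iteration, in the literal sense of updating over and over.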
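To make the two steps concrete, here is a minimal sketch of tabular policy iteration on a toy problem. The 4-state chain, the reward of 1 for reaching the last state, and the names `step`, `evaluate`, `improve`, and `policy_iteration` are all assumptions made up for illustration. This is not how AlphaGo Zero implements it; it only shows the evaluate-then-improve loop in its simplest form.

```python
# Toy tabular policy iteration on a hypothetical 4-state chain.
# Illustration only: not the AlphaGo Zero training procedure.
GAMMA = 0.9
N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def evaluate(policy, sweeps=100):
    """Policy evaluation: score the current policy by estimating V(s)."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):   # the terminal state keeps V = 0
            nxt, r = step(s, policy[s])
            values[s] = r + GAMMA * values[nxt]
    return values


def improve(values):
    """Policy improvement: act greedily with respect to the estimated values."""
    def q(s, a):
        nxt, r = step(s, a)
        return r + GAMMA * values[nxt]
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES - 1)]


def policy_iteration():
    policy = [-1] * (N_STATES - 1)      # start from a poor policy: always go left
    while True:
        values = evaluate(policy)       # how good is the current policy?
        new_policy = improve(values)    # derive a better one from that score
        if new_policy == policy:        # no change means the loop has converged
            return policy, values
        policy = new_policy
```

Calling `policy_iteration()` alternates the two steps until the policy stops changing; on this chain it ends with "move right" in every non-terminal state.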

CharlesLiuyx commented 6 years ago

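@lintonshaw Dirichlet noise adds a perturbation (an offset) during sampling so that the sampled moves are not too concentrated, which widens the part of the search tree that gets explored.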
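For reference, the paper states the mixing rule as P(s, a) = (1 − ε)p_a + εη_a with η ∼ Dir(0.03) and ε = 0.25. Below is a minimal sketch of that formula using only the Python standard library. The function names `sample_dirichlet` and `add_root_noise` are hypothetical, and the sketch takes no position on whether the noise is applied at the root of every search or only once per episode.

```python
import random


def sample_dirichlet(alpha, size, rng=random):
    """Draw one sample from a symmetric Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:                     # guard against floating-point underflow
        return [1.0 / size] * size
    return [d / total for d in draws]


def add_root_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Mix noise into root priors: P(s0, a) = (1 - eps) * p_a + eps * eta_a."""
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


# Usage: perturb a made-up prior over three legal moves.
noisy_priors = add_root_noise([0.7, 0.2, 0.1])
```

Because both the priors and the noise sum to 1, the mixed values still form a probability distribution. Each move keeps at least (1 − ε) of its original prior, so the network's preference is softened rather than discarded.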

lynshao commented 6 years ago

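@CharlesLiuyx Thanks for the reply. What I actually meant was: do we add Dirichlet noise to the root node's prior probabilities for every MCTS run? The AlphaGo Zero paper says the noise is added to P(S_0). Does s_0 refer to the root node of each MCTS run, or specifically to the root node of the first MCTS run at the very start of each episode?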

CharlesLiuyx commented 6 years ago

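@lintonshaw I see what you mean! I have not actually gone through that part. You could check how a few of the reimplementations handle it. I think reading it as the episode's root node is plausible, but I am not sure, since there is no open-source code. To pin down how the Dirichlet noise should be used, it basically comes down to a K-fold Search: start training and go with whichever variant works best, e.g. P(S_0) or P(S_0 to 10).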

lynshao commented 6 years ago

@CharlesLiuyx OK, thanks!

dikpoorcat commented 2 years ago

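What was the reasoning behind using the last eight moves of history as the neural network's input state? Wouldn't the current board position alone be enough? Is it because Go has captures?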

CharlesLiuyx / BlogComment

Understanding AlphaGo Zero in Plain Terms #23