Closed by foxrainowo 7 months ago
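With the same number of playouts, I ran several hundred games between the BadukAI weights and the original weights. The BadukAI weights are about 30 Elo weaker (because of the 8-bit quantization). That is only a small difference. Statistically, a single game tells you nothing.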
I recently noticed the same issue. With the same playouts and the same model, BadukAI on an Android phone seems a little weaker than Sabaki on a Linux laptop and loses more games. Is that true, or is it just statistical noise?
The last thorough comparison, which I did, was run with the s658 network. So, in theory, it's possible that the latest networks behave a little worse in terms of quantization inaccuracy, but it's pretty unlikely given that they have the same architecture and the same methods were used.
With s658 I ran 400 games, which ended 219:181 in favour of the original network. But the variance was high: within this match there were streaks of 20 games that ended 16:4 for one side or the other. So we really should not draw conclusions from small numbers of games.
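To put the 219:181 result in perspective, here is a minimal sketch that converts a match score into an Elo difference with a rough 95% interval. It assumes the standard logistic Elo model, no draws, independent games, and a normal approximation for the win rate; the function names are made up for illustration and are not part of BadukAI or any other tool.

```python
import math


def elo_diff(win_rate: float) -> float:
    """Elo difference implied by a win rate under the logistic Elo model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def match_summary(wins: int, losses: int, z: float = 1.96):
    """Return (estimate, lower, upper) Elo difference for a match result.

    Assumes no draws and independent games. The interval uses a normal
    approximation for the win rate, so it is only a rough guide.
    """
    games = wins + losses
    win_rate = wins / games
    # Standard error of the observed win rate.
    std_err = math.sqrt(win_rate * (1.0 - win_rate) / games)
    lower = win_rate - z * std_err
    upper = win_rate + z * std_err
    return elo_diff(win_rate), elo_diff(lower), elo_diff(upper)


if __name__ == "__main__":
    estimate, lower, upper = match_summary(219, 181)
    print(f"Elo difference: {estimate:.1f} (95% interval {lower:.1f} to {upper:.1f})")
```

Under these assumptions the estimate is roughly 33 Elo, and the interval reaches from slightly below zero to nearly 70 Elo. So even 400 games leave the size of the gap quite uncertain, and a handful of games says almost nothing.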
Perhaps I should clarify some parameters I used in these matches (just in case you did something very different):
I use 300 playouts: This is the order of magnitude that is also used in KG's ranking games. Using far fewer would be unrealistic (after all, the strength of the network is mainly relevant when it is used for analysis, and in that case you would always use at least several hundred playouts; otherwise there is no reading).
I use two search threads for BadukAI's network: Using more doesn't increase performance (since the NPU is essentially single-threaded) and may have negative effects on the search.
I use 7.5 komi: BadukAI's network was calibrated with that number (since it's KG's default), so the quantization inaccuracy may be a little higher with other values.
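I played games between the phone version's weights and the desktop version's weights. In the same position and with the same amount of computation, the phone version is clearly weaker than the desktop version.
Is there any way to improve this?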