aki65 / aki65.github.io


katago 20b and new optimized 40b running abnormally slow #61

Closed humble-stu closed 2 years ago

humble-stu commented 2 years ago

KataGo 10b and Leela Zero work normally on my Mi 11 (Snapdragon 888), at more than 100 visits per second, but KataGo 20b and 40b only reach about 10 visits per second. Bug or feature?

aki65 commented 2 years ago

I have a mi 11 myself, and I had the same issue after updating to MIUI 13. But this problem should be solved (at least it was on my phone) if you

humble-stu commented 2 years ago

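I'm using the latest version, but I haven't updated to MIUI 13; I'm still on 12.5. Do I need to update? Or should I download an older version? Also, do I need to change parameters such as the number of search threads, the neural network cache size, and the analysis depth/breadth (conservative vs. aggressive) settings?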

aki65 commented 2 years ago

No, you don't need to update. When I had MIUI 12.5, I was using

and this combination worked well (so should also work for you).

If you don't need the built-in 20b network, BadukAI 1.15 with the 40b network from above should also work.

By the way, I'm sorry, I don't speak Chinese. My answer in the other thread (which probably gave that impression) was created by Google Translate. So we should stick to English ...

humble-stu commented 2 years ago

LOL, Google Translate works well. By the way, the computation speed on the phone really surprises me: I run the same KataGo 40b on a 1060 and only get about 200 visits/s or less. As for numSearchThreads, nnCacheSizePowerOfTwo, and playoutDoublingAdvantage, should I just keep the defaults?

aki65 commented 2 years ago

Changing the cache size is only necessary if your device does not have much RAM (which is of course not the case on the Mi 11). PlayoutDoublingAdvantage should only be changed for handicap games.

For best performance, you should tick parallelEvaluation and set numSearchThreads = 2. This should yield around 80 visits/s for 40b. And if you are really keen on performance, you should upgrade to MIUI 13, because it is based on Android 12, which brings the latest chipset drivers. Then you will get around 135 visits/s with the latest 40b network.
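As a rough illustration of what these rates mean in practice, here is a minimal sketch that converts a visit rate into the wall-clock time needed for a fixed visit budget. The 1000-visit budget and the helper name `seconds_for_visits` are assumptions made for illustration; the rates are the ones quoted in this thread, and the calculation ignores thermal throttling.

```python
def seconds_for_visits(visit_budget: int, visits_per_second: float) -> float:
    """Return the time in seconds needed to reach visit_budget at a constant rate."""
    if visits_per_second <= 0:
        raise ValueError("visits_per_second must be positive")
    return visit_budget / visits_per_second


# Rates quoted in this thread (visits/s); the labels are only descriptive.
quoted_rates = {
    "20b/40b, slow case": 10,
    "40b, MIUI 12.5, 2 threads": 80,
    "40b, MIUI 13": 135,
}

VISIT_BUDGET = 1000  # hypothetical per-position budget, not from the thread

for label, rate in quoted_rates.items():
    seconds = seconds_for_visits(VISIT_BUDGET, rate)
    print(f"{label}: {seconds:.1f} s for {VISIT_BUDGET} visits")
```

Under these assumptions, the gap between 10 and 80 visits/s is the difference between waiting well over a minute and waiting a few seconds per position.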

humble-stu commented 2 years ago

I upgraded to MIUI 13, and it seems to make the Mi 11 overheat more easily. I thought a bigger numSearchThreads meant better performance, at least for KataGo on PC, so a value of 2 is curious. By the way, when I use "analyze all" to go through a game record, the program gets stuck after about 100 moves. Is that inevitable? P.S. I found a problem with the AI's predicted winrate: sometimes I play a move for White that has a 20% winrate, which means Black's winrate should be 80%, but when I then analyze, Black's winrate suddenly falls to 68%. Continuing to compute doesn't help, even up to 30k visits.

aki65 commented 2 years ago

I thought a bigger numSearchThreads meant better performance, at least for KataGo on PC, so a value of 2 is curious.

According to my tests, adding more than 2 threads doesn't increase performance any further. This is probably because a smartphone NPU has far fewer cores than a PC graphics card (so it is already kept busy by far fewer threads).

By the way, when I use "analyze all" to go through a game record, the program gets stuck after about 100 moves. Is that inevitable?

I never had that problem. Can you share the SGF where it happens? And tell me the exact sequence of steps to reproduce it?

I found a problem with the AI's predicted winrate: sometimes I play a move for White that has a 20% winrate, which means Black's winrate should be 80%, but when I then analyze, Black's winrate suddenly falls to 68%. Continuing to compute doesn't help, even up to 30k visits.

One move later KataGo also looks one move further ahead (on average with same number of visits). So it may detect a good move for white which it overlooked in the analysis of the previous position. And this may reduce black's winrate from 80% to 68%.
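To make this concrete, here is a toy sketch of the effect. It uses a fixed-depth minimax over a tiny hand-made tree, which is a deliberate simplification and not how KataGo searches (KataGo uses a neural-network-guided tree search). The tree, the `Node` class, and all winrate numbers are hypothetical, chosen only to mirror the 80% and 68% figures above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Heuristic estimate of Black's winrate if the search stops at this node.
    static_eval: float
    children: list = field(default_factory=list)


def search(node: Node, depth: int, black_to_move: bool) -> float:
    """Fixed-depth minimax on Black's winrate (Black maximizes, White minimizes)."""
    if depth == 0 or not node.children:
        return node.static_eval
    values = [search(child, depth - 1, not black_to_move) for child in node.children]
    return max(values) if black_to_move else min(values)


# Position after Black's reply A: it looks like 80% for Black at first glance,
# but White has a strong follow-up that brings it down to 68%.
reply_a = Node(0.80, [Node(0.68), Node(0.85)])
# Position after Black's reply B: looks like 75%, White's best follow-up gives 66%.
reply_b = Node(0.75, [Node(0.66), Node(0.70)])

after_white_move = Node(0.78, [reply_a, reply_b])  # Black to move
before_white_move = Node(0.78, [after_white_move])  # White to move

SEARCH_DEPTH = 2  # same search effort in both analyses

# Analysis before White's move: the search stops right after Black's reply.
estimate_before = search(before_white_move, SEARCH_DEPTH, black_to_move=False)
# Analysis one move later: the same depth now reaches White's follow-up.
estimate_after = search(after_white_move, SEARCH_DEPTH, black_to_move=True)

print(f"Black's winrate, analyzed before White's move: {estimate_before:.2f}")
print(f"Black's winrate, analyzed after White's move:  {estimate_after:.2f}")
```

With the same depth, the second analysis sees White's follow-up that the first one could not reach, so the estimate drops even though nothing about the position itself has changed.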

ugvvff commented 2 years ago

Will phones with the 865 processor also get a speed boost after updating to Android 12?

humble-stu commented 2 years ago

It's not a specific SGF; it happens with any random analysis. It seems the app can't sustain a long analysis without crashing. The compute speed for 40b has fallen to 60 visits/s these days, inexplicably. As for the winrate, that explanation is not persuasive to me; it is a local life-and-death problem.

aki65 commented 2 years ago

I analyzed some complete games, but could not reproduce the crash. Perhaps there is a memory problem after long analysis, so you could try reducing the cache size by setting nnCacheSizePowerofTwo = 10.
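For a sense of scale, the sketch below assumes that this setting follows KataGo's convention, where the value is the base-2 logarithm of the number of cached neural-net evaluations. The bytes-per-entry figure is a hypothetical placeholder, since the real size depends on board size and build; only the ratio between two settings should be read from it.

```python
def cache_entries(power_of_two: int) -> int:
    """Number of cache entries for a given power-of-two setting (2 ** n)."""
    if power_of_two < 0:
        raise ValueError("power_of_two must be non-negative")
    return 1 << power_of_two


def approx_cache_mib(power_of_two: int, bytes_per_entry: int) -> float:
    """Rough cache size in MiB, given an ASSUMED size per entry."""
    return cache_entries(power_of_two) * bytes_per_entry / (1024 * 1024)


ASSUMED_BYTES_PER_ENTRY = 2048  # hypothetical, for illustration only

for setting in (10, 16, 20):
    entries = cache_entries(setting)
    mib = approx_cache_mib(setting, ASSUMED_BYTES_PER_ENTRY)
    print(f"setting {setting}: {entries} entries, roughly {mib:.0f} MiB")
```

Each step down in the setting halves the number of entries, so a value of 10 keeps only 1024 evaluations, which makes the cache a negligible part of memory use.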

The compute speed may indeed decrease during a long analysis due to thermal throttling.

aki65 commented 2 years ago

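No, I don't expect so. The 865 came out with Android 10, so all driver optimizations and bug fixes will have been completed in Android 11, and there are no further improvements to speak of in Android 12.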

ivysrono commented 2 years ago

@aki65 On the Redmi Note 11T Pro (Dimensity 8100, MIUI 13), NPU acceleration does not work with 1.15.0 or 1.15.2; I only get about 22 n/s.

aki65 commented 2 years ago

As NPU acceleration works on Dimensity 1100 and Dimensity 9000, this is clearly a bug in the NPU driver of Dimensity 8100. So we can only hope that this is fixed with an upcoming firmware update. Since the phone is so new, there will surely be some of those in the near future.

ivysrono commented 2 years ago

As NPU acceleration works on Dimensity 1100 and Dimensity 9000, this is clearly a bug in the NPU driver of Dimensity 8100. So we can only hope that this is fixed with an upcoming firmware update. Since the phone is so new, there will surely be some of those in the near future.

Need to check 3 or 4 .