CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0
103 stars, 11 forks

sai 0.18.1 vs 0.17.6 (my impression) #142

Closed: Cabu closed this issue 3 years ago

Cabu commented 3 years ago

I don't have really hard numbers, but here goes. My machine is a laptop with an i7-8750H, an RTX 2080 and 32 GB of RAM, all heavily throttled by poor thermal dissipation (i.e. don't buy an MSI GE75 Raider).

With 0.17.6, I maxed out my 2080 with 4 threads, with my CPU at about 25-30% usage. A game took around 120 seconds, depending on how much browsing/work I was doing at the same time.

With 0.18.1, I need 8 threads to get my video card to about 90% usage, with my CPU also at 90%. A game still takes 110 seconds when I'm doing nothing else. BUT I have to stop SAI to do anything else because my machine is on its knees :( With 4 threads, a game takes about 250 seconds...

Why does 0.18.1 use the video card so much less than the previous version?

Vandertic commented 3 years ago

This is very strange behaviour and a very good question. We will have to look into this. I didn't notice the problem myself, but I will check ASAP. Thank you.

Vandertic commented 3 years ago

Hello @Cabu, we have done a couple of experiments, one with a setup similar to yours, and cannot reproduce this problem. In particular, we get the same game duration with both versions when using the same number of threads. Could you please run some more experiments and give us more details? For example, the average seconds/game and ms/move after, say, ten self-play games with each version? Thank you

amato-gianluca commented 3 years ago

Hello @Cabu, I have a question: are you using Windows or Linux?

Cabu commented 3 years ago

@Vandertic Sure. I will do a complete run of both versions and copy/paste SAI's screen output. It will take some time, but I will come back :)

@amato-gianluca It runs on Windows 10.

Cabu commented 3 years ago

Here are 2-hour tests with 4 threads, with the full SAI output. I can do an 8-thread run tomorrow if you need it.

SAI 17, 4 threads

~2h test, 92 games played

GPU: CUDA: 99% Copy: 20% Mem: 1.5/8 GB

CPU: Usage: ~60%

SAI 18, 4 threads

~2h test, 39 games played

GPU: CUDA: ~40% Copy: 8% Mem: 0.9/8 GB

CPU: Usage: ~30%

Full screen output: sai 17 4.txt, sai 18 4.txt

Vandertic commented 3 years ago

This is a mystery to me. @amato-gianluca do you have any idea?

Vandertic commented 3 years ago

@Cabu can you do a fast check with version 0.18.0, to see if the problem appears there or in 0.18.1?

Cabu commented 3 years ago

@Vandertic I haven't tried 0.18.0. Doing it now.

Cabu commented 3 years ago

SAI 18.0, 4 threads

~2h test, 77 games played

GPU: CUDA: 98% Copy: 21% Mem: 0.9/8 GB

CPU: Usage: ~60%

Full screen output: sai 18.0 4.txt

Vandertic commented 3 years ago

Awesome. So the problem must have been introduced between 0.18.0 and 0.18.1. Luckily there are not many changes between the two versions, the main ones being fpu average and the shared mutex.
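For context on the second change: a shared (reader-writer) mutex lets many threads hold the lock at once for reading, while a writer needs exclusive access; a plain mutex serializes everyone. The following is a minimal Python sketch of the idea, not SAI's code: SAI is written in C++ and its shared-mutex implementation is not shown in this thread, and the `SharedLock`, `PlainLock` and `Node` names are hypothetical. `PlainLock` exposes the same interface on top of one ordinary lock, roughly what going back to a plain mutex amounts to.

```python
import threading
from contextlib import contextmanager


class SharedLock:
    """Reader-writer lock: many concurrent readers, or a single writer.

    Hypothetical illustration only, not SAI's C++ implementation.
    This simple version prefers readers, so a steady stream of readers
    can starve a waiting writer.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0      # threads currently holding shared access
        self._writer = False   # True while one thread holds exclusive access

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:            # wait until no writer is active
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:     # last reader out wakes waiting writers
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()    # wake both readers and writers


class PlainLock:
    """Same interface, but readers and writers share one ordinary mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextmanager
    def shared(self):
        with self._lock:
            yield

    exclusive = shared  # no distinction: every caller is serialized


class Node:
    """Hypothetical search-tree node whose children are guarded by a lock."""

    def __init__(self, lock) -> None:
        self.lock = lock
        self.children = []

    def read_children(self):
        with self.lock.shared():           # concurrent readers allowed with SharedLock
            return list(self.children)

    def expand(self, moves):
        with self.lock.exclusive():        # only one expander at a time
            if not self.children:          # another thread may have expanded already
                self.children = list(moves)
```

Note that the reader-writer version does extra bookkeeping (counters, condition waits and wake-ups) on every acquisition, so with very short critical sections it is not automatically faster than a plain lock.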

Cabu commented 3 years ago

Yes, but there is still a little something that bothers me: 18.0 uses less video memory and played 15 fewer games than 17... But yes, that gap is much smaller than the one between 18.0 and 18.1 :-) In the meantime, I will go back to SAI 17 for crunching games :)

Oh, and if you need me to test specific binaries to pinpoint the problem, I'm available :)

amato-gianluca commented 3 years ago

Dear @Cabu, could you please try the new build of SAI attached below? I have reverted some changes that might be causing the slowdown.

sai-0.18.2-gpu.zip

Cabu commented 3 years ago

SAI 18.2, 4 threads

~2h test, 92 games played

GPU: CUDA: 98% Copy: 21% Mem: 0.9/8 GB

CPU: Usage: ~65%

sai 18.2 4.txt
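Vandertic asked for average seconds per game; the game counts reported for the four 4-thread runs can be converted directly. This small Python sketch assumes each run lasted exactly 7200 s, whereas the runs were only "~2h", so the figures are rough. The labels follow the headings used in this thread, with the "SAI 18" run taken to be 0.18.1.

```python
# Games played in the ~2 h, 4-thread runs reported in this thread.
RUN_SECONDS = 2 * 60 * 60  # assumption: every run lasted exactly 2 hours

games_played = {
    "SAI 17": 92,
    "SAI 18 (0.18.1)": 39,
    "SAI 18.0": 77,
    "SAI 18.2": 92,
}


def seconds_per_game(games: int, run_seconds: int = RUN_SECONDS) -> float:
    """Average wall-clock seconds per self-play game over one run."""
    return run_seconds / games


baseline = seconds_per_game(games_played["SAI 17"])
for label, games in games_played.items():
    spg = seconds_per_game(games)
    print(f"{label:16} {spg:6.1f} s/game  ({spg / baseline:.2f}x SAI 17)")
```

Under that assumption this works out to about 78 s/game for SAI 17 and 18.2, about 94 s for 18.0 and about 185 s for 0.18.1, so 0.18.1 took roughly 2.4 times as long per game as SAI 17.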

Cabu commented 3 years ago

Now the last question is: why do these changes make it slower on my computer but not on yours? Could we run a performance test at engine startup, or add a parameter to enable or disable these changes, and only use them when they carry no penalty?

amato-gianluca commented 3 years ago

Difficult to say; we should investigate further. It could be a timing issue due to the way the shared mutexes were implemented. For the moment, we have removed them entirely, since we have not seen any system that showed performance improvements with them.