CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0

SAI periodically disconnects in sabaki #123

Open cryptsport opened 4 years ago

cryptsport commented 4 years ago

0.17.5 works without problems, but 0.17.6 periodically stops when playing against LZ, and I have to restart the match to continue.

cryptsport commented 4 years ago

Has anyone come across this? Maybe the problem is in Sabaki?

cryptsport commented 4 years ago

I launched it in q5go and got the same problem: 0.17.5 works, 0.17.6 does not. The same happens if I play against SAI myself. Does this problem exist, or am I doing something wrong?

Vandertic commented 4 years ago

@cryptsport can you give more information? For example, the command line you use in Sabaki?

cryptsport commented 4 years ago

Yes, of course! I just reproduced it again: Sabaki 0.51.1, sai-0.17.6-gpu, network file a8e32bb8, options --gtp --noponder -w networkfile.gz

Thinking at most 36.3 seconds...
NN eval=0.497300. Agent eval=0.497711 (lambda=0.300, mu=0.030)
cpus=10
Playouts: 25, Win: 52.85%, PV: R16 Q4 D4
Playouts: 77, Win: 51.93%, PV: D4 Q16 Q4 F16 R17 Q17 R16
Playouts: 120, Win: 51.79%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 178, Win: 51.66%, PV: R16 Q4 D4 Q16 Q17 R15 R17
Playouts: 232, Win: 51.58%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 316, Win: 51.59%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 402, Win: 51.61%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 557, Win: 51.67%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 670, Win: 51.71%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 742, Win: 51.72%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 825, Win: 51.74%, PV: R16 Q4 D4 Q16 Q17 R15 R17
Playouts: 899, Win: 51.74%, PV: Q17 R4 D4 Q16 R16 R15 R17 O4
Playouts: 1007, Win: 51.74%, PV: R4 R16 D4 Q4 Q3 R5 R3 O16
Playouts: 1114, Win: 51.76%, PV: R4 R16 D4 Q4 Q3 R5 R3 O16

Q17 -> 112 (V: 52.29%) (LCB: 51.81%) (N: 5.74%) (A: -0.7) (B: 0.13) PV: Q17 R4 D4 Q16 R16 R15 R17 O4
R4 -> 122 (V: 52.19%) (LCB: 51.76%) (N: 6.36%) (A: -0.7) (B: 0.13) PV: R4 R16 D4 Q4 Q3 R5 R3 O16
R16 -> 104 (V: 52.20%) (LCB: 51.68%) (N: 5.69%) (A: -0.7) (B: 0.13) PV: R16 Q3 D4 Q16 Q17 P17 R17 F16
C4 -> 121 (V: 52.11%) (LCB: 51.66%) (N: 5.04%) (A: -0.7) (B: 0.13) PV: C4 R16 Q4 D4 D3 C5 C3 O16
D3 -> 118 (V: 52.10%) (LCB: 51.65%) (N: 5.19%) (A: -0.7) (B: 0.13) PV: D3 Q16 R4 D4 C4 C5 C3 P3 Q6
Q3 -> 106 (V: 52.04%) (LCB: 51.55%) (N: 6.37%) (A: -0.6) (B: 0.13) PV: Q3 R16 D4 Q4 R4 R5 R3 O16
Q4 -> 153 (V: 51.60%) (LCB: 51.13%) (N: 9.99%) (A: -0.5) (B: 0.13) PV: Q4 R16 C4 D4 D3 C5 C3 O16 E17
Q16 -> 116 (V: 51.53%) (LCB: 51.01%) (N: 8.24%) (A: -0.5) (B: 0.13) PV: Q16 D3 Q4 F16 C5 F4 C8 R17 R16
D4 -> 113 (V: 51.47%) (LCB: 50.92%) (N: 7.69%) (A: -0.5) (B: 0.13) PV: D4 R16 R4 Q4 Q3 R5 R3 O16
D16 -> 62 (V: 50.19%) (LCB: 49.16%) (N: 9.00%) (A: -0.0) (B: 0.13) PV: D16 D17 E17 D15 E16 C17 Q16 Q4 D4 J17 F14 D13 L17
C3 -> 8 (V: 50.20%) (LCB: 46.27%) (N: 0.98%) (A: -0.0) (B: 0.13) PV: C3 Q16 Q4 F16 R17 Q17 R16
R17 -> 7 (V: 49.98%) (LCB: 42.60%) (N: 0.95%) (A: 0.1) (B: 0.13) PV: R17 Q4 D4 F16 R3 Q3 R4
R3 -> 7 (V: 49.67%) (LCB: 43.31%) (N: 1.07%) (A: 0.2) (B: 0.13) PV: R3 Q16 D4 F16 R17 Q17 R16
E17 -> 6 (V: 49.43%) (LCB: 38.01%) (N: 0.95%) (A: 0.2) (B: 0.12) PV: E17 Q4 D15 C15 D14 C13

Root -> 1157 (V: 51.79%) (LCB: 51.54%) (N: 0.00%) (A: -0.6) (B: 0.13)

6.2 average depth, 15 max depth
929 non leaf nodes, 1.24 average children
1157 visits, 405223 nodes, 1155 playouts, 31 n/s

And that is all! Nothing appears on the board. Sometimes this happens after several moves, sometimes, like now, after the first.
EDIT: I noticed a difference: in 0.17.5, lz-genmove_analyze W 50 is followed by "= info move (...)" lines and then the move, but in 0.17.6 these lines are missing.

Vandertic commented 4 years ago

Uhm, I tried to reproduce your bug, without success. I don't suppose you are using a peculiar GPU or GPU driver? It is known that some OpenCL drivers can be broken, and this sort of thing could happen. In particular: the output of lz-genmove_analyze is the same for both versions (apart from the field areas added in 0.17.6), and the fact that info move does not appear means that the search has crashed. As for why 0.17.5 appears to work and 0.17.6 does not, I suppose that the problem might be triggered by some improvements to Network that LZ devs added to LZ/next and that we pulled. To be sure that the problem lies there, you should try to run SAI with the --cpu-only option and see if this stops the crashes.
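For anyone comparing the lz-genmove_analyze output of the two versions, here is a minimal Python sketch that splits one output line into per-move records. It assumes the layout seen in the Sabaki log in this thread (`info move <move>` followed by integer key/value pairs and a trailing `pv` sequence); the function name `parse_info_moves` is hypothetical and not part of SAI or Sabaki.

```python
def parse_info_moves(line):
    """Split one lz-genmove_analyze output line into a list of per-move dicts.

    Assumed layout (taken from the log pasted in this thread):
    info move C8 visits 23 winrate 4592 ... areas 17311 order 0 pv C8 D9 ...
    """
    entries = []
    # Everything before the first "info move " (if anything) is ignored.
    for chunk in line.split("info move ")[1:]:
        tokens = chunk.split()
        if not tokens:
            continue
        entry = {"move": tokens[0]}
        i = 1
        while i < len(tokens):
            key = tokens[i]
            if key == "pv":
                # The principal variation runs to the end of this entry.
                entry["pv"] = tokens[i + 1:]
                break
            if i + 1 >= len(tokens):
                break  # dangling key without a value: stop quietly
            entry[key] = int(tokens[i + 1])
            i += 2
        entries.append(entry)
    return entries


sample = (
    "info move C8 visits 23 winrate 4592 prior 4687 lcb 4055 areas 17311 "
    "order 0 pv C8 D9 D10 "
    "info move B9 visits 12 winrate 4355 prior 1327 lcb 3166 areas 26834 "
    "order 1 pv B9 B5 H3"
)
moves = parse_info_moves(sample)
```

With 0.17.5 output the `areas` key would simply be absent from each record, so checking `"areas" in moves[0]` is one way to tell the two formats apart.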

cryptsport commented 4 years ago

sai-0.17.6-cpu doesn't work either (it fails the same way). I noticed that in the latest attempts it didn't make even one move. I'll try to find out why there were sometimes several moves before. I run many different engines in Sabaki, and this has not happened with the others (katago, lz, gtp4zen, amigo...).
EDIT: Maybe I was able to figure something out. With a smaller net, 9b, it works so far (more than 50 moves, and the game continues). But the SAI network is not as big as KataGo's 40x384, which works for me.
EDIT 2: sai-0.17.6-gpu (--cpu-only) doesn't work with the 12b network but works with the 9b network. 12b is a8e32bb8, 9b is c5de38e8.
EDIT 3: The drivers are NVIDIA's drivers for this video card. The first 12b network, 88b43a77, also doesn't work (sai-0.17.6).

cryptsport commented 4 years ago

I added "-t 2", and sai-0.17.6-gpu with the 12b network works now. I'd be interested to see your comment.
EDIT: With the 12b network, up to "-t 5" works and "-t 6" doesn't.
EDIT 2: With the 20b network c215fd3b, up to "-t 3" works and "-t 4" doesn't (AMD Athlon X4 950). With 0.17.5 everything works.

Vandertic commented 4 years ago

Wow. This is interesting and I don't think I have ever seen this problem anywhere else. I still believe the problem has to do with latest LZ commits which we included. Unfortunately there is no release for this version of LZ, so to check if the problem is there one would need to compile it under Windows and try. Sorry, but I really don't understand what's happening with your configuration. I'll ask @amato-gianluca if he has any ideas...

cryptsport commented 4 years ago

I disabled CPU virtualization; it changed nothing. And today, with "-t 3", it stopped at move 74. There is little information in the Sabaki log file. Can I get a more detailed log somehow? I am interested in your project. If any tests are needed, I'm ready!

cryptsport commented 4 years ago

I closed all other applications, and sai-0.17.6-gpu with the 20b network works with "-t 5". But without the "-t" parameter it still doesn't work. In general, nothing is clear :)

cryptsport commented 4 years ago

with the parameter "-t 3" sai-0.17.6-gpu is reasonably stable (~ 50%). does this parameter affect strength or playstyle? with the same visits? sec per move?

Vandertic commented 4 years ago

It is an optimization parameter. To get the most "nodes" per second (and hence the fewest seconds per move) you have to find the optimal value, which will generally depend on your hardware configuration. Actually, changing this number will also change the playing style a bit, as the tree exploration will change. The difference should be small though. BTW, can I ask you again for your hardware and software setup? I didn't understand it well from what you wrote above.
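One rough way to look for that optimal value is to collect the engine's stderr for a few "-t" settings and compare the "n/s" figure from the search summary lines. The Python sketch below assumes the summary format shown in the logs in this thread ("... visits, ... nodes, ... playouts, ... n/s"); the helper names and the `logs_by_threads` mapping are hypothetical, and the sample figures are copied from this thread only to exercise the parser (they come from different networks, so they are not a fair comparison).

```python
import re
from statistics import mean

# Matches summary lines such as:
# "1157 visits, 405223 nodes, 1155 playouts, 31 n/s"
SUMMARY = re.compile(r"(\d+) visits, (\d+) nodes, (\d+) playouts, ([\d.]+) n/s")


def nodes_per_second(log_text):
    """Return every n/s value found in a chunk of engine stderr output."""
    return [float(m.group(4)) for m in SUMMARY.finditer(log_text)]


def best_thread_count(logs_by_threads):
    """Pick the thread count whose log shows the highest average n/s.

    logs_by_threads maps a "-t" value to the stderr text captured with it.
    Settings whose log contains no summary line are skipped.
    """
    averages = {}
    for threads, text in logs_by_threads.items():
        values = nodes_per_second(text)
        if values:
            averages[threads] = mean(values)
    if not averages:
        raise ValueError("no summary lines found in any log")
    return max(averages, key=averages.get), averages


# Illustrative input only: summary lines quoted elsewhere in this thread.
logs = {
    3: "40 visits, 11510 nodes, 38 playouts, 7 n/s\n"
       "80 visits, 23314 nodes, 40 playouts, 8 n/s",
    10: "1157 visits, 405223 nodes, 1155 playouts, 31 n/s",
}
best, averages = best_thread_count(logs)
```

A meaningful comparison would use the same network, position and time settings for every "-t" value, and several moves per setting.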

cryptsport commented 4 years ago

AMD Athlon X4 950, GF GT610, 8GB RAM, Windows 7. ok?

cryptsport commented 4 years ago

you previously wrote "info move does not appear means that the search has crashed." in the sabaki log file I found this:

[2020-09-08 15:03:05.161] sai (in) : play B D8
[2020-09-08 15:03:05.196] sai (out) : =
[2020-09-08 15:03:05.196] sai (out) :
[2020-09-08 15:03:05.217] sai (in) : lz-genmove_analyze W 50
[2020-09-08 15:03:05.246] sai (out) : =
[2020-09-08 15:03:05.250] sai (err) : Thinking at most 5.0 seconds...
[2020-09-08 15:03:05.487] sai (err) : NN eval=0.455956. Agent eval=0.463221 (lambda=0.300, mu=0.030)
[2020-09-08 15:03:05.488] sai (err) : cpus=3
[2020-09-08 15:03:05.734] sai (out) :
[2020-09-08 15:03:07.739] sai (err) : Playouts: 18, Win: 46.43%, PV: C8 D9 D10 F10 B12 B13 D7
[2020-09-08 15:03:10.393] sai (err) :
[2020-09-08 15:03:10.409] sai (err) : C8 - 20 (V: 45.63%) (LCB: 39.48%) (N: 46.87%) (A: 1.8) (B: 0.12) PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:10.409] sai (err) : B9 - 11 (V: 43.05%) (LCB: 29.87%) (N: 13.27%) (A: 2.9) (B: 0.11) PV: B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9
[2020-09-08 15:03:10.410] sai (err) : D7 - 7 (V: 38.72%) (LCB: 3.70%) (N: 17.57%) (A: 4.8) (B: 0.11) PV: D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:10.410] sai (err) :
[2020-09-08 15:03:10.410] sai (err) : Root - 40 (V: 44.38%) (LCB: 35.81%) (N: 6.04%) (A: 2.6) (B: 0.11)
[2020-09-08 15:03:10.411] sai (err) :
[2020-09-08 15:03:10.411] sai (err) : 5.8 average depth, 12 max depth
[2020-09-08 15:03:10.412] sai (err) : 33 non leaf nodes, 1.15 average children
[2020-09-08 15:03:10.412] sai (err) : 40 visits, 11510 nodes, 38 playouts, 7 n/s
[2020-09-08 15:03:10.413] sai (err) :
[2020-09-08 15:03:19.758] sai (in) : undo
[2020-09-08 15:03:19.807] sai (out) : =
[2020-09-08 15:03:19.808] sai (out) :
[2020-09-08 15:03:19.846] sai (in) : lz-genmove_analyze W 50
[2020-09-08 15:03:19.899] sai (out) : =
[2020-09-08 15:03:19.900] sai (err) : Thinking at most 5.0 seconds...
[2020-09-08 15:03:19.901] sai (err) : NN eval=0.455956. Agent eval=0.463221 (lambda=0.300, mu=0.030)
[2020-09-08 15:03:19.902] sai (err) : cpus=3
[2020-09-08 15:03:20.365] sai (out) : info move C8 visits 23 winrate 4592 prior 4687 lcb 4055 areas 17311 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 12 winrate 4355 prior 1327 lcb 3166 areas 26834 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:20.869] sai (out) : info move C8 visits 26 winrate 4606 prior 4687 lcb 4141 areas 16656 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 13 winrate 4372 prior 1327 lcb 3323 areas 25939 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:21.387] sai (out) : info move C8 visits 29 winrate 4633 prior 4687 lcb 4209 areas 15565 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 14 winrate 4371 prior 1327 lcb 3437 areas 25814 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:21.891] sai (out) : info move C8 visits 32 winrate 4691 prior 4687 lcb 4252 areas 13231 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 15 winrate 4418 prior 1327 lcb 3527 areas 23619 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:22.367] sai (err) : Playouts: 20, Win: 45.18%, PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:22.400] sai (out) : info move C8 visits 35 winrate 4690 prior 4687 lcb 4270 areas 13286 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:22.915] sai (out) : info move C8 visits 38 winrate 4686 prior 4687 lcb 4300 areas 13481 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 8 winrate 4095 prior 1757 lcb 689 areas 37313 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:23.419] sai (out) : info move C8 visits 41 winrate 4701 prior 4687 lcb 4313 areas 12961 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 9 winrate 4194 prior 1757 lcb 1404 areas 32905 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:23.935] sai (out) : info move C8 visits 44 winrate 4762 prior 4687 lcb 4362 areas 10532 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 10 winrate 4270 prior 1757 lcb 1918 areas 29615 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:24.438] sai (out) : info move C8 visits 47 winrate 4753 prior 4687 lcb 4371 areas 10986 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 11 winrate 4212 prior 1757 lcb 2189 areas 32380 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:25.112] sai (out) : play C8
[2020-09-08 15:03:25.113] sai (out) :
[2020-09-08 15:03:25.114] sai (err) :
[2020-09-08 15:03:25.115] sai (err) : C8 - 49 (V: 47.33%) (LCB: 43.60%) (N: 46.87%) (A: 1.2) (B: 0.12) PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:25.116] sai (err) : B9 - 16 (V: 43.45%) (LCB: 34.19%) (N: 13.27%) (A: 2.7) (B: 0.11) PV: B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8
[2020-09-08 15:03:25.116] sai (err) : D7 - 13 (V: 41.91%) (LCB: 26.60%) (N: 17.57%) (A: 3.3) (B: 0.11) PV: D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:25.116] sai (err) :
[2020-09-08 15:03:25.117] sai (err) : Root - 80 (V: 45.59%) (LCB: 41.00%) (N: 6.04%) (A: 1.8) (B: 0.11)
[2020-09-08 15:03:25.117] sai (err) :
[2020-09-08 15:03:25.117] sai (err) : 5.9 average depth, 14 max depth
[2020-09-08 15:03:25.117] sai (err) : 61 non leaf nodes, 1.28 average children
[2020-09-08 15:03:25.118] sai (err) : 80 visits, 23314 nodes, 40 playouts, 8 n/s
[2020-09-08 15:03:25.118] sai (err) :
[2020-09-08 15:03:25.151] leelaz (in) : play W C8

Here (I clicked "start engine vs engine game" and the game resumed): [2020-09-08 15:03:19.758] (in) : undo. Does this not mean that the move was found, but perhaps not passed to Sabaki?
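To locate every such stall in a longer Sabaki log, a small Python sketch like the following could flag each lz-genmove_analyze request that never received a "play <move>" reply before the next command was sent to the same engine. It assumes the line layout of the log pasted above ("[timestamp] engine (in|out|err) : text") and that "play" on the engine's output marks a finished search; the function name `find_stalled_genmoves` is hypothetical.

```python
import re

# "[2020-09-08 15:03:05.217] sai (in) : lz-genmove_analyze W 50"
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<engine>\S+) \((?P<stream>in|out|err)\) :\s?(?P<text>.*)$"
)


def find_stalled_genmoves(lines, engine="sai"):
    """Return (request timestamp, next command) for each unanswered genmove.

    The second item is None when the log ends before any further command.
    """
    stalled = []
    pending = None  # timestamp of a genmove request still waiting for "play"
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m or m.group("engine") != engine:
            continue
        stream, text = m.group("stream"), m.group("text")
        if stream == "in":
            if pending is not None:
                # A new command arrived before the engine answered with a move.
                stalled.append((pending, text))
            if text.startswith("lz-genmove_analyze"):
                pending = m.group("ts")
            else:
                pending = None
        elif stream == "out" and text.startswith("play ") and pending is not None:
            pending = None  # the search finished and reported its move
    if pending is not None:
        stalled.append((pending, None))
    return stalled


# A shortened excerpt of the log above.
excerpt = [
    "[2020-09-08 15:03:05.217] sai (in) : lz-genmove_analyze W 50",
    "[2020-09-08 15:03:05.246] sai (out) : =",
    "[2020-09-08 15:03:10.412] sai (err) : 40 visits, 11510 nodes, 38 playouts, 7 n/s",
    "[2020-09-08 15:03:19.758] sai (in) : undo",
    "[2020-09-08 15:03:19.846] sai (in) : lz-genmove_analyze W 50",
    "[2020-09-08 15:03:25.112] sai (out) : play C8",
    "[2020-09-08 15:03:25.151] leelaz (in) : play W C8",
]
stalls = find_stalled_genmoves(excerpt)
```

On this excerpt only the first request is reported: the search summary reaches stderr, but no "info move" or "play" line appears on the engine's output before the undo.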

cryptsport commented 4 years ago

It would still be very interesting to find the cause of the failure! I ran it in SmartGo, and so far everything is fine! (sai-0.17.6-gpu, --gtp --noponder -w networkfile.gz, network 20b 7fa70321, game finished, 242 moves.) I want to check further by running a match of several dozen games.
EDIT: A match of 10 games went fine (SmartGo - SAI), and it plays against KataGo without any problems.

cryptsport commented 4 years ago

SmartGo does not have a separate line for time_settings like Sabaki does (e.g. time_settings 0 6 1). How can I set it?

Vandertic commented 4 years ago

I am at a loss. Glad to hear that at least with smartgo SAI seems to work fine. Will think about this further.

cryptsport commented 4 years ago

It works with Drago too! It also works with Sabaki 0.33.4. The "incompatibility" probably arose with the latest versions of Sabaki. In about an hour I will look for the Sabaki version from which the crash starts to occur.

cryptsport commented 4 years ago

It works with Sabaki 0.33.4, 0.35.1, 0.40.0 and 0.40.1. The later versions 0.41.0 and 0.43.3 don't work. Does this tell you anything? Is it possible to fix it, or is it a Sabaki problem?
EDIT: I tried it again in q5go; it does not work, but it works with "-t 3".

cryptsport commented 4 years ago

I didn't expect this, but it works on the old version q5go-1.1-win! maybe there is a simple explanation for this?

cryptsport commented 3 years ago

Even "-t 1" does not help in version 0.18.1. It stops after a few moves and often does not make even one move, with both the SAI network and the LZ network.

Vandertic commented 3 years ago

Try version 0.18.2. We reverted from a shared mutex update that might make the problem worse. Also, have you tried to use --cpu-only?

cryptsport commented 3 years ago

I tried cpu and gpu. now I will check sai-0.18.2-gpu. but it works with drago!

cryptsport commented 3 years ago

I was not expecting this!!! sai-0.18.2-gpu works! I will check again.

cryptsport commented 3 years ago

It worked with the SAI network without stopping until the end of the game. For some reason the problem with the LZ network remains.
EDIT: I launched it again with the SAI network, and it works!

cryptsport commented 3 years ago

sai-0.18.2-cpu does not work with the networks sai and lz

cryptsport commented 3 years ago

again launched sai-0.18.2-gpu with lz network - now it works. sai-0.18.2-gpu with lz network doesn't work now. probably depends on the mood