featurecat / lizzie

Lizzie - Leela Zero Interface
GNU General Public License v3.0

Lizzie 0.6 doesn't analyze with 2x RTX 2080 Ti GPUs = Bug #530

Open lex312 opened 5 years ago

lex312 commented 5 years ago

I'm using Leela Zero 0.17 + AutoGTP v18. With one GPU it works fine.

But with two GPUs it doesn't work. Lizzie 0.6 opens, but it stays stuck at "Leela Zero is loading...". I can still use the X button and place moves on the board.

{ "leelaz": { "max-analyze-time-minutes": 60, "analyze-update-interval-centisec": 10, "network-file": "network.gz", "max-game-thinking-time-seconds": 2, "engine-start-location": ".", "engine-command": "./leela-zero/leelaz --gtp --lagbuffer 0 --weights %network-file --gpu 0 --gpu 1", "print-comms": false }, "ui": { "comment-font-size": 0, "board-color": [ 217, 152, 77 ], "shadow-size": 100, "show-winrate": true, "autosave-interval-seconds": -1, "append-winrate-to-comment": true, "fancy-board": true, "show-captured": true, "weighted-blunder-bar-height": false, "--gpu 0 --gpu 1 --gpu 2 --gpu 3": true, "win-rate-always-black": false, "show-move-number": true, "winrate-stroke-width": 3, "show-next-moves": true, "show-comment": true, "show-leelaz-variation": true, "theme": "default", "min-playout-ratio-for-stats": 0, "fancy-stones": true, "resume-previous-game": false, "window-size": [ 3840, 2160 ], "new-move-number-in-branch": true, "shadows-enabled": true, "show-variation-graph": true, "show-dynamic-komi": true, "minimum-blunder-bar-width": 3, "large-winrate": false, "show-blunder-bar": true, "only-last-move-number": 1, "confirm-exit": false, "show-status": true, "handicap-instead-of-winrate": false, "large-subboard": false, "dynamic-winrate-graph-width": true, "show-subboard": true, "window-maximized": true, "show-best-moves": true, "board-size": 19 } }

featurecat commented 5 years ago

What happens when you run that Leelaz command from a command line?


lex312 commented 5 years ago

I don't know if this is correct: Z:\LG0\Lizzie\leela-zero\leelaz.exe

A little black window opened for half a second.

featurecat commented 5 years ago

This one:

Z:\LG0\Lizzie\leela-zero\leelaz.exe --gtp --lagbuffer 0 --weights %network-file --gpu 0 --gpu 1


lex312 commented 5 years ago

Z:\LG0\Lizzie\leela-zero\leelaz.exe --gtp --lagbuffer 0 --weights %network-file --gpu 0 --gpu 1

A little black window opened for half a second.
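When a console window closes before its output can be read, one option is to launch the engine from a small script and capture what it prints. The following is only a sketch: `run_and_capture` is a hypothetical helper name, and the demonstration uses the Python interpreter itself as a stand-in for the engine so that it runs anywhere.

```python
import subprocess
import sys


def run_and_capture(command, timeout_seconds=60):
    """Run a command and return (exit_code, stdout, stderr) as text.

    stdin is closed so that a program waiting for input sees end-of-file
    instead of blocking; subprocess.TimeoutExpired is raised on timeout.
    """
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in for the engine: a child process that writes a message to
# stderr and exits with a non-zero code.
demo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('engine log'); sys.exit(1)",
]
exit_code, out, err = run_and_capture(demo_command)
print("exit code:", exit_code)
print("stderr:", err)
```

To try it on the real engine, the stand-in list would be replaced with the engine path and its arguments, one list element per argument.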

featurecat commented 5 years ago

Oh, oops, you need to specify a weights file. I can get back to you with a solution tomorrow.
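For context, `%network-file` in the config's `engine-command` is a placeholder, not a real file name, so it has to be replaced by the value of `network-file` before the command is run by hand. The sketch below assumes the substitution is a plain text replacement (an assumption about how Lizzie fills it in, not taken from its source); `resolve_engine_command` is a hypothetical helper, and the sample config is trimmed from the one posted at the top of this thread.

```python
import json


def resolve_engine_command(config_text):
    """Return engine-command with %network-file replaced by network-file.

    Assumption: the placeholder is filled in by simple text replacement.
    """
    leelaz = json.loads(config_text)["leelaz"]
    return leelaz["engine-command"].replace("%network-file", leelaz["network-file"])


# Trimmed copy of the "leelaz" section from the config in this thread.
sample_config = """
{
  "leelaz": {
    "network-file": "network.gz",
    "engine-command": "./leela-zero/leelaz --gtp --lagbuffer 0 --weights %network-file --gpu 0 --gpu 1"
  }
}
"""

print(resolve_engine_command(sample_config))
```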


lex312 commented 5 years ago

I think this is what you asked for, right?

Using OpenCL batch size of 5 Using 20 thread(s). RNG seed: 9659931005586854004 Using per-move time margin of 0.00s. BLAS Core: Sandybridge Detecting residual layers...v1...256 channels...40 blocks. Initializing OpenCL (autodetecting precision). Detected 2 OpenCL platforms. Platform version: OpenCL 2.0 AMD-APP (2079.4) Platform profile: FULL_PROFILE Platform name: AMD Accelerated Parallel Processing Platform vendor: Advanced Micro Devices, Inc. Device ID: 0 Device name: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz Device type: CPU Device vendor: GenuineIntel Device driver: 2079.4 (sse2,avx) Device speed: 3200 MHz Device cores: 6 CU Device score: 520 Platform version: OpenCL 1.2 CUDA 10.0.150 Platform profile: FULL_PROFILE Platform name: NVIDIA CUDA Platform vendor: NVIDIA Corporation Device ID: 1 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Device ID: 2 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Selected platform: AMD Accelerated Parallel Processing Selected device: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz with OpenCL 2.0 capability. Half precision compute support: No. Tensor Core support: No. Selected platform: NVIDIA CUDA Selected device: GeForce RTX 2080 Ti with OpenCL 1.2 capability. Half precision compute support: No. Tensor Core support: Yes. Detected 2 OpenCL platforms. Platform version: OpenCL 2.0 AMD-APP (2079.4) Platform profile: FULL_PROFILE Platform name: AMD Accelerated Parallel Processing Platform vendor: Advanced Micro Devices, Inc. 
Device ID: 0 Device name: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz Device type: CPU Device vendor: GenuineIntel Device driver: 2079.4 (sse2,avx) Device speed: 3200 MHz Device cores: 6 CU Device score: 520 Platform version: OpenCL 1.2 CUDA 10.0.150 Platform profile: FULL_PROFILE Platform name: NVIDIA CUDA Platform vendor: NVIDIA Corporation Device ID: 1 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Device ID: 2 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Selected platform: AMD Accelerated Parallel Processing Selected device: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz with OpenCL 2.0 capability. Half precision compute support: No. Tensor Core support: No. Selected platform: NVIDIA CUDA Selected device: GeForce RTX 2080 Ti with OpenCL 1.2 capability. Half precision compute support: No. Tensor Core support: Yes.

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=32 KWI=2 MDIMA=32 MDIMC=32 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 80.8373 ms (7.3 GFLOPS)
(3/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 56.7485 ms (10.4 GFLOPS)
(15/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 46.0721 ms (12.8 GFLOPS)
(37/290) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 45.3389 ms (13.0 GFLOPS)
(97/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 44.9333 ms (13.1 GFLOPS)
(168/290) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 43.7751 ms (13.5 GFLOPS)
Wavefront/Warp size: 1
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 1024

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=32 KWI=2 MDIMA=32 MDIMC=32 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 0.1471 ms (4008.9 GFLOPS)
(6/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 0.1422 ms (4146.7 GFLOPS)
(8/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 0.1258 ms (4689.5 GFLOPS)
(10/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.1218 ms (4841.3 GFLOPS)
(18/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.1106 ms (5332.2 GFLOPS)
(22/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.1092 ms (5402.9 GFLOPS)
(26/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.1069 ms (5518.1 GFLOPS)
(29/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=4 0.1063 ms (5551.0 GFLOPS)
(31/290) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.0981 ms (6013.2 GFLOPS)
(34/290) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0947 ms (6230.2 GFLOPS)
(39/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0879 ms (6708.6 GFLOPS)
(46/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.0799 ms (7379.4 GFLOPS)
(108/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.0765 ms (7708.9 GFLOPS)
(178/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0760 ms (7756.8 GFLOPS)
(227/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0752 ms (7843.4 GFLOPS)
(240/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0725 ms (8130.6 GFLOPS)
(283/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.0722 ms (8172.0 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64

Started OpenCL SGEMM tuner. Will try 290 valid configurations. Failed to compile: 290 kernels. Failed to find a working configuration. Check your OpenCL drivers. Minimum error: 100.000000. Error bound: 0.100000 Using OpenCL single precision (half precision failed to run). Detected 2 OpenCL platforms. Platform version: OpenCL 2.0 AMD-APP (2079.4) Platform profile: FULL_PROFILE Platform name: AMD Accelerated Parallel Processing Platform vendor: Advanced Micro Devices, Inc. Device ID: 0 Device name: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz Device type: CPU Device vendor: GenuineIntel Device driver: 2079.4 (sse2,avx) Device speed: 3200 MHz Device cores: 6 CU Device score: 520 Platform version: OpenCL 1.2 CUDA 10.0.150 Platform profile: FULL_PROFILE Platform name: NVIDIA CUDA Platform vendor: NVIDIA Corporation Device ID: 1 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Device ID: 2 Device name: GeForce RTX 2080 Ti Device type: GPU Device vendor: NVIDIA Corporation Device driver: 411.70 Device speed: 1545 MHz Device cores: 68 CU Device score: 1112 Selected platform: AMD Accelerated Parallel Processing Selected device: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz with OpenCL 2.0 capability. Half precision compute support: No. Tensor Core support: No. Selected platform: NVIDIA CUDA Selected device: GeForce RTX 2080 Ti with OpenCL 1.2 capability. Half precision compute support: No. Tensor Core support: Yes. Loaded existing SGEMM tuning. Wavefront/Warp size: 1 Max workgroup size: 1024 Max workgroup dimensions: 1024 1024 1024 Loaded existing SGEMM tuning. Wavefront/Warp size: 32 Max workgroup size: 1024 Max workgroup dimensions: 1024 1024 64 Setting max tree size to 3660 MiB and cache size to 406 MiB.

featurecat commented 5 years ago

Yes, that's right. What was the exact command you used?


lex312 commented 5 years ago

{ "leelaz": { "max-analyze-time-minutes": 60, "analyze-update-interval-centisec": 10, "network-file": "network.gz", "max-game-thinking-time-seconds": 2, "engine-start-location": ".", "engine-command": "./leela-zero/leelaz --gtp --lagbuffer 0 --weights %network-file --gpu 1 --gpu 2", "print-comms": false }, "ui": { "comment-font-size": 0, "board-color": [ 217, 152, 77 ], "shadow-size": 100, "show-winrate": true, "autosave-interval-seconds": -1, "append-winrate-to-comment": true, "fancy-board": true, "show-captured": true, "weighted-blunder-bar-height": false, "--gpu 0 --gpu 1 --gpu 2 --gpu 3": true, "win-rate-always-black": false, "show-move-number": true, "winrate-stroke-width": 3, "show-next-moves": true, "show-comment": true, "show-leelaz-variation": true, "theme": "default", "min-playout-ratio-for-stats": 0, "fancy-stones": true, "resume-previous-game": false, "window-size": [ 3840, 2160 ], "new-move-number-in-branch": true, "shadows-enabled": true, "show-variation-graph": true, "show-dynamic-komi": true, "minimum-blunder-bar-width": 3, "large-winrate": false, "show-blunder-bar": true, "only-last-move-number": 1, "confirm-exit": false, "show-status": true, "handicap-instead-of-winrate": false, "large-subboard": false, "dynamic-winrate-graph-width": true, "show-subboard": true, "window-maximized": true, "show-best-moves": true, "board-size": 19 } }

Problem fixed :)

It needs to be --gpu 1 --gpu 2, not --gpu 0 --gpu 1.

In the log above, device 0 is the CPU (the Intel Core i7-3930K, listed under the AMD OpenCL platform), and the two RTX 2080 Ti cards are device 1 and device 2.

So --gpu 0 --gpu 1 means using the CPU plus one GPU, and that does not work.
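The same check can be done mechanically by reading the device IDs out of the engine's startup log instead of assuming that numbering starts at the first GPU. This is a rough sketch only: `gpu_device_ids` and `gpu_flags` are hypothetical helpers, the parsing assumes the "Device ID: ... Device type: ..." layout seen in the log posted in this thread, and the sample text is abbreviated from that log.

```python
import re


def gpu_device_ids(log_text):
    """Return the IDs of devices whose 'Device type' is GPU, in log order."""
    ids = []
    # Each entry starts at "Device ID: N" and later carries "Device type: X".
    pattern = re.compile(r"Device ID:\s*(\d+)(.*?)Device type:\s*(\w+)", re.DOTALL)
    for match in pattern.finditer(log_text):
        device_id = int(match.group(1))
        device_type = match.group(3)
        # The log repeats the device list, so skip IDs already recorded.
        if device_type == "GPU" and device_id not in ids:
            ids.append(device_id)
    return ids


def gpu_flags(log_text):
    """Build the --gpu arguments for every GPU found in the log."""
    return " ".join(f"--gpu {device_id}" for device_id in gpu_device_ids(log_text))


# Abbreviated from the device listing in this thread.
sample_log = (
    "Device ID: 0 Device name: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz Device type: CPU "
    "Device ID: 1 Device name: GeForce RTX 2080 Ti Device type: GPU "
    "Device ID: 2 Device name: GeForce RTX 2080 Ti Device type: GPU"
)

print(gpu_flags(sample_log))
```

For this listing the sketch skips device 0 because its type is CPU, which matches the fix described above.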