glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess ** A chess adaption of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Error in OpenCL Calculation #215

Open anhhuyalex opened 6 years ago

anhhuyalex commented 6 years ago

Hi,

I'm a chess enthusiast and a novice in programming, trying to help with creating self-play games on my Mac (consumer CPU). I built lczero successfully and attempted to run ./train.sh.

I got the following error and am wondering if I can receive help here:

update_root, 21611 -> 2888 expanded nodes (13.4% reused)
Error in OpenCL calculation: expected 0.549273 got 2.094689 (501(error=281.356859%)
Error in OpenCL calculation: expected -0.480243 got 0.000918 (1(error=47924.261475%)
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: OpenCL self-check mismatch.
./train.sh: line 13: 46400 Abort trap: 6 ./lczero --weights=weights.txt --randomize -n -t1 --start="train 1" > training.out
./train.sh: line 13: 46401 Abort trap: 6 ./lczero --weights=weights.txt --randomize -n -t1 --start="train 2" > training2.out

Here's my information, gleaned from ./train.sh:

Als-MacBook-Pro:build alng$ ./train.sh
Using 1 thread(s).
Generated 1924 moves
Detecting residual layers...v1...Using 1 thread(s).
Generated 1924 moves
Detecting residual layers...v1...64 channels...64 channels...6 blocks.
6 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Nov 18 2015 20:45:47)
Platform profile: FULL_PROFILE
Platform name: Apple
Platform vendor: Apple
Device ID: 0
Device name: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Device type: CPU
Device vendor: Intel
Device driver: 1.1
Device speed: 2700 MHz
Device cores: 4 CU
Device score: 512
Device ID: 1
Device name: Intel(R) Iris(TM) Graphics 6100
Device type: GPU
Device vendor: Intel Inc.
Device driver: 1.2(Nov 18 2015 20:57:39)
Device speed: 1050 MHz
Device cores: 48 CU
Device score: 612
Selected platform: Apple
Selected device: Intel(R) Iris(TM) Graphics 6100 with OpenCL 1.2 capability.
Loaded existing SGEMM tuning.
Wavefront/Warp size: 8
Max workgroup size: 256
Max workgroup dimensions: 256 256 256
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Nov 18 2015 20:45:47)
Platform profile: FULL_PROFILE
Platform name: Apple
Platform vendor: Apple
Device ID: 0
Device name: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Device type: CPU
Device vendor: Intel
Device driver: 1.1
Device speed: 2700 MHz
Device cores: 4 CU
Device score: 512
Device ID: 1
Device name: Intel(R) Iris(TM) Graphics 6100
Device type: GPU
Device vendor: Intel Inc.
Device driver: 1.2(Nov 18 2015 20:57:39)
Device speed: 1050 MHz
Device cores: 48 CU
Device score: 612
Selected platform: Apple
Selected device: Intel(R) Iris(TM) Graphics 6100 with OpenCL 1.2 capability.
Loaded existing SGEMM tuning.
Found 0 existing chunks in data-1/training
update_root, 0 -> 0 expanded nodes (0.0% reused)
NNCache: 0/0 hits/lookups = 0.0% hitrate, 0 inserts, 0 size
Wavefront/Warp size: 8
Max workgroup size: 256
Max workgroup dimensions: 256 256 256
RNG seed: 0xca406202630005be (thread: 4302086495072090873)
Found 0 existing chunks in data-2/training
update_root, 0 -> 0 expanded nodes (0.0% reused)
NNCache: 0/0 hits/lookups = 0.0% hitrate, 0 inserts, 0 size
RNG seed: 0x3c27f62aeda4c159 (thread: 4302086495072090873)

I'm not sure what other useful info I can provide. Please let me know. Thank you.
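For readers puzzling over the percentages in the two "Error in OpenCL calculation" lines: the thread does not show how the self-check compares values, so the following is only a sketch of one comparison rule that happens to be consistent with the logged numbers. The function name `relative_difference` and the `1e-3` floor are assumptions made for illustration, not something confirmed in this discussion.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper (not taken from this thread): symmetric relative
// difference between a GPU result and a CPU reference value.
float relative_difference(float got, float expected) {
    // Treat NaN as the largest possible mismatch.
    if (std::isnan(got) || std::isnan(expected)) {
        return std::numeric_limits<float>::max();
    }
    // Assumed floor: magnitudes below 1e-3 are rounded up so that
    // near-zero outputs do not lead to a division by (almost) zero.
    constexpr float small_number = 1e-3f;
    const float fg = std::max(std::fabs(got), small_number);
    const float fe = std::max(std::fabs(expected), small_number);
    const float diff = std::fabs(fg - fe);
    // Report the larger of the two ratios. Signs are ignored in this sketch.
    return std::max(diff / fg, diff / fe);
}
```

Under these assumptions, `relative_difference(2.094689f, 0.549273f) * 100` comes out at roughly 281.36 and `relative_difference(0.000918f, -0.480243f) * 100` at roughly 47924.3, close to the percentages in the log. Either way, the GPU result is far from the reference value, which is why the self-check aborts.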

jjoshua2 commented 6 years ago

I think the Mac OpenCL driver is buggy, but you could use the CPU-only build, which should be faster than an integrated GPU anyway.

anhhuyalex commented 6 years ago

Could you elaborate on how I could use the CPU-only build, or point me to resources where I can figure that out? I'm a bit unsure what you mean. Thanks.

jkiliani commented 6 years ago

I have a very similar configuration and have never been able to use the GPU. Mac OpenCL drivers are notoriously buggy. Just compile for CPU, and everything will be fine.

killerducky commented 6 years ago

@anhhuyalex You do not need to run train.sh. You need to compile the client in go/src/client and run that. See https://github.com/glinscott/leela-chess/blob/master/go/src/client/README.md for this step.

anhhuyalex commented 6 years ago

When I perform those steps I get the following message:

Downloading network...
Args: [/Users/alng/Documents/Datascience/leela-chess/go/src/client/lczero --weights=networks/860fa9b753c2b560c731bc71c9d6148e9d0926fe7b0f14f52d18405486759a7e -t1 --randomize -n --start=train 49366-0 1 --quiet]
Created dirs data-49366-0
info depth 16 nodes 800 nps 103 score cp 0 winrate 50.06% time 7740 pv a2a3 a7a6 a3a4 a6a5 a1a2 a8a7 a2a3 a7a6 a3a2 a6a7 a2a3 a7a6 a3a2 a6a7
info depth 16 nodes 800 nps 143 score cp 0 winrate 49.85% time 1804 pv a7a6 a3a4 a6a5 a1a2 a8a7 a2a3 a7a6 a3a2 a6a7 a2a3 a7a6 a3a2 a6a7
Update your GPU drivers or reduce the amount of games played simultaneously.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: OpenCL self-check mismatch.
2018/03/31 01:22:04 signal: abort trap

killerducky commented 6 years ago

Try running client -debug. It will create a logs-#### directory, and put logfiles in there. Upload one of the logfiles here or pastebin.

jkiliani commented 6 years ago

For CPU only on Mac, you have to modify config.h before compiling lczero. Comment out (or delete, doesn't matter) the two lines

#define USE_OPENCL

#define USE_OPENCL_SELFCHECK

@killerducky I guess now with FEATURE_USE_CPU_ONLY there's a better way, but how do I trigger that when calling make? Is there a command line option?

anhhuyalex commented 6 years ago

Here's the pastebin of the logfiles https://pastebin.com/dLesjePJ

killerducky commented 6 years ago

Thanks for the logfile. Unfortunately the debug logs are not done correctly, so I can't see any extra information. I'll update the debug logging later.

In the meantime compile for CPU only as jkiliani says above.

anhhuyalex commented 6 years ago

I tried CPU compile and everything's running smoothly so far! Thanks very much.

IanOsgood commented 6 years ago

I think this was an OpenCL bug for that particular video card (Intel Iris Graphics 6100), not Macs in general. I am on a 15" MacBook Pro of the same generation, running the GPU build just fine on an AMD Radeon R9 M370X for about a 3x speedup compared to using FEATURE_USE_CPU_ONLY.

I think we should change the Mac setup instructions to use the standard Leela compile, and have users modify config.h and recompile only if they run into an OpenCL error like this one.