LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Integration of CoreML Backend into Leela Chess Zero #1950

Open ChinChangYang opened 6 months ago

ChinChangYang commented 6 months ago

Overview

This pull request introduces a CoreML backend for Leela Chess Zero (lc0). It uses Apple's Core ML framework to run lc0's neural network computations on macOS devices, so that lc0 can benefit from optimizations tailored for Apple hardware.

Implementation Highlights

Prerequisites and Setup

Until coremltools 7.2 is released, coremltools must be built from a specific pull request (https://github.com/apple/coremltools/pull/2087). Networks are converted into CoreML models with net_to_coreml.py from the lczero-training repository, following the instructions in https://github.com/LeelaChessZero/lczero-training/pull/222.

Converting Networks to CoreML Models:

  1. Create a Python environment and install necessary packages:
    conda create -n net-to-coreml-py39 python=3.9
    conda activate net-to-coreml-py39
    pip install numpy tensorflow tensorflow-metal protobuf==3.20.3 pyyaml coremltools
  2. Clone the lczero-training repository and prepare the environment:
    git clone --recurse-submodules https://github.com/LeelaChessZero/lczero-training.git
    cd lczero-training
    git fetch origin pull/222/head:net-to-coreml
    git switch net-to-coreml
    ./init.sh
  3. Download the network and configuration, then convert to CoreML model:
    cd tf
    wget -O 817580.lc0 "https://training.lczero.org/get_network?sha=7658338877bcf498b3329b9c196abfb123659dc16a5db298a2378cc4a5bb25ba"
    wget https://gist.githubusercontent.com/ChinChangYang/948bc4f9114dfb512abff8e3b2392962/raw/7400327b759af7703aaa0052e0e837ebc25e1cc8/768x15x24h-t80.yaml
    python net_to_coreml.py --cfg 768x15x24h-t80.yaml -e 817580.lc0
  4. Transfer the CoreML model and network to lc0's directory:
    cp -r dev1/networks/768x15x24h-t80/817580.lc0.mlpackage /path/to/lc0/build/release/lc0.mlpackage
    cp 817580.lc0 /path/to/lc0/build/release/
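
Before running lc0, it can help to confirm that step 4 left both inputs in place. The following is a minimal sketch, not part of this pull request: the function name `missing_coreml_inputs` is hypothetical, and the default file names are simply the ones used in the steps above.

```python
from pathlib import Path
import sys


def missing_coreml_inputs(release_dir,
                          weights_name="817580.lc0",
                          package_name="lc0.mlpackage"):
    """Return the names of expected inputs that are absent from release_dir.

    Hypothetical helper: the names mirror the cp commands in step 4.
    """
    root = Path(release_dir)
    missing = []
    # An .mlpackage is a directory bundle (hence `cp -r` above).
    if not (root / package_name).is_dir():
        missing.append(package_name)
    # The .lc0 weights file is a regular file.
    if not (root / weights_name).is_file():
        missing.append(weights_name)
    return missing


if __name__ == "__main__":
    # Directory to check; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in missing_coreml_inputs(target):
        print(f"missing: {name}")
```

Run from the release directory, the script prints nothing when both the model package and the weights file are present.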

Benchmarking lc0 with CoreML Backend:

Navigate to lc0's release directory and run the benchmark command to evaluate performance:

    % cd /path/to/lc0/build/release/
    % ./lc0 benchmark -b coreml
    [...]
    ===========================
    Total time (ms) : 345668
    Nodes searched  : 95896
    Nodes/second    : 277
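
The summary figures are related by simple arithmetic: nodes per second is the node count divided by the total time in seconds. The short sketch below recomputes it from the two numbers above. The function name is hypothetical, and rounding to the nearest integer is an assumption about how lc0 formats the value.

```python
def nodes_per_second(nodes, total_ms):
    """Search speed in nodes per second, given a total time in milliseconds."""
    return nodes / (total_ms / 1000.0)


# Figures taken from the benchmark summary above.
nps = nodes_per_second(95896, 345668)
print(round(nps))  # 277
```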

Contribution and Review Request

This pull request asks for a review of the CoreML backend's integration, focusing on its compatibility, performance, and alignment with lc0's architectural standards. Feedback, suggestions, and further optimizations are welcome, so that the backend integrates robustly into lc0 and works smoothly on macOS.

ChinChangYang commented 6 months ago

Absolute error histogram

% ./lc0 benchmark -b check --backend-opts=mode=histo,coreml --num-positions=1

       _
|   _ | |
|_ |_ |_| v0.31.0-dev+git.dirty built Feb  6 2024
Found pb network file: ./weights_run1_817374.lc0
Creating backend [check]...
Working backend set to coreml.
Reference backend set to eigen.
Creating backend [coreml]...
2024-02-06 21:56:53.101 lc0[73824:20798742] Compiling model: lc0.mlpackage/ -- file:///Users/chinchangyang/Code/lc0-ccy/build/release/
2024-02-06 21:56:53.317 lc0[73824:20798766] Compiled model URL: file:///var/folders/dv/kdr9x4yn4s106_94ydk5jnjc0000gn/T/lc0_D13B185D-B44F-4F09-9DE6-F447BE2294CB.mlmodelc
2024-02-06 21:56:53.317 lc0[73824:20798766] Initializing model with the compiled model URL...
2024-02-06 21:56:59.723 lc0[73824:20798766] Model successfully initialized
Creating backend [eigen]...
Using Eigen version 3.4.0
Eigen max batch size is 256.
Check mode: histogram.
Check rate: 20%.

Position: 1/1 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Benchmark time 61 ms, 2 nodes, 36 nps, move g1f3
Benchmark time 114 ms, 8 nodes, 74 nps, move d2d4
Benchmark time 203 ms, 9 nodes, 45 nps, move d2d4
Benchmark time 364 ms, 15 nodes, 41 nps, move d2d4
Benchmark time 382 ms, 19 nodes, 50 nps, move g1f3
Benchmark time 413 ms, 22 nodes, 54 nps, move g1f3
Benchmark time 502 ms, 40 nodes, 80 nps, move e2e3
Benchmark time 628 ms, 66 nodes, 106 nps, move g1f3
Benchmark time 652 ms, 102 nodes, 157 nps, move g1f3
Benchmark time 943 ms, 187 nodes, 199 nps, move g1f3
Benchmark time 1002 ms, 206 nodes, 207 nps, move e2e4
Absolute error histogram for a batch of 14
      |                                                                                         |
      |                                                                   #                     |
      |                                                                   ##                    |
 0.15 +                                                                   ##                    +
      |                                                                   ##                    |
      |                                                                   ###                   |
      |                                                                   ###                   |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                 #####                   |
      |                                                                 #####                   |
      |                                                                 #######                 |
      |                                                                 #######                 |
 0.05 +                                                                 ####### ##              +
      |                                                                 ##########              |
      |                                                               # ############            |
      |                                                               ##############            |
      |                                                           ####################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Absolute error histogram for a batch of 32
      |                                                                                         |
 0.15 +                                                                   ##                    +
      |                                                                   ##                    |
      |                                                                   ##                    |
      |                                                                  ####                   |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                 #######                 |
      |                                                                 #######                 |
 0.05 +                                                                ########                 +
      |                                                                #########                |
      |                                                                ########## ###           |
      |                                                              ################           |
      |                                                       #########################         |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Benchmark time 5088 ms, 262 nodes, 51 nps, move g1f3
Benchmark time 5315 ms, 345 nodes, 64 nps, move c2c4
Benchmark time 5451 ms, 452 nodes, 83 nps, move d2d4
Benchmark time 5835 ms, 628 nodes, 107 nps, move d2d4
Absolute error histogram for a batch of 32
 0.15 +                                                                                         +
      |                                                                   ##                    |
      |                                                                  ###                    |
      |                                                                  ###                    |
      |                                                                 ####                    |
  0.1 +                                                                 #####                   +
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                #######                  |
 0.05 +                                                                #########                +
      |                                                                ######### #              |
      |                                                               #############             |
      |                                                             # ###############           |
      |                                                         ######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Benchmark time 10005 ms, 883 nodes, 88 nps, move d2d4
bestmove d2d4
Absolute error histogram for a batch of 32
      |                                                                                         |
      |                                                                   #                     |
 0.15 +                                                                   #                     +
      |                                                                   ##                    |
      |                                                                   ##                    |
      |                                                                  ###                    |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                  #####                  |
      |                                                                  #####                  |
      |                                                                 ######                  |
      |                                                                 #######                 |
 0.05 +                                                                 #######                 +
      |                                                                 ######## #              |
      |                                                               ##############            |
      |                                                             #################           |
      |                                                 ##     #######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Absolute error histogram for a batch of 73
  0.2 +                                                                                         +
      |                                                                    #                    |
      |                                                                    #                    |
      |                                                                    #                    |
      |                                                                    #                    |
 0.15 +                                                                    #                    +
      |                                                                   ##                    |
      |                                                                   ###                   |
      |                                                                   ###                   |
      |                                                                   ###                   |
  0.1 +                                                                   ###                   +
      |                                                                  #####                  |
      |                                                                  #####                  |
      |                                                                  ######                 |
      |                                                                  ######                 |
 0.05 +                                                                 #######                 +
      |                                                                 ########                |
      |                                                               ##############            |
      |                                                            ##################           |
      |                                                     ## #######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 

===========================
Total time (ms) : 15160
Nodes searched  : 985
Nodes/second    : 65

ChinChangYang commented 6 months ago

Backendbench

% ./lc0 backendbench -b coreml --max-batch-size=32                           

       _
|   _ | |
|_ |_ |_| v0.31.0-dev+git.dirty built Feb  6 2024
Found pb network file: ./weights_run1_817374.lc0
Creating backend [coreml]...
2024-02-06 21:58:45.937 lc0[73848:20800526] Compiling model: lc0.mlpackage/ -- file:///Users/chinchangyang/Code/lc0-ccy/build/release/
2024-02-06 21:58:46.152 lc0[73848:20800532] Compiled model URL: file:///var/folders/dv/kdr9x4yn4s106_94ydk5jnjc0000gn/T/lc0_6C49057C-F712-4EA4-82DA-025C0114D284.mlmodelc
2024-02-06 21:58:46.152 lc0[73848:20800532] Initializing model with the compiled model URL...
2024-02-06 21:58:52.523 lc0[73848:20800532] Model successfully initialized
Benchmark batch size 1 with inference average time 3.219ms - throughput 310.656 nps.
Benchmark batch size 2 with inference average time 5.9926ms - throughput 333.745 nps.
Benchmark batch size 3 with inference average time 8.73419ms - throughput 343.478 nps.
Benchmark batch size 4 with inference average time 11.4712ms - throughput 348.699 nps.
Benchmark batch size 5 with inference average time 14.4862ms - throughput 345.156 nps.
Benchmark batch size 6 with inference average time 17.2667ms - throughput 347.489 nps.
Benchmark batch size 7 with inference average time 20.1971ms - throughput 346.585 nps.
Benchmark batch size 8 with inference average time 22.8665ms - throughput 349.856 nps.
Benchmark batch size 9 with inference average time 25.7457ms - throughput 349.572 nps.
Benchmark batch size 10 with inference average time 28.3601ms - throughput 352.608 nps.
Benchmark batch size 11 with inference average time 31.1257ms - throughput 353.406 nps.
Benchmark batch size 12 with inference average time 34.1467ms - throughput 351.425 nps.
Benchmark batch size 13 with inference average time 36.6708ms - throughput 354.506 nps.
Benchmark batch size 14 with inference average time 39.3265ms - throughput 355.994 nps.
Benchmark batch size 15 with inference average time 42.7149ms - throughput 351.165 nps.
Benchmark batch size 16 with inference average time 45.0219ms - throughput 355.383 nps.
Benchmark batch size 17 with inference average time 47.9304ms - throughput 354.681 nps.
Benchmark batch size 18 with inference average time 50.6393ms - throughput 355.455 nps.
Benchmark batch size 19 with inference average time 53.6795ms - throughput 353.953 nps.
Benchmark batch size 20 with inference average time 56.2379ms - throughput 355.632 nps.
Benchmark batch size 21 with inference average time 58.8418ms - throughput 356.889 nps.
Benchmark batch size 22 with inference average time 62.2577ms - throughput 353.37 nps.
Benchmark batch size 23 with inference average time 64.3085ms - throughput 357.651 nps.
Benchmark batch size 24 with inference average time 66.891ms - throughput 358.792 nps.
Benchmark batch size 25 with inference average time 69.6432ms - throughput 358.973 nps.
Benchmark batch size 26 with inference average time 72.4245ms - throughput 358.995 nps.
Benchmark batch size 27 with inference average time 75.6138ms - throughput 357.078 nps.
Benchmark batch size 28 with inference average time 78.1265ms - throughput 358.393 nps.
Benchmark batch size 29 with inference average time 80.7182ms - throughput 359.275 nps.
Benchmark batch size 30 with inference average time 83.8129ms - throughput 357.94 nps.
Benchmark batch size 31 with inference average time 86.329ms - throughput 359.091 nps.
Benchmark batch size 32 with inference average time 89.3303ms - throughput 358.221 nps.
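
In this output the reported throughput matches batch size divided by average inference time. The sketch below parses a few of the lines above and recomputes that value, along with the time per position. The regular expression and function names are illustrative and are not part of lc0.

```python
import re

# Matches lines of the form printed by `lc0 backendbench` above.
LINE_RE = re.compile(
    r"Benchmark batch size (\d+) with inference average time ([\d.]+)ms"
    r" - throughput ([\d.]+) nps\."
)


def parse_backendbench(text):
    """Return (batch_size, avg_ms, reported_nps) tuples found in text."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LINE_RE.finditer(text)
    ]


def implied_throughput(batch_size, avg_ms):
    """Positions evaluated per second for one batch taking avg_ms."""
    return batch_size * 1000.0 / avg_ms


SAMPLE = """\
Benchmark batch size 1 with inference average time 3.219ms - throughput 310.656 nps.
Benchmark batch size 16 with inference average time 45.0219ms - throughput 355.383 nps.
Benchmark batch size 32 with inference average time 89.3303ms - throughput 358.221 nps.
"""

for batch, avg_ms, reported in parse_backendbench(SAMPLE):
    per_position_ms = avg_ms / batch
    print(batch, f"{implied_throughput(batch, avg_ms):.3f}", reported,
          f"{per_position_ms:.2f} ms/position")
```

The time per position stays close to 3 ms across batch sizes, which is why throughput rises only from about 311 nps at batch size 1 to about 358 nps at batch size 32.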

ChinChangYang commented 6 months ago

The build documented in Mac (5438) - LeelaChessZero/lc0 failed to compile because the compileModelAtURL:completionHandler: method of the Core ML framework is unavailable. This method requires macOS 13.0 or later, but the build environment runs macOS 12.3.1 with Xcode 13.4, as listed in the supported Xcode versions. Updating the Xcode version specified in config.yml to 14.3.1 or later should also upgrade the macOS environment to a version that provides the method, which would likely resolve the compilation error.
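
The version requirement can be checked on a given machine with a short script. This is a minimal sketch assuming only the macOS 13.0 minimum stated above. The function names are hypothetical, and the check reads the running system's version, which is not necessarily the SDK version used at compile time.

```python
import platform

# Minimum macOS version for compileModelAtURL:completionHandler:,
# as stated in the comment above.
MIN_MACOS = (13, 0)


def parse_version(text):
    """Turn a dotted version such as '12.3.1' into a (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 2:
        parts.append(0)  # treat '13' as '13.0'
    return tuple(parts[:2])


def meets_minimum(macos_version, minimum=MIN_MACOS):
    """True if macos_version is at least the required minimum."""
    return parse_version(macos_version) >= minimum


if __name__ == "__main__":
    current = platform.mac_ver()[0]  # empty string when not on macOS
    if current:
        print(current, "ok" if meets_minimum(current) else "too old")
    else:
        print("not running on macOS")
```

With the versions mentioned above, `meets_minimum("12.3.1")` is false and `meets_minimum("13.0")` is true.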