Implement Net-to-CoreML Conversion Script

ChinChangYang commented 8 months ago

This commit introduces the net_to_coreml.py script in the tf/ directory. The script converts a neural network file into a TensorFlow model, then converts that TensorFlow model into a CoreML model. The TensorFlow step uses the same conversion approach as net_to_model.py.

Key features of the CoreML conversion include:

Setting the input shape to (1, 112, 8, 8).
Defining input_planes as the input name.
Specifying output names as output_policy, output_value, and output_moves_left.
Assigning a concise description to the model, formatted as Lc0 converted from {net name}.

The script finishes by saving the CoreML model as {net name}.mlpackage (in Test 1 below, weights_run2_744706.lc0.mlpackage). This lets Lc0 neural networks be converted into CoreML models that can run on Apple's Neural Engine. A CoreML backend is planned for future development in the lc0 repository.
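
The conversion settings listed above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in net_to_coreml.py. It assumes the TensorFlow step yields a Keras model with a single input and three outputs in the order policy, value, moves left. It also assumes {net name} is the net's file name, as the saved paths in the logs suggest. The helper names (`model_description`, `mlpackage_path`, `output_renames`, `convert_to_coreml`) are illustrative. Renaming features with `ct.utils.rename_feature` and rebuilding the model from its spec is one way to get the listed names; the script's "Rebuilding model with updated spec ..." log line suggests something similar, but that is an assumption.

```python
"""Hypothetical sketch of the CoreML conversion settings described above."""
import os

# Settings listed in the PR description.
INPUT_NAME = "input_planes"
INPUT_SHAPE = (1, 112, 8, 8)  # batch, input planes, board rows, board columns
OUTPUT_NAMES = ("output_policy", "output_value", "output_moves_left")


def model_description(net_path):
    """Short description; assumes {net name} is the net's file name."""
    return f"Lc0 converted from {os.path.basename(net_path)}"


def mlpackage_path(out_dir, net_path):
    """Append .mlpackage to the net file name, as seen in the logs."""
    return os.path.join(out_dir, os.path.basename(net_path) + ".mlpackage")


def output_renames(current_names):
    """Map the converter's default output names to the desired ones.

    Assumes exactly three outputs, ordered policy, value, moves left.
    """
    if len(current_names) != len(OUTPUT_NAMES):
        raise ValueError(
            f"expected {len(OUTPUT_NAMES)} outputs, got {len(current_names)}")
    return dict(zip(current_names, OUTPUT_NAMES))


def convert_to_coreml(keras_model, net_path, out_dir):
    """Convert a TF2/Keras model to an ML program and save it as .mlpackage."""
    import coremltools as ct  # imported lazily; only needed for conversion

    mlmodel = ct.convert(
        keras_model,
        source="tensorflow",
        # A TensorFlow model with exactly one input may omit the input name.
        inputs=[ct.TensorType(shape=INPUT_SHAPE)],
        convert_to="mlprogram",
    )
    spec = mlmodel.get_spec()
    renames = output_renames([out.name for out in spec.description.output])
    renames[spec.description.input[0].name] = INPUT_NAME
    for old, new in renames.items():
        if old != new:
            ct.utils.rename_feature(spec, old, new)
    # ML programs keep their weights on disk, so pass weights_dir when rebuilding.
    mlmodel = ct.models.MLModel(spec, weights_dir=mlmodel.weights_dir)
    mlmodel.short_description = model_description(net_path)
    path = mlpackage_path(out_dir, net_path)
    mlmodel.save(path)
    return path
```

Only `convert_to_coreml` needs coremltools. The naming helpers are plain Python and can be checked on their own.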

Test 1: 128x10 (PASSED)

% python net_to_coreml.py --cfg 128x10.yaml-20210723-1032 weights_run2_744706.lc0          
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test: dev2/test/
  input_train: dev2/train/
  input_validation: dev2/validate/
  num_chunks: 1000000
  train_ratio: 0.9
gpu: 0
model:
  filters: 128
  residual_blocks: 10
  se_ratio: 4
name: 128x10-t74
training:
  batch_size: 1024
  lr_boundaries:
  - 120
  lr_values:
  - 4.0e-05
  - 4.0e-05
  mask_legal_moves: true
  max_grad_norm: 5.4
  moves_left_loss_weight: 1.0
  num_batch_splits: 1
  num_test_positions: 40000
  path: dev2/networks
  policy_loss_weight: 1.0
  q_ratio: 0
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 2000
  train_avg_report_steps: 200
  validation_steps: 500
  value_focus_min: 1.0
  value_focus_slope: 0.0
  value_loss_weight: 2.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Wrote model to dev2/networks/128x10-t74/128x10-t74-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 20.80 passes/s]
Converting TF Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████| 413/413 [00:00<00:00, 11943.65 ops/s]
Running MIL frontend_tensorflow2 pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 1628.68 passes/s]
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:00<00:00, 86.30 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 1404.81 passes/s]
Input names: ['input_planes']
Output names: ['output_policy', 'output_value', 'output_moves_left']
Rebuilding model with updated spec ...
Saving model ...
CoreML model saved at dev2/networks/128x10-t74/weights_run2_744706.lc0.mlpackage
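
A successful save does not show that the package runs. One way to check it is to load the package with coremltools and run a single prediction on an all-zero input. This is a hedged sketch: `dummy_input` and `smoke_test` are hypothetical helpers, `MLModel.predict` only works on macOS, and an all-zero input checks shapes only. It does not compare outputs against the original net.

```python
import numpy as np

INPUT_NAME = "input_planes"
INPUT_SHAPE = (1, 112, 8, 8)


def dummy_input():
    """All-zero batch of one position: 112 input planes of 8x8 squares."""
    return {INPUT_NAME: np.zeros(INPUT_SHAPE, dtype=np.float32)}


def smoke_test(mlpackage_path):
    """Load the package, predict once, and return each output's shape (macOS only)."""
    import coremltools as ct  # imported lazily; dummy_input() does not need it

    model = ct.models.MLModel(mlpackage_path)
    outputs = model.predict(dummy_input())
    return {name: np.asarray(value).shape for name, value in outputs.items()}
```

For example, `smoke_test("dev2/networks/128x10-t74/weights_run2_744706.lc0.mlpackage")` should return one entry per output name listed in the log above.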

Test 2: 512x19 (FAILED)

% python net_to_coreml.py --cfg 512x19-t80.yaml-20230507-0216 512x19-t81-swa-10061000.pb.gz
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test:
  - dev1/test/
  input_train:
  - dev1/train/
  input_validation: dev1/validate/
  num_chunks: 3000000
  test_workers: 8
  train_ratio: 0.9
  train_workers: 32
gpu: 0
model:
  default_activation: mish
  filters: 512
  pol_encoder_layers: 0
  policy: attention
  residual_blocks: 19
  se_ratio: 16
name: 512x19-t80
training:
  batch_size: 1024
  checkpoint_steps: 4000
  diff_focus_min: 0.025
  diff_focus_slope: 3.0
  lookahead_optimizer: true
  lr_boundaries:
  - 100
  lr_values:
  - 0.0004
  - 0.0004
  mask_legal_moves: true
  max_grad_norm: 4.0
  moves_left_loss_weight: 1.0
  num_batch_splits: 2
  num_test_positions: 40000
  path: dev1/networks
  policy_loss_weight: 1.0
  q_ratio: 0.0
  reg_term_weight: 0.05
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 500
  train_avg_report_steps: 200
  validation_steps: 500
  value_loss_weight: 1.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Wrote model to dev1/networks/512x19-t80/512x19-t80-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  7.15 passes/s]
Converting TF Frontend ==> MIL Ops:  97%|███████████████████████████████████████████████████████████████████████████████████▍  | 960/989 [00:00<00:00, 11277.54 ops/s]
Traceback (most recent call last):
  File "/Users/chinchangyang/Code/lczero-training-ccy/tf/net_to_coreml.py", line 50, in <module>
    coreml_model = ct.convert(
                   ^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 574, in convert
    mlmodel = mil_convert(
              ^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 98, in __call__
    return tf2_loader.load()
           ^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/load.py", line 82, in load
    program = self._program_from_tf_ssa()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow2/load.py", line 210, in _program_from_tf_ssa
    return converter.convert()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 522, in convert
    self.convert_main_graph(prog, graph)
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 421, in convert_main_graph
    outputs = convert_graph(self.context, graph, self.output_names)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/convert_utils.py", line 191, in convert_graph
    add_op(context, node)
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/ops.py", line 1332, in RealDiv
    y = mb.cast(x=context[node.inputs[1]], dtype="fp32")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 182, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/builder.py", line 184, in _add_op
    new_op.type_value_inference()
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 260, in type_value_inference
    output_vals = self._auto_val(output_types)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 377, in _auto_val
    vals = self.value_inference()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 111, in wrapper
    return func(self)
           ^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 868, in value_inference
    return self.get_cast_value(self.x, self.dtype.val)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 894, in get_cast_value
    return input_var.val.astype(dtype=string_to_nptype(dtype_val))
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'float' object has no attribute 'astype'

The error message is similar to the one reported in this issue: https://github.com/apple/coremltools/issues/1768

ChinChangYang commented 8 months ago

The AttributeError can be fixed by applying the following diff to the coremltools source code:

% git diff --cached coremltools
diff --git a/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py b/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
index c5ebc40..fb6902f 100644
--- a/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
+++ b/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
@@ -890,7 +890,7 @@ class cast(Operation):
                 return np.array(result)
             return None

-        if not types.is_tensor(input_var.sym_type):
-            return input_var.val.astype(dtype=string_to_nptype(dtype_val))
-        else:
+        if isinstance(input_var.val, float) or types.is_tensor(input_var.sym_type):
             return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
+        else:
+            return input_var.val.astype(dtype=string_to_nptype(dtype_val))
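
The traceback fails because `.astype` is called on a plain Python `float`, not on a NumPy array or NumPy scalar. The standalone sketch below mimics the patched branch. `cast_value` is a hypothetical stand-in for `cast.get_cast_value`, not the coremltools code itself. Its `is_tensor` flag replaces the `types.is_tensor(input_var.sym_type)` check.

```python
import numpy as np


def cast_value(val, is_tensor, np_dtype):
    """Mimic the patched logic: wrap Python floats and tensor values in
    np.array before casting; other values (e.g. NumPy scalars) already
    have .astype."""
    if isinstance(val, float) or is_tensor:
        return np.array(val).astype(dtype=np_dtype)
    return val.astype(dtype=np_dtype)


# A Python float has no .astype, which is the AttributeError in the traceback.
try:
    (0.5).astype(np.float32)
except AttributeError as exc:
    print(exc)

print(cast_value(0.5, False, np.float32).dtype)              # float32
print(cast_value(np.float32(2.0), False, np.float16).dtype)  # float16
print(cast_value([1.0, 2.0], True, np.float16).dtype)        # float16
```

Like the diff, the sketch special-cases only Python `float`. A plain Python `int` would still reach the `.astype` branch.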

I am running the coremltools test suites and will then open a pull request in the coremltools GitHub repository. If the pull request is accepted, a future coremltools release should include this fix.

ChinChangYang commented 7 months ago

Test 2: 512x19 (PASSED)

% python net_to_coreml.py --cfg 512x19-t80.yaml-20230507-0216 512x19-t81-swa-10061000.pb.gz
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test:
  - dev1/test/
  input_train:
  - dev1/train/
  input_validation: dev1/validate/
  num_chunks: 3000000
  test_workers: 8
  train_ratio: 0.9
  train_workers: 32
gpu: 0
model:
  default_activation: mish
  filters: 512
  pol_encoder_layers: 0
  policy: attention
  residual_blocks: 19
  se_ratio: 16
name: 512x19-t80
training:
  batch_size: 1024
  checkpoint_steps: 4000
  diff_focus_min: 0.025
  diff_focus_slope: 3.0
  lookahead_optimizer: true
  lr_boundaries:
  - 100
  lr_values:
  - 0.0004
  - 0.0004
  mask_legal_moves: true
  max_grad_norm: 4.0
  moves_left_loss_weight: 1.0
  num_batch_splits: 2
  num_test_positions: 40000
  path: dev1/networks
  policy_loss_weight: 1.0
  q_ratio: 0.0
  reg_term_weight: 0.05
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 500
  train_avg_report_steps: 200
  validation_steps: 500
  value_loss_weight: 1.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Wrote model to dev1/networks/512x19-t80/512x19-t80-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  9.77 passes/s]
Converting TF Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 987/987 [00:00<00:00, 13041.75 ops/s]
Running MIL frontend_tensorflow2 pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 788.02 passes/s]
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:03<00:00, 20.14 passes/s]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 805.74 passes/s]
Input names: ['input_planes']
Output names: ['output_policy', 'output_value', 'output_moves_left']
Rebuilding model with updated spec ...
Saving model ...
CoreML model saved at dev1/networks/512x19-t80/512x19-t81-swa-10061000.pb.gz.mlpackage

The AttributeError has been resolved in https://github.com/apple/coremltools/pull/2087.

ChinChangYang commented 6 months ago

net_to_model.py is unable to convert the 11248.pb.gz net into a model. The issue is described in https://github.com/LeelaChessZero/lczero-training/issues/224.

LeelaChessZero / lczero-training

Implement Net-to-CoreML Conversion Script #222