🐞Describing the bug
I've been trying to reduce the size of an object detection model generated by Xcode's Create ML tool with transfer learning. The .mlmodel that Create ML produces stores its weights in a mixed (Float32, Float16) format.
I was able to quantize this file to Float16, which already saved 0.3 MB, but I cannot quantize either the original model or the Float16 one down to 8 bits. Both attempts fail with a NumPy error:
ValueError: operands could not be broadcast together with shapes (0,1) (128,0)
-> Is there a workaround for this error, such as converting the mixed (Float32, Float16) model up to full-precision Float32 before 8-bit quantization?
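For reference, the Float16 quantization that does succeed (and saves the 0.3 MB) looks roughly like this. It is a minimal sketch using the same quantize_weights API as the failing 8-bit call below; the file names are placeholders for my model.

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

mlmodel = ct.models.MLModel('objectDetection.mlmodel')
# 16-bit linear weight quantization completes without error on this model
fp16_model = quantization_utils.quantize_weights(mlmodel, 16)
fp16_model.save("objectDetection-16bits.mlmodel")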
Stack Trace
Quantizing using linear quantization
Optimizing Neural Network before Quantization:
Traceback (most recent call last):
  File "/Users/username/Desktop/COMPRESSION/compression.py", line 5, in <module>
    quantized_model = quantization_utils.quantize_weights(mlmodel, 8)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1642, in quantize_weights
    qspec = _quantize_spec_weights(spec, nbits, qmode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1128, in _quantize_spec_weights
    _quantize_spec_weights(model_spec, nbits, quantization_mode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1113, in _quantize_spec_weights
    _quantize_nn_spec(spec.neuralNetwork, nbits, quantization_mode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 723, in _quantize_nn_spec
    _optimize_nn(layers)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/optimization_utils.py", line 213, in _optimize_nn
    _conv_bn_fusion(int(conv_idx), int(output_idx), layers)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/optimization_utils.py", line 127, in _conv_bn_fusion
    wp = (gamma / _np.sqrt(variance))[:, None] * w
ValueError: operands could not be broadcast together with shapes (0,1) (128,0)
To Reproduce
Run this script on an object detection model generated with Create ML using transfer learning.
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load the Create ML object detection model
mlmodel = ct.models.MLModel('objectDetection.mlmodel')
# 8-bit linear weight quantization -- this is the call that raises the ValueError
quantized_model = quantization_utils.quantize_weights(mlmodel, 8)
quantized_model.save("objectDetection-8bits.mlmodel")
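The shapes in the error, (0,1) and (128,0), make me suspect that the conv/batchnorm fusion step reads the convolution weights from the floatValue field of the spec, which would be empty for layers whose weights Create ML stored as Float16. That is my assumption, not something I have confirmed in the coremltools source. Below is a small sketch to check which convolution layers store their weights in float16Value rather than floatValue; it assumes the Create ML model is a pipeline, which the recursive _quantize_spec_weights call in the trace suggests.

import coremltools as ct

spec = ct.models.MLModel('objectDetection.mlmodel').get_spec()

# Create ML object detection models are usually pipelines; fall back to the
# top-level spec if this one is not (assumption).
sub_models = spec.pipeline.models if spec.WhichOneof('Type') == 'pipeline' else [spec]

for sub in sub_models:
    if sub.WhichOneof('Type') != 'neuralNetwork':
        continue
    for layer in sub.neuralNetwork.layers:
        if layer.WhichOneof('layer') == 'convolution':
            w = layer.convolution.weights
            # A non-empty float16Value together with an empty floatValue would
            # explain the zero-sized arrays in the broadcast error.
            print(layer.name,
                  'float32 values:', len(w.floatValue),
                  'float16 bytes:', len(w.float16Value))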
System environment: