apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Error while running 8_bit_quantize_weights method #1937

Open stepanzalis opened 1 year ago

stepanzalis commented 1 year ago

🐞Describing the bug

After successfully converting a model from TensorFlow to the Core ML format, I wanted to try quantizing it. I ran the snippet below with the model in .mlpackage format but got an error.

Stack Trace


ValueError                                Traceback (most recent call last)
Cell In[691], line 20
     14 import coremltools.optimize as cto
     16 config = cto.coreml.OptimizationConfig(
     17     global_config=cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
     18 )
---> 20 compressed_8_bit_model = cto.coreml.linear_quantize_weights(mlModel, config)
     21 compressed_8_bit_model.save("./RockPaperScissors_8bit.mlpackage")

File /opt/homebrew/lib/python3.10/site-packages/coremltools/optimize/coreml/_post_training_quantization.py:172, in linear_quantize_weights(mlmodel, config)
     62 """
     63 Utility function to convert a float precision MLModel of type ``mlprogram``, which uses
     64 float-precision weights, into a compressed MLModel that uses 8-bit weights. This is
   (...)
    168 
    169 """
    171 linear_weight_quantizer = _linear_quantize_weights(config, fake_compression=False)
--> 172 return _apply_graph_pass(mlmodel, linear_weight_quantizer)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/optimize/coreml/_post_training_quantization.py:51, in _apply_graph_pass(mlmodel, graph_pass)
     48 graph_pass.apply(prog)
     50 # convert the pymil program back to mlmodel
---> 51 compressed_mlmodel = _mil_convert(
     52     prog,
...
    289 @property
    290 def shape(self):
--> 291     raise ValueError("shape not applicable to ListVar '{}'.".format(self.name))
ValueError: shape not applicable to ListVar 'const_13'.

To Reproduce

Reproduced by the snippet below with the model attached.

import coremltools as ct
import coremltools.optimize as cto

mlModel = ct.models.MLModel("./RockPaperScissors.mlpackage")

config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
)

compressed_8_bit_model = cto.coreml.linear_quantize_weights(mlModel, config)
compressed_8_bit_model.save("./RockPaperScissors_8bit.mlpackage")

System environment (please complete the following information):

Additional info

aseemw commented 1 year ago

I am able to reproduce the error. Thanks for sharing the model to easily reproduce the issue. This is clearly a bug that happens in the const_deduplication pass while loading the mlpackage model into the internal model class. We will look into it to find the root cause and fix it.

Meanwhile, can you try one thing: re-convert the model from TF without using the classifier mode, so that the outputs are just multiarrays, and then verify whether the ct.optimize.coreml APIs still don't work. This will rule out whether it's a classifier-related bug.

stepanzalis commented 1 year ago

Thanks for the reply. When converting the model from TF without the classifier (as in the example below):

mlModel = ct.convert(
    model, 
    inputs=[image_input], 
    # classifier_config=classifier_config,
    minimum_deployment_target=ct.target.iOS16,
)

it doesn't produce any error during the 8-bit quantization, and the resulting mlpackage model looks fine, so it seems to be related to the classifier_config parameter.
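As a side note for anyone using this workaround: without classifier_config, the converted model outputs a raw probability multiarray instead of a class label, so the label mapping has to happen in application code. A minimal pure-Python sketch of that step (the label order and probability values here are placeholders, not taken from the actual model):

```python
# Sketch: map a raw probability output back to a class label once
# classifier_config is dropped from the conversion. Labels and values
# below are illustrative assumptions.

def top_label(probs, labels):
    """Return (label, probability) for the highest-scoring class."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

labels = ["rock", "paper", "scissors"]  # assumed class order
probs = [0.1, 0.7, 0.2]                 # e.g. values read from the output MLMultiArray
print(top_label(probs, labels))         # -> ('paper', 0.7)
```

In a real app the probabilities would come from the quantized model's prediction output rather than a hard-coded list.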

YifanShenSZ commented 1 year ago

Hi @stepanzalis, I successfully resolved the shape not applicable to ListVar error, but I encountered a new one:

  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/passes/defs/optimize_repeat_ops.py", line 800, in _add_output_sinks
    out_sink = mb.identity(x=out_var)
  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/builder.py", line 166, in _add_op
    new_op = op_cls(**kwargs)
  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/operation.py", line 187, in __init__
    self._validate_and_set_inputs(input_kv)
  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/operation.py", line 495, in _validate_and_set_inputs
    self.input_spec.validate_inputs(self.name, self.op_type, input_kvs)
  File "/Volumes/data/Software/Mine/coremltools-github_issue-1937/coremltools/converters/mil/mil/input_type.py", line 162, in validate_inputs
    raise ValueError(msg.format(name, var.name, input_type.type_str,
ValueError: Op "identity_1" (op_type: identity) Input x="classLabel_probs" expects list, tensor, or scalar but got dict[str,fp64]

These errors seem to be caused by the use of Python list and dict objects rather than TensorFlow tensors. Is this the case in your original TensorFlow model?

stepanzalis commented 1 year ago

Sharing the definition of the model:

model = tf.keras.models.Sequential()
model.add(base_model)
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(
        units=NUM_CLASSES,
        activation=tf.keras.activations.softmax,
        kernel_regularizer=tf.keras.regularizers.l2(l=0.01)
))

where base_model is a default Keras model:

base_model = tf.keras.applications.MobileNetV2(
  input_shape=INPUT_IMG_SHAPE,
  include_top=False,
  weights='imagenet',
  pooling='avg'
)

Then (after training) the model is passed into the convert function I already shared. Let me know if this is helpful to you.
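For context, the classifier_config referenced in the earlier conversion snippet is not shown anywhere in the thread. A typical construction with coremltools would look like the sketch below; the label list and its order are assumptions, not taken from the reporter's notebook:

```python
import coremltools as ct

# Hypothetical reconstruction of the elided classifier_config;
# the class labels and their order are assumed for illustration.
classifier_config = ct.ClassifierConfig(class_labels=["rock", "paper", "scissors"])

# It would then be passed to ct.convert via the classifier_config
# parameter, as in the snippet shared earlier in this thread.
```

This configuration is what introduces the class-label list and the classLabel_probs dictionary into the converted model's outputs.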

YifanShenSZ commented 1 year ago

Hmmm, do you know where the list ['rock', 'paper', 'scissors'] I saw comes from?

Also, do you know where the dict classLabel_probs comes from?

stepanzalis commented 1 year ago

I guess it's coming from the dataset itself.

[Screenshot of the dataset, 2023-08-17 at 19:43:24]
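If the labels come from the dataset, a common source is the dataset's folder layout: Keras utilities such as tf.keras.utils.image_dataset_from_directory derive class names from the sorted subdirectory names. A small stdlib-only sketch of that convention (the directory names here are assumptions for illustration):

```python
# Sketch: derive class labels from a dataset's folder layout, mirroring
# the sorted-subdirectory convention used by Keras dataset loaders.
import pathlib
import tempfile

def class_names_from_directory(root):
    """Return sorted subdirectory names, one per class."""
    return sorted(p.name for p in pathlib.Path(root).iterdir() if p.is_dir())

# Build a throwaway directory tree with one folder per class.
with tempfile.TemporaryDirectory() as root:
    for name in ["rock", "paper", "scissors"]:
        (pathlib.Path(root) / name).mkdir()
    print(class_names_from_directory(root))  # -> ['paper', 'rock', 'scissors']
```

Note the sorted ordering differs from the ['rock', 'paper', 'scissors'] order seen in the converted model, so the labels there may instead have been passed explicitly to the classifier configuration.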
YifanShenSZ commented 1 year ago

I see. These list and dict objects are indeed introduced by the classifier.

Please continue to work around it by converting without classifier_config. We have several optimization passes that are vulnerable to list and dict inputs, since they expect tensors.

stepanzalis commented 1 year ago

Alright, thanks a lot for solving it! 🙏