apple/coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Cannot convert keras.layers.MultiHeadAttention #1251

Open · praeclarum opened this issue 3 years ago

praeclarum commented 3 years ago

🐞 Describe the bug

There is an issue when converting TF2 Keras models that contain MultiHeadAttention:

layers.MultiHeadAttention(num_heads=num_heads, key_dim=head_dim, name="attention")

The conversion fails with: ValueError: Cannot add const [512*is10, 512]

The number in the is10 symbol increments each time I retry the conversion.

The problem seems to occur when calculating the matrix size for one of the einsums. I can't tell whether the trouble comes from the Q, K, V projection einsums or the scaled dot-product einsums.
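For reference, the einsum equations Keras builds can be printed from the layer itself. A quick sketch for narrowing this down (it reads private attributes such as _dot_product_equation, so the names may change across TF versions):

import tensorflow as tf
from tensorflow import keras

# Build a small MultiHeadAttention and print the einsum equations it
# uses internally, to see which one the converter chokes on.
mha = keras.layers.MultiHeadAttention(num_heads=7, key_dim=11)
_ = mha(tf.zeros((1, 3, 5)), tf.zeros((1, 3, 5)))

print(mha._query_dense.equation)  # Q projection (EinsumDense)
print(mha._key_dense.equation)    # K projection
print(mha._value_dense.equation)  # V projection
print(mha._dot_product_equation)  # scaled dot-product attention
print(mha._combine_equation)      # attention output combine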

I also tried the MultiHeadAttention from TensorFlow Addons, but that one failed with unsupported einsums.

The model trains and executes fine, so this seems to be a conversion issue. I tried coremltools 5.0b2.
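As a rough illustration of what the error means (assuming coremltools' sympy-backed symbolic shapes): is10 stands for an unknown dimension, so 512*is10 is a symbolic expression rather than a concrete integer, and the builder refuses to turn it into a constant:

import sympy

# coremltools represents flexible/unknown dims as sympy symbols, so a
# reshape target computed as shape[0] * shape[1] stays symbolic.
is10 = sympy.Symbol("is10")
shape = [512 * is10, 512]
print(shape)           # [512*is10, 512]
print(type(shape[0]))  # a sympy expression, not an int -> "Cannot add const"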

Trace

coremltools/converters/mil/frontend/tensorflow/ops.py in Einsum(context, node)
    493     a = context[node.inputs[0]]
    494     b = context[node.inputs[1]]
--> 495     x = build_einsum_mil(a, b, equation, node.name)
    496     context.add(node.name, x)
    497 

coremltools/converters/mil/frontend/_utils.py in build_einsum_mil(a_var, b_var, equation, name)
     66         if parsed_vectors_rev == ([0,1,2],[2,3,4],[0,1,3,4]):
     67              a_var, b_var = _swap(a_var, b_var)
---> 68         x_1 = mb.reshape(x=a_var, shape=[a_var.shape[0] * a_var.shape[1], a_var.shape[2]])
     69         x_2 = mb.reshape(x=b_var, shape=[b_var.shape[0], b_var.shape[1] * b_var.shape[2]])
     70         x = mb.matmul(x=x_1, y=x_2, transpose_x=False, transpose_y=False)

coremltools/converters/mil/mil/ops/registry.py in add_op(cls, **kwargs)
     59             @classmethod
     60             def add_op(cls, **kwargs):
---> 61                 return cls._add_op(op_cls, **kwargs)
     62 
     63             setattr(Builder, op_type, add_op)

coremltools/converters/mil/mil/builder.py in _add_op(cls, op_cls, **kwargs)
    160         # Shallow copy list inputs to ensure op inputs are immutable
    161         kwargs = {k: v if not isinstance(v, (list, tuple)) else v[:] for k, v in kwargs.items() if v is not None}
--> 162         kwargs.update(cls._create_vars(
    163             input_spec=op_cls.input_spec,
    164             op_name=kwargs["name"], before_op=before_op,

coremltools/converters/mil/mil/builder.py in _create_vars(cls, input_spec, op_name, before_op, candidate_kv)
    143             if isinstance(in_type, (ScalarOrTensorInputType,
    144               ListOrScalarOrTensorInputType)):
--> 145                 var = cls._add_const(val, new_var_name, before_op)
    146                 update_dict[k] = var
    147 

~/.virtualenvs/codepredictor/lib/python3.9/site-packages/coremltools/converters/mil/mil/builder.py in _add_const(cls, val, name, before_op)
     73     def _add_const(cls, val, name, before_op):
     74         if not is_python_value(val):
---> 75             raise ValueError("Cannot add const {}".format(val))
     76         if any_symbolic(val):
     77             msg = (

ValueError: Cannot add const [512*is10, 512]

To Reproduce

The model source here reproduces this bug: https://github.com/keras-team/keras-io/blob/master/examples/generative/text_generation_with_miniature_gpt.py

System environment:

TobyRoseman commented 3 years ago

How are you converting the model? Please share the code you are using with coremltools.

praeclarum commented 3 years ago

Hello, thank you for getting back to me. My example is a little long, so here is a minimal repro that triggers the same error.

import tensorflow as tf
import tensorflow.keras as keras

x = keras.layers.Input(shape=(3, 5))
y = keras.layers.MultiHeadAttention(num_heads=7, key_dim=11)(x, x)  # self-attention: query and value are the same tensor
model = keras.models.Model(inputs=[x], outputs=[y])

import coremltools as ct
mlmodel = ct.convert(model)

The error with this example is:

ValueError: Cannot add const [3*is0, 5]
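A possible stopgap may be to pin every input dimension, batch included, so the converter never sees a symbolic shape; untested, and later ops may still fail:

import coremltools as ct

# With a fully concrete input shape the reshape target becomes the
# plain [3, 5] instead of the symbolic [3*is0, 5].
mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=(1, 3, 5))])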
TobyRoseman commented 3 years ago

Thanks for the smaller example. That's very helpful. I can reproduce this issue using TensorFlow 2.5.

However, the latest version of TensorFlow we support is 2.3.1, and it looks like keras.layers.MultiHeadAttention isn't in that version.

I'll keep this issue open so we can fix it once we support a version of TensorFlow that has it.

praeclarum commented 3 years ago

Thanks.

It's unfortunate that the alternatives don't convert either. For example, when I use tensorflow_addons:

import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow.keras as keras

x = keras.layers.Input(shape=(3, 5))
y = tfa.layers.MultiHeadAttention(num_heads=7, head_size=11)([x, x])  # self-attention: query and value are the same tensor
model = keras.models.Model(inputs=[x], outputs=[y])
model.summary()

import coremltools as ct
mlmodel = ct.convert(model)

I get this conversion error:

('Einsum unsupported equation format: ', '...NI,HIO->...NHO')
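That equation is the per-head projection: for each of the H heads, it maps N tokens of width I to width O, with '...' standing for the batch dimensions. A quick numpy check of the shapes it implies:

import numpy as np

# Shapes matching the repro: N=3 tokens, I=5 features, H=7 heads,
# O=11 head size, plus a batch of 2 covered by the ellipsis.
x = np.random.rand(2, 3, 5)   # ...NI
w = np.random.rand(7, 5, 11)  # HIO
y = np.einsum('...NI,HIO->...NHO', x, w)
print(y.shape)                # (2, 3, 7, 11)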

Any chance we could get support for that equation?

Alternatively, do you know of any multi-head attention libraries that work with CoreML tools?
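In the meantime, one fallback may be spelling multi-head attention out with ops the converter already handles (Dense, reshape, transpose, matmul, softmax), so the traced graph contains no Einsum at all. An untested sketch; SimpleMultiHeadAttention is just an illustrative name:

import tensorflow as tf
from tensorflow import keras

class SimpleMultiHeadAttention(keras.layers.Layer):
    # Self-attention built from Dense/reshape/transpose/matmul only.
    def __init__(self, num_heads, head_dim, **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.wq = keras.layers.Dense(num_heads * head_dim)
        self.wk = keras.layers.Dense(num_heads * head_dim)
        self.wv = keras.layers.Dense(num_heads * head_dim)

    def build(self, input_shape):
        # Project back to the input feature size.
        self.wo = keras.layers.Dense(input_shape[-1])

    def _split_heads(self, x):
        # (batch, seq, H*D) -> (batch, H, seq, D)
        s = tf.shape(x)
        x = tf.reshape(x, (s[0], s[1], self.num_heads, self.head_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, x):
        q = self._split_heads(self.wq(x))
        k = self._split_heads(self.wk(x))
        v = self._split_heads(self.wv(x))
        scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(float(self.head_dim))
        out = tf.matmul(tf.nn.softmax(scores, axis=-1), v)
        out = tf.transpose(out, perm=[0, 2, 1, 3])  # (batch, seq, H, D)
        s = tf.shape(out)
        out = tf.reshape(out, (s[0], s[1], self.num_heads * self.head_dim))
        return self.wo(out)

Dropping this in place of keras.layers.MultiHeadAttention in the repro above keeps the rest of the code unchanged.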

netanellavisdris commented 2 years ago

Hi @TobyRoseman, any update on this issue? The error still reproduces.

TobyRoseman commented 2 years ago

Sorry @netanellavisdris - no updates.

This is related to #1537.

fukatani commented 1 year ago

Now that coremltools 6.3 supports flexible-shape einsum, this issue may be resolved.

TobyRoseman commented 1 year ago

The demo code still fails with 6.3, although the error is different:

ValueError                                Traceback (most recent call last)
Cell In[1], line 11
      8 model.summary()
     10 import coremltools as ct
---> 11 mlmodel = ct.convert(model)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py:492, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug, pass_pipeline)
    489 if specification_version is None:
    490     specification_version = _set_default_specification_version(exact_target)
--> 492 mlmodel = mil_convert(
    493     model,
    494     convert_from=exact_source,
    495     convert_to=exact_target,
    496     inputs=inputs,
    497     outputs=outputs_as_tensor_or_image_types,  # None or list[ct.ImageType/ct.TensorType]
    498     classifier_config=classifier_config,
    499     skip_model_load=skip_model_load,
    500     compute_units=compute_units,
    501     package_dir=package_dir,
    502     debug=debug,
    503     specification_version=specification_version,
    504     main_pipeline=pass_pipeline,
    505 )
    507 if exact_target == 'milinternal':
    508     return mlmodel  # Returns the MIL program

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:188, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
    149 @_profile
    150 def mil_convert(
    151     model,
   (...)
    155     **kwargs
    156 ):
    157     """
    158     Convert model from a specified frontend `convert_from` to a specified
    159     converter backend `convert_to`.
   (...)
    186         See `coremltools.converters.convert`
    187     """
--> 188     return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:212, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
    209     weights_dir = _tempfile.TemporaryDirectory()
    210     kwargs["weights_dir"] = weights_dir.name
--> 212 proto, mil_program = mil_convert_to_proto(
    213                         model,
    214                         convert_from,
    215                         convert_to,
    216                         registry,
    217                         **kwargs
    218                      )
    220 _reset_conversion_state()
    222 if convert_to == 'milinternal':

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:285, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, main_pipeline, **kwargs)
    280 frontend_pipeline, backend_pipeline = _construct_other_pipelines(
    281     main_pipeline, convert_from, convert_to
    282 )
    284 frontend_converter = frontend_converter_type()
--> 285 prog = frontend_converter(model, **kwargs)
    286 PipelineManager.apply_pipeline(prog, frontend_pipeline)
    288 PipelineManager.apply_pipeline(prog, main_pipeline)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:98, in TensorFlow2Frontend.__call__(self, *args, **kwargs)
     95 from .frontend.tensorflow2.load import TF2Loader
     97 tf2_loader = TF2Loader(*args, **kwargs)
---> 98 return tf2_loader.load()

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow/load.py:82, in TFLoader.load(self)
     75     dot_string = self._tf_ssa.get_dot_string(
     76         annotation=True, name_and_op_style=True, highlight_debug_nodes=[]
     77     )
     78     graphviz.Source(dot_string).view(
     79         filename="/tmp/ssa_before_tf_passes", cleanup=True
     80     )
---> 82 program = self._program_from_tf_ssa()
     83 logger.debug("program:\n{}".format(program))
     84 return program

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow2/load.py:210, in TF2Loader._program_from_tf_ssa(self)
    203 self._run_tf_ssa_passes()
    204 converter = TF2Converter(
    205     tfssa=self._tf_ssa,
    206     inputs=self.kwargs["inputs"],
    207     outputs=self.kwargs["outputs"],
    208     opset_version=self.kwargs["specification_version"],
    209 )
--> 210 return converter.convert()

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py:465, in TFConverter.convert(self)
    463 for g_name in self.graph_stack[1:]:
    464     self.context.add_graph(g_name, self.tfssa.functions[g_name].graph)
--> 465 self.convert_main_graph(prog, graph)
    466 return prog

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py:389, in TFConverter.convert_main_graph(self, prog, graph)
    387         input_var = mb.cast(x=input_var, dtype="fp32", name=name)
    388     self.context.add(name, input_var)
--> 389 outputs = convert_graph(self.context, graph, self.output_names)
    390 ssa_func.set_outputs(outputs)
    391 prog.add_function("main", ssa_func)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow/convert_utils.py:191, in convert_graph(context, graph, outputs)
    187     msg = "Conversion for TF op '{0}' not implemented.\n \n{1}".format(
    188         node.op, node.original_node
    189     )
    190     raise NotImplementedError(msg)
--> 191 add_op(context, node)
    193 if len(node.outputs) > 0:
    194     # set_global / get_global / NoOp has no direct consumer / outputs
    195     x = context[node.name]

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/tensorflow/ops.py:555, in Einsum(context, node)
    553 a = context[node.inputs[0]]
    554 b = context[node.inputs[1]]
--> 555 x = build_einsum_mil(a, b, equation, node.name)
    556 context.add(node.name, x)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/_utils.py:171, in build_einsum_mil(a_var, b_var, equation, name)
    169         x = mb.einsum(values=(b_var, a_var), equation=equation_rev, name=name)
    170 else:
--> 171     x = solve_generic_einsum(parsed_vectors, a_var, b_var, name)
    173 return x

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/_utils.py:331, in solve_generic_einsum(parsed_vectors, a_var, b_var, name)
    328             return 1
    329     return mb.concat(values=dims, axis=0)
--> 331 parsed_vectors, vars = solve_diagonal_einsum(parsed_vectors, [a_var, b_var])
    332 parsed_vectors, vars = solve_sum_einsum(parsed_vectors, vars)
    333 a_var, b_var = vars

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/_utils.py:254, in solve_diagonal_einsum(parsed_vectors, vars)
    252 for i in range(len(vars)):
    253     while len(parsed_vectors[i]) != len(set(parsed_vectors[i])):
--> 254         parsed_vector, var = solve_diagonal_einsum_one_step(parsed_vectors[i], vars[i])
    255         parsed_vectors[i] = parsed_vector
    256         vars[i] = var

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/_utils.py:246, in solve_diagonal_einsum.<locals>.solve_diagonal_einsum_one_step(parsed_vector, x)
    244 indices = mb.range_1d(end=dim_length, start=0, step=1)
    245 indices = mb.stack(values=[indices] * len(duplicated_indices), axis=1)
--> 246 x = mb.transpose(x=x, perm=perm)
    247 x = mb.gather_nd(x=x, indices=indices)
    248 ret_parsed_vector = [parsed_vector[0]] + parsed_vector[len(duplicated_indices):]

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py:182, in SSAOpRegistry.register_op.<locals>.class_wrapper.<locals>.add_op(cls, **kwargs)
    179 else:
    180     op_cls_to_add = op_reg[op_type]
--> 182 return cls._add_op(op_cls_to_add, **kwargs)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py:182, in Builder._add_op(cls, op_cls, **kwargs)
    180 curr_block()._insert_op_before(new_op, before_op=before_op)
    181 new_op.build_nested_blocks()
--> 182 new_op.type_value_inference()
    183 if len(new_op.outputs) == 1:
    184     return new_op.outputs[0]

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:253, in Operation.type_value_inference(self, overwrite_output)
    243 def type_value_inference(self, overwrite_output=False):
    244     """
    245     Perform type inference and auto_val computation based on new input Vars
    246     in kwargs. If self._output_vars is None then we generate _output_vars;
   (...)
    251     existing _output_vars
    252     """
--> 253     output_types = self.type_inference()
    254     if not isinstance(output_types, tuple):
    255         output_types = (output_types,)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/tensor_transformation.py:956, in transpose.type_inference(self)
    954 if len(perm) != self.x.rank:
    955     msg = "perm should have the same length as rank(x): {} != {}"
--> 956     raise ValueError(msg.format(len(perm), self.x.rank))
    957 if self.x.rank == 0:
    958     return self.x.sym_type  # scalar cannot be transposed

ValueError: perm should have the same length as rank(x): 5 != 3