Closed: ZachNagengast closed this issue 1 year ago.
Follow up: I managed to get the conversions working on an older version of coremltools (6.3.0).
One notable difference: on the older coremltools version, I got an error from the convert function that required me to add dtypes to the TensorTypes:
ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
This error did not show up on 7.0b1. I've also confirmed it works on my current torch version (2.1.0.dev20230728).
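For reference, here is a minimal sketch of the full convert call with the explicit int32 dtypes that 6.3.0 required; this assumes the same traced model and tensor names as the reproduction code later in this thread:

import numpy as np
import coremltools as ct

# traced_model comes from the torch.jit.trace call in the reproduction code below
mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        # 6.3.0 errored out unless the integer dtype was given explicitly
        ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT16,
)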
Full stacktrace from 6.3.0:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 22
19 traced_model = torch.jit.trace(wrapped_model.eval(), (encoded_input['input_ids'], encoded_input['token_type_ids'], encoded_input['attention_mask']))
20 traced_model.eval()
---> 22 mlprogram = ct.convert(
23 traced_model,
24 minimum_deployment_target=ct.target.macOS13,
25 inputs=[
26 ct.TensorType(name="input_ids", shape=(1, 512)),
27 ct.TensorType(name="token_type_ids", shape=(1, 512)),
28 ct.TensorType(name="attention_mask", shape=(1, 512)),
29 ],
30 outputs=[ct.TensorType(name="embeddings")],
31 convert_to="mlprogram",
32 compute_units=ct.ComputeUnit.ALL,
33 compute_precision=ct.precision.FLOAT16,
34 )
36 spec = mlprogram.get_spec()
37 outputmodel = ct.models.MLModel(spec, weights_dir=mlprogram.weights_dir)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py:492, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug, pass_pipeline)
489 if specification_version is None:
490 specification_version = _set_default_specification_version(exact_target)
--> 492 mlmodel = mil_convert(
493 model,
494 convert_from=exact_source,
495 convert_to=exact_target,
496 inputs=inputs,
497 outputs=outputs_as_tensor_or_image_types, # None or list[ct.ImageType/ct.TensorType]
498 classifier_config=classifier_config,
499 skip_model_load=skip_model_load,
500 compute_units=compute_units,
501 package_dir=package_dir,
502 debug=debug,
503 specification_version=specification_version,
504 main_pipeline=pass_pipeline,
505 )
507 if exact_target == 'milinternal':
508 return mlmodel # Returns the MIL program
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:188, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
149 @_profile
150 def mil_convert(
151 model,
(...)
155 **kwargs
156 ):
157 """
158 Convert model from a specified frontend `convert_from` to a specified
159 converter backend `convert_to`.
(...)
186 See `coremltools.converters.convert`
187 """
--> 188 return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:212, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
209 weights_dir = _tempfile.TemporaryDirectory()
210 kwargs["weights_dir"] = weights_dir.name
--> 212 proto, mil_program = mil_convert_to_proto(
213 model,
214 convert_from,
215 convert_to,
216 registry,
217 **kwargs
218 )
220 _reset_conversion_state()
222 if convert_to == 'milinternal':
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:288, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, main_pipeline, **kwargs)
285 prog = frontend_converter(model, **kwargs)
286 PipelineManager.apply_pipeline(prog, frontend_pipeline)
--> 288 PipelineManager.apply_pipeline(prog, main_pipeline)
290 prog._check_invalid_tensor_rank()
292 if convert_to == 'milinternal':
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/pass_pipeline.py:378, in PipelineManager.apply_pipeline(prog, pass_pipeline)
376 graph_pass = PASS_REGISTRY[pass_name]
377 graph_pass.set_options(pass_options)
--> 378 graph_pass(prog)
379 prog.validate()
380 logger.debug(f"Program after {pass_pipeline} pipeline:\n{prog}")
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/graph_pass.py:55, in AbstractGraphPass.__call__(self, prog)
53 def __call__(self, prog: Program):
54 if not prog.skip_all_passes:
---> 55 self.apply(prog)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:94, in AbstractQuantizationPass.apply(self, prog)
91 self.transform_op(op)
93 for f in prog.functions.values():
---> 94 apply_block(f)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/helper.py:54, in block_context_manager.<locals>.wrapper(*args)
49 raise ValueError(
50 "The function decorated with block_context_manager must have a Block "
51 "type argument as the first input."
52 )
53 with block:
---> 54 return func(*args)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:91, in AbstractQuantizationPass.apply.<locals>.apply_block(block)
89 need_transform = op.op_type not in getattr(self, "skip_ops_by_type", set())
90 if need_transform:
---> 91 self.transform_op(op)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:283, in FP16ComputePrecision.transform_op(self, op)
274 if old_output_var.is_tensor_or_scalar_of(dtype="fp32") and (
275 not new_output_var.is_tensor_or_scalar_of(dtype="fp32")
276 ):
277 x = mb.cast(
278 x=new_output_var,
279 dtype="fp32",
280 name=new_output_var.name + "_to_fp32",
281 before_op=op,
282 )
--> 283 op.enclosing_block.replace_uses_of_var_after_op(
284 anchor_op=op,
285 old_var=old_output_var,
286 new_var=x,
287 force_replace=True,
288 )
289 else:
290 op.enclosing_block.replace_uses_of_var_after_op(
291 anchor_op=op,
292 old_var=old_output_var,
293 new_var=new_output_var,
294 force_replace=True,
295 )
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/block.py:627, in Block.replace_uses_of_var_after_op(self, anchor_op, old_var, new_var, no_check_var_visibility, end_op, no_check_var_types, force_replace)
624 msg = "end_op '{}' comes before the anchor_op '{}'"
625 raise ValueError(msg.format(end_op.name, anchor_op.name))
--> 627 num_ops_affected = self._replace_var(
628 old_var,
629 new_var,
630 start=start,
631 end_id=end_id,
632 no_check_var_types=no_check_var_types,
633 )
635 logger.debug("Num ops affected in replacing var: {}".format(num_ops_affected))
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/block.py:405, in Block._replace_var(self, old_var, new_var, start, end_id, no_check_var_types)
403 if affected:
404 num_ops_affected += 1
--> 405 op.set_inputs(no_check_var_types=no_check_var_types,
406 **new_inputs)
408 # Replace recursively.
409 for b in op.blocks:
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:225, in Operation.set_inputs(self, no_check_var_types, type_inference, **input_kvs)
215 def set_inputs(self, no_check_var_types=False, type_inference=False, **input_kvs):
216 """
217 Parameters
218 ----------
(...)
223 True to perform type inference and recreate output Var.
224 """
--> 225 self._validate_and_set_inputs(input_kvs, no_check_var_types=no_check_var_types)
226 if type_inference and not no_check_var_types:
227 self.type_value_inference()
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:507, in Operation._validate_and_set_inputs(self, input_kvs, no_check_var_types)
505 check_and_detach(v_new, v_old, self, no_check_var_types)
506 else:
--> 507 check_and_detach(
508 var, existing_input_var, self, no_check_var_types
509 )
511 # Set var as input_var
512 if isinstance(var, Var):
513 # TODO: the child op of complex op's input might get lost, as the complex op will
514 # be lowered. Maybe should add child op here and take care of it in lowering pass.
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:493, in Operation._validate_and_set_inputs.<locals>.check_and_detach(v_new, v_old, op, no_check_var_types)
488 if (
489 not is_compatible_type(v_new.sym_type, v_old.sym_type)
490 and not no_check_var_types
491 ):
492 msg = "New var type {} not a subtype of " + "existing var type {}"
--> 493 raise ValueError(msg.format(v_new.sym_type, v_old.sym_type))
494 v_old.remove_child_op(op, no_check_var_types)
ValueError: New var type .tensor'> not a subtype of existing var type .tensor'>
- I have submitted a bug report to https://developer.apple.com/bug-reporting/, but based on the previous issue it could be a recurrence of whatever happened there, so I wanted to bring it up as an issue here as well for awareness.
Submitting the bug report there is the right thing to do. This is an issue with the Core ML framework, not the coremltools Python package. So I'll close this GitHub issue.
- Any other relevant version information (e.g. PyTorch or TensorFlow version):
torch==2.1.0.dev20230728
This isn't a version of PyTorch that we support. We do, however, support the most recent official PyTorch release (2.0.1).
@TobyRoseman Ok, I thought it could be the Core ML framework; I just wanted to bring it up here because the conversion worked fine on coremltools 6.3.0 but not 7.0b1. Thanks for following up.
the conversion worked fine on coremltools 6.3.0 but not 7.0b1.
I'm confused. I thought the issue was that the converted model was giving NaN values in Objective-C but not Python (using coremltools). What conversion worked in coremltools 6.3.0 but not 7.0b1?
Ah yes, I was able to get the model to give valid outputs when I rolled back to 6.3.0. Both versions created valid models in Python, but only 7.0b1 created a model that output NaNs in Swift.
The conversion code above worked exactly the same with 6.3.0, except that I had to give my inputs a dtype for the convert function to run, whereas 7.0b1 didn't require them.
Does it work with 7.0b1 if you specify the same dtypes?
Specifying the dtypes didn't help for 7.0b1; the only thing that worked was rolling back.
Here's an upload of the two models for comparison: https://drive.google.com/file/d/1rOtpl6BTDAGQ7RPi6IqzNWFoS86h_aQR/view?usp=sharing
A couple of key differences are at the top and bottom areas of the model:
Top (left is 7.0b1, right is 6.3.0):
Bottom (left is 7.0b1, right is 6.3.0):
This bottom part in particular feeds into a real_div after this section, which might be relevant.
Detail on the "clip" node (left is 7.0b1, right is 6.3.0):
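For what it's worth, a plausible source of this clip followed by real_div is the standard sentence-transformers mean-pooling step (sketched in the reproduction code below): torch.clamp with only a min bound appears to be lowered to a MIL clip op whose missing max bound is filled in with the float32 maximum (~3.4e38), and the subsequent division becomes the real_div. A self-contained illustration, with hypothetical shapes:

import torch

token_embeddings = torch.randn(1, 512, 768)   # hypothetical encoder output
attention_mask = torch.ones(1, 512)
mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
summed = torch.sum(token_embeddings * mask, 1)
counts = torch.clamp(mask.sum(1), min=1e-9)   # no max given -> clip(1e-9, ~3.4e38) after conversion
embeddings = summed / counts                  # -> real_div in the converted graph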
I'm still a bit confused about what the actual issue is here. It seems to have changed from the initial description of this issue. Just having the MLModels isn't particularly helpful. Can you give us self-contained (ideally simple) code that worked in 6.3.0 but doesn't work in 7.0b1? Feel free to create a new issue if you think that would be cleaner.
I wasn’t sure what the real source of the issue was in the original post, but I’ve since narrowed it down to the coremltools version. The conversion code in this issue is still valid and self-contained: it creates a perfectly fine model on 6.3.0, but a NaN-outputting model on 7.0b1 when run on an iOS/macOS device. Would you like me to upload an example app somewhere that works on its own to replicate the issue?
Hi @ZachNagengast, with coremltools 7.2, could you please try compute_precision=ct.precision.FLOAT32?
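In other words, a minimal sketch of the suggested change, keeping the same inputs and outputs as the conversion code in this issue, with only the compute precision changed:

import numpy as np
import coremltools as ct

# traced_model from the torch.jit.trace call in the reproduction code below
mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT32,  # fp32 throughout avoids the fp16 overflow described below
)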
Concretely, this clip op is trying to clip a huge fp32 value to 3.4e38, and this 3.4e38 is unrepresentable in fp16, so it becomes inf if we use fp16 compute precision. (And probably later on a division involving that inf happens, resulting in NaN 😮‍💨)
Looks like fp32 compute precision solved a similar issue (https://github.com/apple/coremltools/issues/2223), so hopefully it could work here as well.
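A quick numpy illustration of the arithmetic involved (not the converter itself, just why the fp16 cast produces inf and then NaN):

import numpy as np

clip_max = np.float32(3.4e38)     # the clip op's upper bound, near the float32 maximum
as_fp16 = np.float16(clip_max)    # float16 tops out around 65504, so this overflows
print(as_fp16)                    # inf (numpy may also emit an overflow RuntimeWarning)
print(as_fp16 / as_fp16)          # inf / inf -> nan
print(np.float16(0.0) * as_fp16)  # 0 * inf  -> nan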
🐞Describing the bug
I'm using the sentence-transformers model msmarco-bert-base-dot-v5, and it converts and works fine in Python, but whenever I bring it into Swift and run it on a simulator, the output is all NaN, for every combination of precision and compute unit I can think of.
To Reproduce
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import coremltools as ct
from pprint import pprint

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-bert-base-dot-v5")
model = AutoModel.from_pretrained("sentence-transformers/msmarco-bert-base-dot-v5")

encoded_input = tokenizer("test sentence", padding="max_length", truncation=True, return_tensors='pt')

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super(ModelWrapper, self).__init__()
        self.model = model

    # Assumed forward pass: the original post's forward body did not survive
    # formatting; a standard sentence-transformers mean pooling (which would
    # produce the clip -> real_div pattern discussed above) is sketched here.
    def forward(self, input_ids, token_type_ids, attention_mask):
        output = self.model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
        token_embeddings = output[0]
        mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Initialize the wrapper with the original model
wrapped_model = ModelWrapper(model)

# Trace the model with both input_ids and attention_mask
traced_model = torch.jit.trace(wrapped_model.eval(), (encoded_input['input_ids'], encoded_input['token_type_ids'], encoded_input['attention_mask']))
traced_model.eval()

mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 512)),
        ct.TensorType(name="token_type_ids", shape=(1, 512)),
        ct.TensorType(name="attention_mask", shape=(1, 512)),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT16,
)

spec = mlprogram.get_spec()
outputmodel = ct.models.MLModel(spec, weights_dir=mlprogram.weights_dir)

saved_model = '~/Downloads/msmarco_bert.mlpackage'
outputmodel.save(saved_model)
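To confirm the Python side behaves as described above, a hedged sanity check using the names from this code (inputs cast to int32 to match the model's input types):

import numpy as np

prediction = outputmodel.predict({
    "input_ids": encoded_input["input_ids"].numpy().astype(np.int32),
    "token_type_ids": encoded_input["token_type_ids"].numpy().astype(np.int32),
    "attention_mask": encoded_input["attention_mask"].numpy().astype(np.int32),
})
# Reported result in this thread: no NaNs in Python for either 6.3.0 or 7.0b1
print(np.isnan(prediction["embeddings"]).any())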
Swift inference code (produces all nan in the embeddings output)
System environment (please complete the following information):
torch==2.1.0.dev20230728
Additional context