Closed: ZachNagengast closed this issue 1 year ago.
Follow up: I managed to get the conversions working on an older version of coremltools (6.3.0).
One notable difference: on the older coremltools version, I got an error from the convert function that required me to add dtypes to the TensorTypes:
ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
This error did not show up on 7.0b1. I've also confirmed it works on my current torch version (2.1.0.dev20230728).
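For reference, here is a minimal sketch of the full convert call with the explicit int32 dtypes that 6.3.0 required; this assumes the same traced model and tensor names as the reproduction code later in this thread:

import numpy as np
import coremltools as ct

# traced_model comes from the torch.jit.trace call in the reproduction code below
mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        # 6.3.0 errored out unless the integer dtype was given explicitly
        ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT16,
)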
Full stacktrace from 6.3.0:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 22
19 traced_model = torch.jit.trace(wrapped_model.eval(), (encoded_input['input_ids'], encoded_input['token_type_ids'], encoded_input['attention_mask']))
20 traced_model.eval()
---> 22 mlprogram = ct.convert(
23 traced_model,
24 minimum_deployment_target=ct.target.macOS13,
25 inputs=[
26 ct.TensorType(name="input_ids", shape=(1, 512)),
27 ct.TensorType(name="token_type_ids", shape=(1, 512)),
28 ct.TensorType(name="attention_mask", shape=(1, 512)),
29 ],
30 outputs=[ct.TensorType(name="embeddings")],
31 convert_to="mlprogram",
32 compute_units=ct.ComputeUnit.ALL,
33 compute_precision=ct.precision.FLOAT16,
34 )
36 spec = mlprogram.get_spec()
37 outputmodel = ct.models.MLModel(spec, weights_dir=mlprogram.weights_dir)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py:492, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug, pass_pipeline)
489 if specification_version is None:
490 specification_version = _set_default_specification_version(exact_target)
--> 492 mlmodel = mil_convert(
493 model,
494 convert_from=exact_source,
495 convert_to=exact_target,
496 inputs=inputs,
497 outputs=outputs_as_tensor_or_image_types, # None or list[ct.ImageType/ct.TensorType]
498 classifier_config=classifier_config,
499 skip_model_load=skip_model_load,
500 compute_units=compute_units,
501 package_dir=package_dir,
502 debug=debug,
503 specification_version=specification_version,
504 main_pipeline=pass_pipeline,
505 )
507 if exact_target == 'milinternal':
508 return mlmodel # Returns the MIL program
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:188, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
149 @_profile
150 def mil_convert(
151 model,
(...)
155 **kwargs
156 ):
157 """
158 Convert model from a specified frontend `convert_from` to a specified
159 converter backend `convert_to`.
(...)
186 See `coremltools.converters.convert`
187 """
--> 188 return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:212, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
209 weights_dir = _tempfile.TemporaryDirectory()
210 kwargs["weights_dir"] = weights_dir.name
--> 212 proto, mil_program = mil_convert_to_proto(
213 model,
214 convert_from,
215 convert_to,
216 registry,
217 **kwargs
218 )
220 _reset_conversion_state()
222 if convert_to == 'milinternal':
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:288, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, main_pipeline, **kwargs)
285 prog = frontend_converter(model, **kwargs)
286 PipelineManager.apply_pipeline(prog, frontend_pipeline)
--> 288 PipelineManager.apply_pipeline(prog, main_pipeline)
290 prog._check_invalid_tensor_rank()
292 if convert_to == 'milinternal':
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/pass_pipeline.py:378, in PipelineManager.apply_pipeline(prog, pass_pipeline)
376 graph_pass = PASS_REGISTRY[pass_name]
377 graph_pass.set_options(pass_options)
--> 378 graph_pass(prog)
379 prog.validate()
380 logger.debug(f"Program after {pass_pipeline} pipeline:\n{prog}")
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/graph_pass.py:55, in AbstractGraphPass.__call__(self, prog)
53 def __call__(self, prog: Program):
54 if not prog.skip_all_passes:
---> 55 self.apply(prog)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:94, in AbstractQuantizationPass.apply(self, prog)
91 self.transform_op(op)
93 for f in prog.functions.values():
---> 94 apply_block(f)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/helper.py:54, in block_context_manager.<locals>.wrapper(*args)
49 raise ValueError(
50 "The function decorated with block_context_manager must have a Block "
51 "type argument as the first input."
52 )
53 with block:
---> 54 return func(*args)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:91, in AbstractQuantizationPass.apply.<locals>.apply_block(block)
89 need_transform = op.op_type not in getattr(self, "skip_ops_by_type", set())
90 if need_transform:
---> 91 self.transform_op(op)
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/quantization.py:283, in FP16ComputePrecision.transform_op(self, op)
274 if old_output_var.is_tensor_or_scalar_of(dtype="fp32") and (
275 not new_output_var.is_tensor_or_scalar_of(dtype="fp32")
276 ):
277 x = mb.cast(
278 x=new_output_var,
279 dtype="fp32",
280 name=new_output_var.name + "_to_fp32",
281 before_op=op,
282 )
--> 283 op.enclosing_block.replace_uses_of_var_after_op(
284 anchor_op=op,
285 old_var=old_output_var,
286 new_var=x,
287 force_replace=True,
288 )
289 else:
290 op.enclosing_block.replace_uses_of_var_after_op(
291 anchor_op=op,
292 old_var=old_output_var,
293 new_var=new_output_var,
294 force_replace=True,
295 )
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/block.py:627, in Block.replace_uses_of_var_after_op(self, anchor_op, old_var, new_var, no_check_var_visibility, end_op, no_check_var_types, force_replace)
624 msg = "end_op '{}' comes before the anchor_op '{}'"
625 raise ValueError(msg.format(end_op.name, anchor_op.name))
--> 627 num_ops_affected = self._replace_var(
628 old_var,
629 new_var,
630 start=start,
631 end_id=end_id,
632 no_check_var_types=no_check_var_types,
633 )
635 logger.debug("Num ops affected in replacing var: {}".format(num_ops_affected))
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/block.py:405, in Block._replace_var(self, old_var, new_var, start, end_id, no_check_var_types)
403 if affected:
404 num_ops_affected += 1
--> 405 op.set_inputs(no_check_var_types=no_check_var_types,
406 **new_inputs)
408 # Replace recursively.
409 for b in op.blocks:
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:225, in Operation.set_inputs(self, no_check_var_types, type_inference, **input_kvs)
215 def set_inputs(self, no_check_var_types=False, type_inference=False, **input_kvs):
216 """
217 Parameters
218 ----------
(...)
223 True to perform type inference and recreate output Var.
224 """
--> 225 self._validate_and_set_inputs(input_kvs, no_check_var_types=no_check_var_types)
226 if type_inference and not no_check_var_types:
227 self.type_value_inference()
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:507, in Operation._validate_and_set_inputs(self, input_kvs, no_check_var_types)
505 check_and_detach(v_new, v_old, self, no_check_var_types)
506 else:
--> 507 check_and_detach(
508 var, existing_input_var, self, no_check_var_types
509 )
511 # Set var as input_var
512 if isinstance(var, Var):
513 # TODO: the child op of complex op's input might get lost, as the complex op will
514 # be lowered. Maybe should add child op here and take care of it in lowering pass.
File /opt/homebrew/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:493, in Operation._validate_and_set_inputs.<locals>.check_and_detach(v_new, v_old, op, no_check_var_types)
488 if (
489 not is_compatible_type(v_new.sym_type, v_old.sym_type)
490 and not no_check_var_types
491 ):
492 msg = "New var type {} not a subtype of " + "existing var type {}"
--> 493 raise ValueError(msg.format(v_new.sym_type, v_old.sym_type))
494 v_old.remove_child_op(op, no_check_var_types)
ValueError: New var type .tensor'> not a subtype of existing var type .tensor'>
- I have submitted a bug report to https://developer.apple.com/bug-reporting/, but based on the previous issue it could be a recurrence of whatever happened there, so I wanted to bring it up as an issue here as well for awareness.
Submitting the bug report there is the right thing to do. This is an issue with the Core ML framework, not the coremltools Python package. So I'll close this GitHub issue.
- Any other relevant version information (e.g. PyTorch or TensorFlow version):
torch==2.1.0.dev20230728
This isn't a version of PyTorch that we support. We do, however, support the most recent official PyTorch release (2.0.1).
@TobyRoseman Ok, I thought it could be the Core ML framework; I just wanted to bring it up here because the conversion worked fine on coremltools 6.3.0 but not 7.0b1. Thanks for following up.
the conversion worked fine on coremltools 6.3.0 but not 7.0b1.
I'm confused. I thought the issue was that the converted model was giving NaN values in Objective-C but not Python (using coremltools). What conversion worked in coremltools 6.3.0 but not 7.0b1?
Ah yes, I was able to get the model to give valid outputs when I rolled back to 6.3.0. Both versions created valid models in Python, but only 7.0b1 created a model that output NaNs in Swift.
The conversion code above worked exactly the same with 6.3.0, except that I had to give my inputs a dtype for the convert function to run, whereas 7.0b1 didn't require them.
Does it work with 7.0b1 if you specify the same dtypes?
Specifying the dtypes didn't help for 7.0b1; the only thing that worked was rolling back.
Here's an upload of the two models for comparison: https://drive.google.com/file/d/1rOtpl6BTDAGQ7RPi6IqzNWFoS86h_aQR/view?usp=sharing
A couple of key differences are at the top and bottom areas of the model:
Top (left is 7.0b1, right is 6.3.0):
Bottom (left is 7.0b1, right is 6.3.0):
This bottom part in particular feeds into a real_div after this section, which might be relevant.
Detail on the "clip" node (left is 7.0b1, right is 6.3.0):
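For what it's worth, a plausible source of this clip followed by real_div is the standard sentence-transformers mean-pooling step (sketched in the reproduction code below): torch.clamp with only a min bound appears to be lowered to a MIL clip op whose missing max bound is filled in with the float32 maximum (~3.4e38), and the subsequent division becomes the real_div. A self-contained illustration, with hypothetical shapes:

import torch

token_embeddings = torch.randn(1, 512, 768)   # hypothetical encoder output
attention_mask = torch.ones(1, 512)
mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
summed = torch.sum(token_embeddings * mask, 1)
counts = torch.clamp(mask.sum(1), min=1e-9)   # no max given -> clip(1e-9, ~3.4e38) after conversion
embeddings = summed / counts                  # -> real_div in the converted graph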
I'm still a bit confused about what the actual issue is here. It seems to have changed from the initial description of this issue. Just having the MLModels isn't particularly helpful. Can you give us self-contained (ideally simple) code that worked in 6.3.0 but doesn't work in 7.0b1? Feel free to create a new issue if you think that would be cleaner.
I wasn’t sure what the real source of the issue was in the original post, but I’ve since narrowed it down to the coremltools version. The conversion code in this issue is still valid and self-contained: it creates a perfectly fine model on 6.3.0, but a NaN-outputting model on 7.0b1 when run on an iOS/macOS device. Would you like me to upload an example app somewhere that works on its own to replicate the issue?
Hi @ZachNagengast, with coremltools 7.2, could you please try compute_precision=ct.precision.FLOAT32?
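In other words, a minimal sketch of the suggested change, keeping the same inputs and outputs as the conversion code in this issue, with only the compute precision changed:

import numpy as np
import coremltools as ct

# traced_model from the torch.jit.trace call in the reproduction code below
mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=(1, 512), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 512), dtype=np.int32),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT32,  # fp32 throughout avoids the fp16 overflow described below
)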
Concretely, this clip op is trying to clip a huge fp32 value to 3.4e38, and this 3.4e38 is unrepresentable in fp16, so it becomes inf if we use fp16 compute precision. (And probably later on a division involving that inf happens, resulting in NaN 😮‍💨)
Looks like fp32 compute precision solved a similar issue (https://github.com/apple/coremltools/issues/2223), so hopefully it could work here as well.
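A quick numpy illustration of the arithmetic involved (not the converter itself, just why the fp16 cast produces inf and then NaN):

import numpy as np

clip_max = np.float32(3.4e38)     # the clip op's upper bound, near the float32 maximum
as_fp16 = np.float16(clip_max)    # float16 tops out around 65504, so this overflows
print(as_fp16)                    # inf (numpy may also emit an overflow RuntimeWarning)
print(as_fp16 / as_fp16)          # inf / inf -> nan
print(np.float16(0.0) * as_fp16)  # 0 * inf  -> nan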
🐞Describing the bug
I'm using the sentence-transformers model msmarco-bert-base-dot-v5, and it converts and works fine in Python, but whenever I bring it into Swift and run it on a simulator, the output is all NaN, for every combination of precision and compute unit I can think of.
To Reproduce
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import coremltools as ct
from pprint import pprint

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-bert-base-dot-v5")
model = AutoModel.from_pretrained("sentence-transformers/msmarco-bert-base-dot-v5")

encoded_input = tokenizer("test sentence", padding="max_length", truncation=True, return_tensors='pt')

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super(ModelWrapper, self).__init__()
        self.model = model

    # Assumed forward pass: the original post's forward body did not survive
    # formatting; a standard sentence-transformers mean pooling (which would
    # produce the clip -> real_div pattern discussed above) is sketched here.
    def forward(self, input_ids, token_type_ids, attention_mask):
        output = self.model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
        token_embeddings = output[0]
        mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Initialize the wrapper with the original model
wrapped_model = ModelWrapper(model)

# Trace the model with both input_ids and attention_mask
traced_model = torch.jit.trace(wrapped_model.eval(), (encoded_input['input_ids'], encoded_input['token_type_ids'], encoded_input['attention_mask']))
traced_model.eval()

mlprogram = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.macOS13,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 512)),
        ct.TensorType(name="token_type_ids", shape=(1, 512)),
        ct.TensorType(name="attention_mask", shape=(1, 512)),
    ],
    outputs=[ct.TensorType(name="embeddings")],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
    compute_precision=ct.precision.FLOAT16,
)

spec = mlprogram.get_spec()
outputmodel = ct.models.MLModel(spec, weights_dir=mlprogram.weights_dir)

saved_model = '~/Downloads/msmarco_bert.mlpackage'
outputmodel.save(saved_model)
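To confirm the Python side behaves as described above, a hedged sanity check using the names from this code (inputs cast to int32 to match the model's input types):

import numpy as np

prediction = outputmodel.predict({
    "input_ids": encoded_input["input_ids"].numpy().astype(np.int32),
    "token_type_ids": encoded_input["token_type_ids"].numpy().astype(np.int32),
    "attention_mask": encoded_input["attention_mask"].numpy().astype(np.int32),
})
# Reported result in this thread: no NaNs in Python for either 6.3.0 or 7.0b1
print(np.isnan(prediction["embeddings"]).any())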
Swift inference code (produces all nan in the embeddings output)
System environment (please complete the following information):
torch==2.1.0.dev20230728
Additional context