apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Crash converting the text encoder component of CLIP-H #1960

Open damian0815 opened 1 year ago

damian0815 commented 1 year ago

🐞 Describing the bug

When converting the text encoder component of LAION's CLIP-H model to Core ML using a variable input shape (a ct.RangeDim), ct.convert crashes the Python process.

Stack Trace

...
>>> # Convert traced model to CoreML
>>> text_input_shape = ct.Shape(shape=(1,
...                               ct.RangeDim(lower_bound=2, upper_bound=77, default=77)))
>>>
>>> model_coreml = ct.convert(
...     model_traced,
...     inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
...     outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
...     minimum_deployment_target=ct.target.macOS13,
...     convert_to='mlprogram'
... )
Converting PyTorch Frontend ==> MIL Ops:   0% 0/1510 [00:00<?, ? ops/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Converting PyTorch Frontend ==> MIL Ops:  96% 1448/1510 [00:00<00:00, 1616.88 ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100% 1509/1510 [00:00<00:00, 1730.12 ops/s]
Running MIL frontend_pytorch pipeline: 100% 5/5 [00:00<00:00, 120.13 passes/s]
Running MIL default pipeline: 100% 66/66 [00:19<00:00,  3.44 passes/s]
Running MIL backend_mlprogram pipeline: 100% 11/11 [00:00<00:00, 228.89 passes/s]
Process 42442 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGKILL
    frame #0: 0x000000019a2663b0 libsystem_platform.dylib`__bzero + 64
libsystem_platform.dylib`:
->  0x19a2663b0 <+64>: dc     zva, x3
    0x19a2663b4 <+68>: add    x3, x3, #0x40
    0x19a2663b8 <+72>: subs   x2, x2, #0x40
    0x19a2663bc <+76>: b.hi   0x19a2663b0               ; <+64>
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGKILL
  * frame #0: 0x000000019a2663b0 libsystem_platform.dylib`__bzero + 64
    frame #1: 0x00000001b0e2d588 Espresso`std::__1::__shared_ptr_emplace<Espresso::blob<float, 4>, std::__1::allocator<Espresso::blob<float, 4> > >::__shared_ptr_emplace[abi:v15006]<int&, int&, int&, int&>(std::__1::allocator<Espresso::blob<float, 4> >, int&, int&, int&, int&) + 156
    frame #2: 0x00000001b0e2d4bc Espresso`std::__1::shared_ptr<Espresso::blob<float, 4> > std::__1::allocate_shared[abi:v15006]<Espresso::blob<float, 4>, std::__1::allocator<Espresso::blob<float, 4> >, int&, int&, int&, int&, void>(std::__1::allocator<Espresso::blob<float, 4> > const&, int&, int&, int&, int&) + 76
    frame #3: 0x00000001b1268eb4 Espresso`Espresso::blob_cpu::resize(Espresso::layer_shape const&, std::__1::shared_ptr<Espresso::abstract_blob_container_options>) + 1108
    frame #4: 0x00000001b0ecde48 Espresso`Espresso::allocate_blobs(std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, int, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, int> > > const&, std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, unsigned long, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, unsigned long> > >&, std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, Espresso::layer_shape, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, Espresso::layer_shape> > >&, int) + 732
    frame #5: 0x00000001b0ec8c60 Espresso`Espresso::reshape_networks_graph_coloring_raw_ptr_only_in_context(std::__1::shared_ptr<Espresso::abstract_context> const&, std::__1::vector<Espresso::net*, std::__1::allocator<Espresso::net*> > const&, int) + 2180
    frame #6: 0x00000001b0ec8340 Espresso`Espresso::reshape_networks_graph_coloring_raw_ptr(std::__1::vector<Espresso::net*, std::__1::allocator<Espresso::net*> >, int) + 640
    frame #7: 0x00000001b0ec6750 Espresso`Espresso::pass_graph_coloring::run_on_network(Espresso::net&) + 272
    frame #8: 0x00000001b0ffcbc4 Espresso`Espresso::shape_network_recursive(Espresso::net*, Espresso::network_shape const&, int, bool) + 6952
    frame #9: 0x00000001b107bd48 Espresso`Espresso::load_and_shape_network(std::__1::shared_ptr<Espresso::SerDes::generic_serdes_object> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::network_shape const&, Espresso::compute_path, std::__1::shared_ptr<Espresso::blob_storage_abstract> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 644
    frame #10: 0x00000001b128af7c Espresso`Espresso::reload_network_on_context(std::__1::shared_ptr<Espresso::net> const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::compute_path) + 452
    frame #11: 0x00000001b107d750 Espresso`Espresso::load_network(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::compute_path, bool) + 1504
    frame #12: 0x00000001b0f1e7bc Espresso`EspressoLight::espresso_plan::add_network(char const*, espresso_storage_type_t, std::__1::shared_ptr<Espresso::net>) + 3108
    frame #13: 0x00000001b0f31918 Espresso`EspressoLight::espresso_plan::add_network(char const*, espresso_storage_type_t) + 64
    frame #14: 0x00000001b0f34c18 Espresso`espresso_plan_add_network + 416
    frame #15: 0x00000001a26cdf68 CoreML`-[MLNeuralNetworkEngine _addNetworkToPlan:error:] + 232
    frame #16: 0x00000001a26ccee4 CoreML`-[MLNeuralNetworkEngine _setupContextAndPlanWithConfiguration:usingCPU:reshapeWithContainer:error:] + 896
    frame #17: 0x00000001a26ce808 CoreML`-[MLNeuralNetworkEngine initWithContainer:configuration:error:] + 200
    frame #18: 0x00000001a273704c CoreML`-[MLMultiFunctionProgramEngine initWithProgramContainer:configuration:error:] + 312
    frame #19: 0x00000001a2737248 CoreML`+[MLMultiFunctionProgramEngine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 180
    frame #20: 0x00000001a272e360 CoreML`+[MLLoader loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 140
    frame #21: 0x00000001a272c9ac CoreML`+[MLLoader loadModelFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 1952
    frame #22: 0x00000001a272da14 CoreML`+[MLLoader loadModelFromArchive:configuration:loaderEvent:error:] + 24
    frame #23: 0x00000001a272ed60 CoreML`+[MLLoader loadModelFromAssetAtURL:configuration:loaderEvent:error:] + 252
    frame #24: 0x00000001a272efa0 CoreML`+[MLLoader loadModelFromAssetAtURL:configuration:error:] + 112
    frame #25: 0x00000001a2711c44 CoreML`-[MLModelAsset load:] + 496
    frame #26: 0x00000001a2711950 CoreML`-[MLModelAsset modelWithError:] + 60
    frame #27: 0x00000001a276c7e8 CoreML`+[MLModel modelWithContentsOfURL:configuration:error:] + 188
    frame #28: 0x00000002aaf96070 libcoremlpython.so`___lldb_unnamed_symbol353 + 692
    frame #29: 0x00000002aafaa5a8 libcoremlpython.so`___lldb_unnamed_symbol605 + 148
    frame #30: 0x00000002aafaa508 libcoremlpython.so`___lldb_unnamed_symbol604 + 24
    frame #31: 0x00000002aafa00e8 libcoremlpython.so`___lldb_unnamed_symbol490 + 4724
    frame #32: 0x00000001000acc88 python`cfunction_call + 80
    frame #33: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #34: 0x000000010005db14 python`method_vectorcall + 620
    frame #35: 0x00000001000d157c python`slot_tp_init + 140
    frame #36: 0x00000001000c9e98 python`type_call + 340
    frame #37: 0x000000012f3c98cc _pywrap_cpu_feature_guard.so`pybind11_meta_call + 40
    frame #38: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #39: 0x00000001001490f0 python`call_function + 676
    frame #40: 0x0000000100144e58 python`_PyEval_EvalFrameDefault + 26500
    frame #41: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #42: 0x0000000100149058 python`call_function + 524
    frame #43: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #44: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #45: 0x000000010005a4a4 python`_PyObject_FastCallDictTstate + 156
    frame #46: 0x000000010005b140 python`_PyObject_Call_Prepend + 164
    frame #47: 0x00000001000d1564 python`slot_tp_init + 116
    frame #48: 0x00000001000c9e98 python`type_call + 340
    frame #49: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #50: 0x00000001001490f0 python`call_function + 676
    frame #51: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #52: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #53: 0x000000010005aad8 python`PyVectorcall_Call + 156
    frame #54: 0x0000000100145160 python`_PyEval_EvalFrameDefault + 27276
    frame #55: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #56: 0x0000000100149058 python`call_function + 524
    frame #57: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #58: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #59: 0x0000000100149058 python`call_function + 524
    frame #60: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #61: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #62: 0x0000000100198f98 python`run_mod + 216
    frame #63: 0x0000000100199600 python`PyRun_InteractiveOneObjectEx + 944
    frame #64: 0x0000000100198430 python`_PyRun_InteractiveLoopObject + 428
    frame #65: 0x0000000100197990 python`_PyRun_AnyFileObject + 112
    frame #66: 0x000000010019b754 python`PyRun_AnyFileExFlags + 184
    frame #67: 0x00000001001bcc20 python`Py_RunMain + 2736
    frame #68: 0x00000001001bdc50 python`pymain_main + 1272
    frame #69: 0x000000010000400c python`main + 56
    frame #70: 0x0000000199edff28 dyld`start + 2236

To Reproduce

from transformers import CLIPProcessor, CLIPModel
import torch

# Wrap CLIPModel so that tracing captures only the text branch
class WrappedCLIPModel_Text(CLIPModel):
    def forward(self, *args, **kwargs):
        return self.get_text_features(*args, **kwargs)

model_version = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
processor = CLIPProcessor.from_pretrained(model_version)
model_pt_text = WrappedCLIPModel_Text.from_pretrained(model_version, return_dict=True)
model_pt_text.eval()

with torch.no_grad():
    processed_text = processor(text="example text", images=None, return_tensors="pt", padding=True)
    model_traced = torch.jit.trace(model_pt_text, processed_text.input_ids, strict=True)

import coremltools as ct
import numpy as np

# Convert traced model to CoreML
text_input_shape = ct.Shape(shape=(1,
                              ct.RangeDim(lower_bound=2, upper_bound=77, default=77))) 
#text_input_shape = ct.Shape(shape=(1,77)) # ← no crash with this
model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
    minimum_deployment_target=ct.target.macOS13,
    convert_to='mlprogram'
)
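
For reference, the traced module itself handles the short end of the range, so the RangeDim bounds are not themselves invalid. A quick check (49406 and 49407 are CLIP's start/end-of-text token ids; 1024 is the text embedding width I'd expect for CLIP-H):

# Sanity check: the traced text encoder accepts a length-2 sequence
with torch.no_grad():
    features = model_traced(torch.tensor([[49406, 49407]], dtype=torch.long))
print(features.shape)  # expect torch.Size([1, 1024]) for CLIP-H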

System environment (please complete the following information):

Additional context

If the input shape is fixed (text_input_shape = ct.Shape(shape=(1,77))), the conversion is successful.
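
With that fixed-shape workaround, shorter prompts can still be handled by padding the token ids to the full 77 tokens at prediction time. A minimal sketch, assuming the model was converted with the (1, 77) shape and accepts int32 ids (names reused from the repro above; CLIP's pooling keys off the first end-of-text token, so trailing padding should be benign):

# Pad every prompt to the fixed 77-token length, then run the Core ML model
processed = processor(text="example text", images=None, return_tensors="np",
                      padding="max_length", max_length=77)
prediction = model_coreml.predict({"input_text_token_ids": processed.input_ids.astype(np.int32)})
embedding = prediction["output_embedding"]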

TobyRoseman commented 1 year ago

model_traced(torch.Tensor([[49406, 4160]]).long()) works, so an input shape of (1, 2) should be valid.

The following works:

# Convert traced model to CoreML
text_input_shape = ct.Shape(shape=(1,
                              ct.RangeDim(lower_bound=2, upper_bound=77, default=77))) 

model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding")],
    convert_to="neuralnetwork"
)

Since we can convert to the neuralnetwork backend, this looks like an issue with the Core ML Framework rather than with the conversion process, in which case the correct place to report it is the Feedback Assistant.
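
In the meantime, a possible mlprogram-side workaround, if only a handful of sequence lengths are actually needed, is ct.EnumeratedShapes in place of ct.RangeDim. Untested against this particular crash, just a sketch (the example lengths are arbitrary):

# Enumerate a fixed set of input shapes instead of a continuous range
text_input_shape = ct.EnumeratedShapes(shapes=[[1, 2], [1, 16], [1, 77]],
                                       default=[1, 77])
model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
    minimum_deployment_target=ct.target.macOS13,
    convert_to='mlprogram'
)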