apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Crash converting the text encoder component of CLIP-H #1960

Open damian0815 opened 1 year ago

damian0815 commented 1 year ago

🐞 Describing the bug

When converting the text encoder component of LAION's CLIP-H model to Core ML using a variable input shape (a ct.RangeDim), ct.convert crashes the Python process.

Stack Trace

...
>>> # Convert traced model to CoreML
>>> text_input_shape = ct.Shape(shape=(1,
...                               ct.RangeDim(lower_bound=2, upper_bound=77, default=77)))
>>>
>>> model_coreml = ct.convert(
...     model_traced,
...     inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
...     outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
...     minimum_deployment_target=ct.target.macOS13,
...     convert_to='mlprogram'
... )
Converting PyTorch Frontend ==> MIL Ops:   0% 0/1510 [00:00<?, ? ops/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Converting PyTorch Frontend ==> MIL Ops:  96% 1448/1510 [00:00<00:00, 1616.88 ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100% 1509/1510 [00:00<00:00, 1730.12 ops/s]
Running MIL frontend_pytorch pipeline: 100% 5/5 [00:00<00:00, 120.13 passes/s]
Running MIL default pipeline: 100% 66/66 [00:19<00:00,  3.44 passes/s]
Running MIL backend_mlprogram pipeline: 100% 11/11 [00:00<00:00, 228.89 passes/s]
Process 42442 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGKILL
    frame #0: 0x000000019a2663b0 libsystem_platform.dylib`__bzero + 64
libsystem_platform.dylib`:
->  0x19a2663b0 <+64>: dc     zva, x3
    0x19a2663b4 <+68>: add    x3, x3, #0x40
    0x19a2663b8 <+72>: subs   x2, x2, #0x40
    0x19a2663bc <+76>: b.hi   0x19a2663b0               ; <+64>
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGKILL
  * frame #0: 0x000000019a2663b0 libsystem_platform.dylib`__bzero + 64
    frame #1: 0x00000001b0e2d588 Espresso`std::__1::__shared_ptr_emplace<Espresso::blob<float, 4>, std::__1::allocator<Espresso::blob<float, 4> > >::__shared_ptr_emplace[abi:v15006]<int&, int&, int&, int&>(std::__1::allocator<Espresso::blob<float, 4> >, int&, int&, int&, int&) + 156
    frame #2: 0x00000001b0e2d4bc Espresso`std::__1::shared_ptr<Espresso::blob<float, 4> > std::__1::allocate_shared[abi:v15006]<Espresso::blob<float, 4>, std::__1::allocator<Espresso::blob<float, 4> >, int&, int&, int&, int&, void>(std::__1::allocator<Espresso::blob<float, 4> > const&, int&, int&, int&, int&) + 76
    frame #3: 0x00000001b1268eb4 Espresso`Espresso::blob_cpu::resize(Espresso::layer_shape const&, std::__1::shared_ptr<Espresso::abstract_blob_container_options>) + 1108
    frame #4: 0x00000001b0ecde48 Espresso`Espresso::allocate_blobs(std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, int, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, int> > > const&, std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, unsigned long, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, unsigned long> > >&, std::__1::unordered_map<std::__1::shared_ptr<Espresso::abstract_blob_container>, Espresso::layer_shape, std::__1::hash<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::equal_to<std::__1::shared_ptr<Espresso::abstract_blob_container> >, std::__1::allocator<std::__1::pair<std::__1::shared_ptr<Espresso::abstract_blob_container> const, Espresso::layer_shape> > >&, int) + 732
    frame #5: 0x00000001b0ec8c60 Espresso`Espresso::reshape_networks_graph_coloring_raw_ptr_only_in_context(std::__1::shared_ptr<Espresso::abstract_context> const&, std::__1::vector<Espresso::net*, std::__1::allocator<Espresso::net*> > const&, int) + 2180
    frame #6: 0x00000001b0ec8340 Espresso`Espresso::reshape_networks_graph_coloring_raw_ptr(std::__1::vector<Espresso::net*, std::__1::allocator<Espresso::net*> >, int) + 640
    frame #7: 0x00000001b0ec6750 Espresso`Espresso::pass_graph_coloring::run_on_network(Espresso::net&) + 272
    frame #8: 0x00000001b0ffcbc4 Espresso`Espresso::shape_network_recursive(Espresso::net*, Espresso::network_shape const&, int, bool) + 6952
    frame #9: 0x00000001b107bd48 Espresso`Espresso::load_and_shape_network(std::__1::shared_ptr<Espresso::SerDes::generic_serdes_object> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::network_shape const&, Espresso::compute_path, std::__1::shared_ptr<Espresso::blob_storage_abstract> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 644
    frame #10: 0x00000001b128af7c Espresso`Espresso::reload_network_on_context(std::__1::shared_ptr<Espresso::net> const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::compute_path) + 452
    frame #11: 0x00000001b107d750 Espresso`Espresso::load_network(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<Espresso::abstract_context> const&, Espresso::compute_path, bool) + 1504
    frame #12: 0x00000001b0f1e7bc Espresso`EspressoLight::espresso_plan::add_network(char const*, espresso_storage_type_t, std::__1::shared_ptr<Espresso::net>) + 3108
    frame #13: 0x00000001b0f31918 Espresso`EspressoLight::espresso_plan::add_network(char const*, espresso_storage_type_t) + 64
    frame #14: 0x00000001b0f34c18 Espresso`espresso_plan_add_network + 416
    frame #15: 0x00000001a26cdf68 CoreML`-[MLNeuralNetworkEngine _addNetworkToPlan:error:] + 232
    frame #16: 0x00000001a26ccee4 CoreML`-[MLNeuralNetworkEngine _setupContextAndPlanWithConfiguration:usingCPU:reshapeWithContainer:error:] + 896
    frame #17: 0x00000001a26ce808 CoreML`-[MLNeuralNetworkEngine initWithContainer:configuration:error:] + 200
    frame #18: 0x00000001a273704c CoreML`-[MLMultiFunctionProgramEngine initWithProgramContainer:configuration:error:] + 312
    frame #19: 0x00000001a2737248 CoreML`+[MLMultiFunctionProgramEngine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 180
    frame #20: 0x00000001a272e360 CoreML`+[MLLoader loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 140
    frame #21: 0x00000001a272c9ac CoreML`+[MLLoader loadModelFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 1952
    frame #22: 0x00000001a272da14 CoreML`+[MLLoader loadModelFromArchive:configuration:loaderEvent:error:] + 24
    frame #23: 0x00000001a272ed60 CoreML`+[MLLoader loadModelFromAssetAtURL:configuration:loaderEvent:error:] + 252
    frame #24: 0x00000001a272efa0 CoreML`+[MLLoader loadModelFromAssetAtURL:configuration:error:] + 112
    frame #25: 0x00000001a2711c44 CoreML`-[MLModelAsset load:] + 496
    frame #26: 0x00000001a2711950 CoreML`-[MLModelAsset modelWithError:] + 60
    frame #27: 0x00000001a276c7e8 CoreML`+[MLModel modelWithContentsOfURL:configuration:error:] + 188
    frame #28: 0x00000002aaf96070 libcoremlpython.so`___lldb_unnamed_symbol353 + 692
    frame #29: 0x00000002aafaa5a8 libcoremlpython.so`___lldb_unnamed_symbol605 + 148
    frame #30: 0x00000002aafaa508 libcoremlpython.so`___lldb_unnamed_symbol604 + 24
    frame #31: 0x00000002aafa00e8 libcoremlpython.so`___lldb_unnamed_symbol490 + 4724
    frame #32: 0x00000001000acc88 python`cfunction_call + 80
    frame #33: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #34: 0x000000010005db14 python`method_vectorcall + 620
    frame #35: 0x00000001000d157c python`slot_tp_init + 140
    frame #36: 0x00000001000c9e98 python`type_call + 340
    frame #37: 0x000000012f3c98cc _pywrap_cpu_feature_guard.so`pybind11_meta_call + 40
    frame #38: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #39: 0x00000001001490f0 python`call_function + 676
    frame #40: 0x0000000100144e58 python`_PyEval_EvalFrameDefault + 26500
    frame #41: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #42: 0x0000000100149058 python`call_function + 524
    frame #43: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #44: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #45: 0x000000010005a4a4 python`_PyObject_FastCallDictTstate + 156
    frame #46: 0x000000010005b140 python`_PyObject_Call_Prepend + 164
    frame #47: 0x00000001000d1564 python`slot_tp_init + 116
    frame #48: 0x00000001000c9e98 python`type_call + 340
    frame #49: 0x000000010005a294 python`_PyObject_MakeTpCall + 612
    frame #50: 0x00000001001490f0 python`call_function + 676
    frame #51: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #52: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #53: 0x000000010005aad8 python`PyVectorcall_Call + 156
    frame #54: 0x0000000100145160 python`_PyEval_EvalFrameDefault + 27276
    frame #55: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #56: 0x0000000100149058 python`call_function + 524
    frame #57: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #58: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #59: 0x0000000100149058 python`call_function + 524
    frame #60: 0x0000000100144ec8 python`_PyEval_EvalFrameDefault + 26612
    frame #61: 0x000000010013ddc8 python`_PyEval_Vector + 2056
    frame #62: 0x0000000100198f98 python`run_mod + 216
    frame #63: 0x0000000100199600 python`PyRun_InteractiveOneObjectEx + 944
    frame #64: 0x0000000100198430 python`_PyRun_InteractiveLoopObject + 428
    frame #65: 0x0000000100197990 python`_PyRun_AnyFileObject + 112
    frame #66: 0x000000010019b754 python`PyRun_AnyFileExFlags + 184
    frame #67: 0x00000001001bcc20 python`Py_RunMain + 2736
    frame #68: 0x00000001001bdc50 python`pymain_main + 1272
    frame #69: 0x000000010000400c python`main + 56
    frame #70: 0x0000000199edff28 dyld`start + 2236

To Reproduce

from transformers import CLIPProcessor, CLIPModel
import torch

# Wrap CLIPModel so that tracing captures only the text branch
class WrappedCLIPModel_Text(CLIPModel):
    def forward(self, *args, **kwargs):
        return self.get_text_features(*args, **kwargs)

model_version = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
processor = CLIPProcessor.from_pretrained(model_version)
model_pt_text = WrappedCLIPModel_Text.from_pretrained(model_version, return_dict=True)
model_pt_text.eval()

with torch.no_grad():
    processed_text = processor(text="example text", images=None, return_tensors="pt", padding=True)
    model_traced = torch.jit.trace(model_pt_text, processed_text.input_ids, strict=True)

import coremltools as ct
import numpy as np

# Convert traced model to CoreML
text_input_shape = ct.Shape(shape=(1,
                              ct.RangeDim(lower_bound=2, upper_bound=77, default=77))) 
#text_input_shape = ct.Shape(shape=(1,77)) # ← no crash with this
model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
    minimum_deployment_target=ct.target.macOS13,
    convert_to='mlprogram'
)
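
For reference, the traced module itself handles the short end of the range, so the RangeDim bounds are not themselves invalid. A quick check (49406 and 49407 are CLIP's start/end-of-text token ids; 1024 is the text embedding width I'd expect for CLIP-H):

# Sanity check: the traced text encoder accepts a length-2 sequence
with torch.no_grad():
    features = model_traced(torch.tensor([[49406, 49407]], dtype=torch.long))
print(features.shape)  # expect torch.Size([1, 1024]) for CLIP-H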

System environment (please complete the following information):

Additional context

If the input shape is fixed (text_input_shape = ct.Shape(shape=(1,77))), the conversion is successful.
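
With that fixed-shape workaround, shorter prompts can still be handled by padding the token ids to the full 77 tokens at prediction time. A minimal sketch, assuming the model was converted with the (1, 77) shape and accepts int32 ids (names reused from the repro above; CLIP's pooling keys off the first end-of-text token, so trailing padding should be benign):

# Pad every prompt to the fixed 77-token length, then run the Core ML model
processed = processor(text="example text", images=None, return_tensors="np",
                      padding="max_length", max_length=77)
prediction = model_coreml.predict({"input_text_token_ids": processed.input_ids.astype(np.int32)})
embedding = prediction["output_embedding"]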

TobyRoseman commented 1 year ago

model_traced(torch.Tensor([[49406, 4160]]).long()) works, so an input shape of (1, 2) should be valid.

The following works:

# Convert traced model to CoreML
text_input_shape = ct.Shape(shape=(1,
                              ct.RangeDim(lower_bound=2, upper_bound=77, default=77))) 

model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding")],
    convert_to="neuralnetwork"
)

Since we can convert to the neuralnetwork backend, this looks like an issue with the Core ML Framework rather than with the conversion process, in which case the correct place to report it is the Feedback Assistant.
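
In the meantime, a possible mlprogram-side workaround, if only a handful of sequence lengths are actually needed, is ct.EnumeratedShapes in place of ct.RangeDim. Untested against this particular crash, just a sketch (the example lengths are arbitrary):

# Enumerate a fixed set of input shapes instead of a continuous range
text_input_shape = ct.EnumeratedShapes(shapes=[[1, 2], [1, 16], [1, 77]],
                                       default=[1, 77])
model_coreml = ct.convert(
    model_traced,
    inputs=[ct.TensorType(name="input_text_token_ids", shape=text_input_shape, dtype=np.int64)],
    outputs=[ct.TensorType(name="output_embedding", dtype=np.float16)],
    minimum_deployment_target=ct.target.macOS13,
    convert_to='mlprogram'
)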