apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Error on mixing shapes of multiple enumerated shape inputs #2271

Open 0seba opened 3 months ago

0seba commented 3 months ago

🐞Describing the bug

I have a program with 3 inputs. All of them have flexible enumerated shapes, and two of them share the same shape. The input shapes are:

After building, the model runs correctly with Length1=16 and Length2=256, or Length1=32 and Length2=512, but when I try to run with Length1=32 and Length2=256 it fails.

I tried extending the enumerated shapes to cover the different combinations (setting `shape_1=[(1, 3, 64, 16), (1, 3, 64, 32), (1, 3, 64, 32), (1, 3, 64, 16)]`, and analogously for `shape_2`), but it does not work.
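As a rough illustration of the "extended" shape lists described above, the sketch below builds one list per input so that index `i` in both lists corresponds to one (query length, cache length) combination. It is plain Python with no coremltools dependency; the names `q_lengths`, `kv_lengths` and `pairs` are hypothetical, and the ordering differs from the list quoted above. Per this report, supplying such lists did not make the mixed combinations work.

```python
from itertools import product

# Hypothetical names; lengths taken from the report.
q_lengths = [16, 32]
kv_lengths = [256, 512]

# Every (query length, cache length) pairing, in a fixed order.
pairs = list(product(q_lengths, kv_lengths))

# One shape list per input; entry i of each list belongs to pairs[i].
shape_1 = [(1, 3, 64, q) for q, _ in pairs]
shape_2 = [(1, 3, 64, kv) for _, kv in pairs]

for s1, s2 in zip(shape_1, shape_2):
    print(s1, s2)
```

Each list contains repeated entries by construction, which is relevant if shapes are deduplicated somewhere between conversion and execution.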

Stack Trace

Running on Python trace

File ~/.pyenv/versions/apple/lib/python3.11/site-packages/coremltools/models/model.py:654, in MLModel.predict(self, data, state)
    651 MLModel._check_predict_data(data)
    653 if self.__proxy__:
--> 654     return self._get_predictions(self.__proxy__,
    655                                  verify_and_convert_input_dict,
    656                                  data,
    657                                  state)
    658 else:   # Error case
    659     if _macos_version() < (10, 13):

File ~/.pyenv/versions/apple/lib/python3.11/site-packages/coremltools/models/model.py:702, in MLModel._get_predictions(proxy, preprocess_method, data, state)
    700     preprocess_method(data)
    701     state = None if state is None else state.__proxy__
--> 702     return proxy.predict(data, state)
    703 else:
    704     assert type(data) == list

RuntimeError: Caught an unknown exception!

To Reproduce

```python
# Imports were omitted in the original report; these bindings for
# np, ct, mil and mb are assumed.
import numpy as np
import coremltools as ct
from coremltools.converters import mil
from coremltools.converters.mil import Builder as mb

A = [16, 32]
B = [256, 512]

# Pair every query length with every cache length:
# A -> [16, 16, 32, 32], B -> [256, 512, 256, 512]
A, B = np.repeat(A, len(B)), B * len(A)

q_seqlens = A
kv_seqlens = B
input_ids_shapes = [(1, 3, 64, seqlen) for seqlen in q_seqlens]
kv_shapes = [(1, 3, 64, seqlen) for seqlen in kv_seqlens]

input_ids_shape_def = mil.input_types.EnumeratedShapes(shapes=input_ids_shapes)
kv_shape_def = mil.input_types.EnumeratedShapes(shapes=kv_shapes)


@mb.program(
    input_specs=[
        mb.TensorSpec(input_ids_shape_def.symbolic_shape, mil.input_types.types.fp16),
        mb.TensorSpec(kv_shape_def.symbolic_shape, mil.input_types.types.fp16),
        mb.TensorSpec(kv_shape_def.symbolic_shape, mil.input_types.types.fp16),
    ],
    opset_version=mil.builder.AvailableTarget.iOS18,
)
def prog(
    query,
    key_cache,
    value_cache,
):
    # Scaled dot-product attention over the last axis.
    scores = mb.matmul(x=query, y=key_cache, transpose_x=True)
    scores = mb.mul(x=scores, y=np.float16(64 ** -0.5))
    weights = mb.softmax(x=scores)
    attention = mb.matmul(x=value_cache, y=weights, transpose_y=True)
    return attention  # , key_cache


cml_flex = ct.convert(
    prog,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
    inputs=[
        ct.TensorType(name="query", shape=ct.EnumeratedShapes(input_ids_shapes)),
        ct.TensorType(name="key_cache", shape=ct.EnumeratedShapes(kv_shapes)),
        ct.TensorType(name="value_cache", shape=ct.EnumeratedShapes(kv_shapes)),
    ],
)

# Query length and cache length used for the failing prediction.
QL = 16
CL = 512

np.random.seed(42)
cml_flex.predict({
    'query': np.random.randn(1, 3, 64, QL).astype(np.float16),
    'key_cache': np.random.randn(1, 3, 64, CL).astype(np.float16),
    'value_cache': np.random.randn(1, 3, 64, CL).astype(np.float16),
})
```
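To see which length combinations succeed and which raise, a small probing helper can be useful. The sketch below is an assumption-laden convenience, not part of coremltools: `probe_combinations` and its `run` callback are hypothetical names, and `run(ql, cl)` is expected to perform one prediction for the given query and cache lengths.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def probe_combinations(
    run: Callable[[int, int], object],
    q_lengths: Iterable[int],
    kv_lengths: Iterable[int],
) -> Dict[Tuple[int, int], str]:
    """Call run(ql, cl) for every pairing and record 'ok' or the error text."""
    results: Dict[Tuple[int, int], str] = {}
    for ql, cl in product(q_lengths, kv_lengths):
        try:
            run(ql, cl)
            results[(ql, cl)] = "ok"
        except Exception as exc:  # the report shows a RuntimeError from predict
            results[(ql, cl)] = f"{type(exc).__name__}: {exc}"
    return results
```

With the reproduction above, `run` could wrap `cml_flex.predict`, building the three `np.float16` arrays of shape `(1, 3, 64, ql)` and `(1, 3, 64, cl)` for each call, so that the returned dictionary lists every combination alongside its outcome.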


## System environment (please complete the following information):
 - coremltools version: 8.0b1
 - OS (e.g. MacOS version or Linux type): 15.0 beta 2

## Additional context

Also, when I remove the last comment in the program (`#, key_cache`), running a prediction with different shapes does not raise an exception, but all inputs are converted to `0`s (`query` is an array containing only `0`, and the same goes for `key_cache` and `value_cache`). I'll report this issue on the Apple Forums and in Feedback Assistant. I wasn't sure whether the first part of the problem is just a conversion issue or an intrinsic Core ML issue, which is why I also reported it here.

Additional Swift trace
```swift
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'There is no function in the program library for the provided input=query = MultiArray : Float16 1 × 3 × 64 × 16 array
key_cache = MultiArray : Float16 1 × 3 × 64 × 512 array
value_cache = MultiArray : Float16 1 × 3 × 64 × 512 array
.'
*** First throw call stack:
(
    0   CoreFoundation                      0x00000001998f6920 __exceptionPreprocess + 176
    1   libobjc.A.dylib                     0x00000001993deb1c objc_exception_throw + 76
    2   CoreFoundation                      0x00000001998f6810 +[NSException exceptionWithName:reason:userInfo:] + 0
    3   CoreML                              0x00000001a3745a98 -[MLE5EnumeratedShapeExecutionStreamOperationPool takeOutOperationForFeatures:error:] + 480
    4   CoreML                              0x00000001a3845a78 -[MLE5ExecutionStream setupOperationForInputFeatures:operationPool:error:] + 92
    5   CoreML                              0x00000001a37f58d0 -[MLE5Engine _cleanUpAndReconfigureStream:forInputFeatures:error:] + 108
    6   CoreML                              0x00000001a37f4be8 -[MLE5Engine _predictionFromFeatures:options:completionHandler:] + 256
    7   CoreML                              0x00000001a37f511c -[MLE5Engine submitPredictionRequest:completionHandler:] + 124
    8   CoreML                              0x00000001a37cc780 __62-[MLDelegateModel _submitPredictionRequest:completionHandler:]_block_invoke + 420
    9   libdispatch.dylib                   0x00000001001b0b6c _dispatch_call_block_and_release + 32
    10  libdispatch.dylib                   0x00000001001b28ac _dispatch_client_callout + 20
    11  libdispatch.dylib                   0x00000001001b6110 _dispatch_continuation_pop + 700
    12  libdispatch.dylib                   0x00000001001b50ac _dispatch_async_redirect_invoke + 616
    13  libdispatch.dylib                   0x00000001001ca9b8 _dispatch_root_queue_drain + 404
    14  libdispatch.dylib                   0x00000001001cb5c4 _dispatch_worker_thread2 + 188
    15  libsystem_pthread.dylib             0x000000010024d0c4 _pthread_wqthread + 228
    16  libsystem_pthread.dylib             0x0000000100254cf0 start_wqthread + 8
)
libc++abi: terminating due to uncaught exception of type NSException
```

YifanShenSZ commented 3 months ago

Hi @0seba, there appears to be some incorrect deduplication going on in the Core ML framework, so yes, it is correct to file an issue on the Apple forum.

Concretely, I tried your reproduction, and it does indeed error out. Repeating the enumerated shapes in different orders does not help. Only reverting the enumerated shapes works (i.e. only one of 16 x 256 or 16 x 512 works).

YifanShenSZ commented 3 months ago

The Core ML framework team got back to us and found the issue to be in coremltools: the protobuf is deduplicated.
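A minimal sketch of why deduplicating per-input shape lists could matter, assuming (this is not confirmed by the thread) that each input's list is deduplicated independently while preserving order. The helper `dedupe` is hypothetical and only illustrates the idea; it is not the coremltools code path.

```python
def dedupe(shapes):
    """Order-preserving removal of repeated shapes (hypothetical helper)."""
    return list(dict.fromkeys(shapes))


# The repeated lists from the reproduction, written out explicitly.
input_ids_shapes = [(1, 3, 64, n) for n in (16, 16, 32, 32)]
kv_shapes = [(1, 3, 64, n) for n in (256, 512, 256, 512)]

# After deduplication each list has two entries, so the four intended
# index-wise pairings can no longer be told apart.
remaining = list(zip(dedupe(input_ids_shapes), dedupe(kv_shapes)))
print(remaining)
```

Under that assumption, only the pairings 16 with 256 and 32 with 512 survive index-wise, which would be consistent with the combinations the original report says run correctly.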

0seba commented 3 months ago

Thanks @YifanShenSZ , hopefully it is a simple issue and we can get support for these models soon 😅

YifanShenSZ commented 2 months ago

This turns out to be more involved. Some progress has been made: we have 2 fixes for our protobuf. We can continue to investigate this issue once those protobuf fixes land.