0seba opened this issue 3 months ago
Hi @0seba, there appears to be some incorrect deduplication going on in the Core ML framework... So yes, it is correct to file an issue on the Apple forum.
Concretely, I tried your reproducer, and it does indeed error out. Repeating the enumerated shapes in different orders does not help. Only reverting the enumerated shapes works (i.e. only one of 16 x 256 or 16 x 512 works...)
The Core ML framework team got back to us and found the issue to be in coremltools: the protobuf is deduplicated.
Thanks @YifanShenSZ , hopefully it is a simple issue and we can get support for these models soon 😅
This turns out to be more involved. Some progress has been made: we have 2 fixes for our protobuf. We can continue investigating this issue once those protobuf fixes land.
🐞 Describing the bug
I have a program with 3 inputs, all of which have a flexible enumerated shape, and two of which share the same shape. The input shapes are:
```python
shapes_1 = [(1, 3, 64, 16), (1, 3, 64, 32)]    # last dimension varies, called Length1
shapes_2 = [(1, 3, 64, 256), (1, 3, 64, 512)]  # last dimension varies, called Length2
```
After building, the model runs correctly with Length1=16 and Length2=256, or with Length1=32 and Length2=512, but it fails when I try to run with Length1=32 and Length2=256. I tried extending the enumerated shapes to cover the different combinations (setting `shape_1 = [(1, 3, 64, 16), (1, 3, 64, 32), (1, 3, 64, 32), (1, 3, 64, 16)]` and the analogous for `shape_2`), but it does not work.

Stack Trace
Running on Python trace
To Reproduce
```python
import numpy as np

A = [16, 32]
B = [256, 512]
# pair every query length with every cache length
A, B = np.repeat(A, len(B)), B * len(A)

q_seqlens = A
kv_seqlens = B
input_ids_shapes = [(1, 3, 64, seqlen) for seqlen in q_seqlens]
kv_shapes = [(1, 3, 64, seqlen) for seqlen in kv_seqlens]
```
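As a quick sanity check of the repeat/tile pairing above (an illustrative snippet of my own, not part of the original report), the resulting (Length1, Length2) pairs cover all four combinations:

```python
import numpy as np

A = [16, 32]
B = [256, 512]
A, B = np.repeat(A, len(B)), B * len(A)

# A becomes [16, 16, 32, 32]; B becomes [256, 512, 256, 512]
print(list(zip(A.tolist(), B)))  # [(16, 256), (16, 512), (32, 256), (32, 512)]
```

So every mixed combination, including the failing Length1=32 with Length2=256, is explicitly enumerated.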
```python
# imports assumed; the original snippet omitted them
import numpy as np
import coremltools as ct
from coremltools.converters import mil
from coremltools.converters.mil import Builder as mb

input_ids_shape_def = mil.input_types.EnumeratedShapes(shapes=input_ids_shapes)
kv_shape_def = mil.input_types.EnumeratedShapes(shapes=kv_shapes)

@mb.program(
    input_specs=[
        mb.TensorSpec(input_ids_shape_def.symbolic_shape, mil.input_types.types.fp16),
        mb.TensorSpec(kv_shape_def.symbolic_shape, mil.input_types.types.fp16),
        mb.TensorSpec(kv_shape_def.symbolic_shape, mil.input_types.types.fp16),
    ],
    opset_version=mil.builder.AvailableTarget.iOS18,
)
def prog(query, key_cache, value_cache):
    scores = mb.matmul(x=query, y=key_cache, transpose_x=True)
    scores = mb.mul(x=scores, y=np.float16(64 ** -0.5))
    weights = mb.softmax(x=scores)
    attention = mb.matmul(x=value_cache, y=weights, transpose_y=True)
    return attention  # , key_cache

cml_flex = ct.convert(
    prog,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
    inputs=[
        ct.TensorType(name="query", shape=ct.EnumeratedShapes(input_ids_shapes)),
        ct.TensorType(name="key_cache", shape=ct.EnumeratedShapes(kv_shapes)),
        ct.TensorType(name="value_cache", shape=ct.EnumeratedShapes(kv_shapes)),
    ],
)
```
```python
QL = 16
CL = 512

np.random.seed(42)
cml_flex.predict({
    'query': np.random.randn(1, 3, 64, QL).astype(np.float16),
    'key_cache': np.random.randn(1, 3, 64, CL).astype(np.float16),
    'value_cache': np.random.randn(1, 3, 64, CL).astype(np.float16),
})
```
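For reference, the attention math that `prog` encodes can be mirrored in plain NumPy (a sketch under my reading of the MIL ops; the einsum labels and function name are my own, and fp16 rounding means results would only match the Core ML output approximately):

```python
import numpy as np

def attention_reference(query, key_cache, value_cache):
    # matmul(x=query, y=key_cache, transpose_x=True): (b, h, QL, d) x (b, h, d, CL)
    scores = np.einsum('bhdq,bhdk->bhqk', query, key_cache) * 64 ** -0.5
    # numerically stable softmax over the cache-length axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # matmul(x=value_cache, y=weights, transpose_y=True): (b, h, d, CL) x (b, h, CL, QL)
    return np.einsum('bhdk,bhqk->bhdq', value_cache, weights)

out = attention_reference(
    np.random.randn(1, 3, 64, 16).astype(np.float16),
    np.random.randn(1, 3, 64, 512).astype(np.float16),
    np.random.randn(1, 3, 64, 512).astype(np.float16),
)
print(out.shape)  # (1, 3, 64, 16)
```

This also makes the expected output shape explicit: (1, 3, 64, QL), regardless of the cache length.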