apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Documentation does not show iOS16 ops for conv and conv_transpose #2127

Open skottmckay opened 8 months ago

skottmckay commented 8 months ago

🐞Describing the bug

I have an ML Program with a conv node. If the pad_type is set to same or same_lower, I get an error saying the 'pad' input is required.

coremlcompiler: error: Failed to parse the model specification. Error: Unable to parse ML Program: in operation node1_token_0: Required param 'pad' is missing

The spec for conv in coremltools says 'pad' should only be specified if the pad_type is custom.

Is there some inconsistency between the spec and the parsing in coremlcompiler and 'pad' needs to be specified?
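For context, here is a minimal sketch of the SAME-padding arithmetic as the MIL conv spec describes it (the function name and the exact tie-breaking are my reading of the spec, not code from coremltools): the output size is `ceil(in / stride)`, and when the total padding is odd, 'same' puts the extra pixel at the end while 'same_lower' puts it at the beginning.

```python
import math

def same_pads(in_size, kernel, stride=1, dilation=1, lower=False):
    """Compute (pad_begin, pad_end) for one spatial axis under SAME padding.

    Illustrative sketch of the spec's semantics: total padding is whatever
    makes the dilated kernel produce ceil(in_size / stride) outputs; 'same'
    puts the odd extra pixel at the end, 'same_lower' at the beginning.
    """
    effective_kernel = (kernel - 1) * dilation + 1
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, big = total // 2, total - total // 2
    return (big, small) if lower else (small, big)

# 5-wide input, 2-wide kernel, stride 1 -> one pixel of padding total
print(same_pads(5, 2))              # (0, 1): extra pad at the end
print(same_pads(5, 2, lower=True))  # (1, 0): extra pad at the beginning
print(same_pads(5, 3))              # (1, 1): even split either way
```

Note that either way the pad amounts are fully determined by the shapes, which is why the spec says an explicit 'pad' input should only be needed for pad_type 'custom'.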

System environment (please complete the following information):

Additional context

Zipped model.mlprogram: model.zip

TobyRoseman commented 8 months ago

I can reproduce the issue using your attached mlmodel. However, I'm unable to create a MIL program that reproduces this error.

It looks like you're using the iOS15 opset, but the following program:

import coremltools as ct
from coremltools.converters.mil.mil import Builder as mb
import numpy as np

target = ct.target.iOS15

x_shape = (1, 1, 5, 5)
w_shape = (1, 1, 3, 3)

@mb.program(
    input_specs=[
        mb.TensorSpec(shape=x_shape),
        mb.TensorSpec(shape=w_shape)
    ],
    opset_version=target
)
def prog(x, weight):
    return mb.conv(x=x, weight=weight, pad_type='same_lower')

m = ct.convert(prog, minimum_deployment_target=target)

x = np.random.rand(*x_shape)
w = np.random.rand(*w_shape)

print(m.predict({'x': x, 'weight': w}))

Produces the error

ValueError: iOS15 version of conv does not support pad_type = `same_lower`

in the ct.convert call.

@skottmckay - Please share the code you used to create your mlmodel. Also what version of coremltools are you using?

skottmckay commented 8 months ago

coremltools 7.1

I'm trying to dynamically create the CoreML model at runtime in onnxruntime so an ONNX model can benefit from CoreML if it's available on the device. To do that I'm using the coremltools implementation as the basis, including the info in the spec.

It seems like a 'pad' input is required for 'same'. In the model generated by your script, if I change the pad_type to 'same' there is a 'pad' input added as shown by viewing the mlmodel in Netron.

[Netron screenshot: the generated conv node shows a 'pad' input]

This code is creating the ML Program 'conv' pad input/s.

https://github.com/microsoft/onnxruntime/blob/f31376cc775b65cbee14c236400bd1854267008b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc#L87-L128

If I change that to add a 'pad' input for 'same', the compile of the generated CoreML model succeeds. It seems to be happy with either an empty 'pad' input or one filled with zeros. However, the spec for iOS15 conv explicitly says

'same_lower' is allowed even though the coremltools python code rejects it

https://github.com/apple/coremltools/blob/dbb0094fd0cb936469e35320bf37e866ef7a1da4/coremltools/converters/mil/mil/ops/defs/iOS15/conv.py#L56-L57

and to not pass in 'pad' unless 'pad_type' is 'custom', which makes it unclear what should be passed in (empty pads, zero pads of the correct size, or something else) when the implementation seems to actually require a 'pad' input

https://github.com/apple/coremltools/blob/dbb0094fd0cb936469e35320bf37e866ef7a1da4/coremltools/converters/mil/mil/ops/defs/iOS15/conv.py#L72-L73

Is there any way to view the compilation implementation so when there are discrepancies between what the spec says and what compilation requires I can trawl through that to resolve it?

skottmckay commented 8 months ago

Interestingly I can create a model programmatically that uses 'same_lower' for 'pad_type' and default 'pad' values and it will compile and run successfully.

Naively, that makes it seem like the spec is inconsistent with coremltools in some places ('same_lower' not allowed) and with the model parsing used during compilation in others ('pad', 'dilations' and 'strides' appear to be required at times when the spec says they are optional).

TobyRoseman commented 8 months ago

It seems like a 'pad' input is required for 'same'.

That's not what I'm seeing. If you change my code: pad_type='same_lower' -> pad_type='same', it works without specifying pad.

In the model generated by your script, if I change the pad_type to 'same' there is a 'pad' input added as shown by viewing the mlmodel in Netron.

How are you changing the model? Editing the spec directly?

The pad showing up in Netron may be an artifact of Netron; that might not be what's actually going on.

Is there any way to view the compilation implementation so when there are discrepancies between what the spec says and what compilation requires I can trawl through that to resolve it?

Sorry, no this is not possible.

skottmckay commented 8 months ago

The 'pad' certainly could be an artifact of Netron; however, dumping the model spec by adding the below python to your script shows a 'pad' input exists. If I save the model to an mlpackage and load it with ct.models.MLModel(), the 'pad' input is still present in that. The 'conv_0_pad_0' string also exists in the bytes of the model.mlmodel file. Given all that, it looks to me like a default 'pad' input is added.

# m is the MLModel produced by the script above.
spec = m.get_spec()
main = spec.mlProgram.functions["main"]
block = main.block_specializations[main.opset]
for op in block.operations:
    if op.type == 'const':
        # Look for the const ops feeding the conv's 'pad_type' and 'pad' inputs.
        if op.attributes["name"].immediateValue.tensor.strings.values[0] == "conv_0_pad_type_0":
            print(f"Conv pad_type={op.attributes['val'].immediateValue.tensor.strings.values}")
        if op.attributes["name"].immediateValue.tensor.strings.values[0] == "conv_0_pad_0":
            print(f"Conv pad values={op.attributes['val'].immediateValue.tensor.ints.values}")
    if op.type == 'conv':
        print(op)

produces

Conv pad_type=['same']
Conv pad values=[0, 0, 0, 0]
type: "conv"
inputs {
  key: "dilations"
  value {
    arguments {
      name: "conv_0_dilations_0"
    }
  }
}
inputs {
  key: "groups"
  value {
    arguments {
      name: "conv_0_groups_0"
    }
  }
}
inputs {
  key: "pad"
  value {
    arguments {
      name: "conv_0_pad_0"
    }
  }
}
inputs {
  key: "pad_type"
  value {
    arguments {
      name: "conv_0_pad_type_0"
    }
  }
}
...

I'm converting an ONNX model to a CoreML model at runtime using the coremltools protobuf definitions in mlmodel/format to create the model in C++. I'm using the spec for input names/types/required/optional info, and the coremltools python implementation as an implementation example.

In this case, if I ignore what the spec says (pad should be specified if and only if pad_type == custom, otherwise errors occur.) and replicate what coremltools appears to do and add the 'pad' input when the pad_type is 'same' the model happily compiles and runs.
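The bookkeeping this implies — deciding which conv inputs to serialize based on the spec's required/optional flags, plus the compiler's apparent extra requirements — can be sketched like this (the table contents are my reading of the iOS15 conv spec and the workaround discussed in this thread, not anything generated from coremltools):

```python
# Hypothetical per-input emission policy for a MIL 'conv' op.
# 'workaround' marks inputs the compiler appears to require even
# though the spec lists them as optional.
CONV_INPUTS = {
    "x":         {"required": True},
    "weight":    {"required": True},
    "strides":   {"required": False, "workaround": True},
    "pad_type":  {"required": False},
    "pad":       {"required": False, "workaround": True},
    "dilations": {"required": False, "workaround": True},
    "groups":    {"required": False},
    "bias":      {"required": False},
}

def inputs_to_emit(provided, apply_workaround=True):
    """Return the set of input names to serialize into the model."""
    emit = set()
    for name, policy in CONV_INPUTS.items():
        if policy["required"] or name in provided:
            emit.add(name)
        elif apply_workaround and policy.get("workaround"):
            # Emit a default-valued const, e.g. pad = [0, 0, 0, 0].
            emit.add(name)
    return emit

print(sorted(inputs_to_emit({"x", "weight", "pad_type"})))
# ['dilations', 'pad', 'pad_type', 'strides', 'weight', 'x']
```

Following only the spec would correspond to `apply_workaround=False`, which is exactly the configuration the compiler rejects.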

If I change your script to target = ct.target.iOS16, 'same_lower' is accepted. The spec says that value is present in iOS15.

Based on that, should the spec be considered a rough suggestion of what may or may not work, with the model that coremltools actually produces being the source of truth?

TobyRoseman commented 8 months ago

I've reviewed some internal ML Program configuration files. There are separate iOS15 and iOS16 ops for both conv and conv_transpose. The only difference between these versions seems to be that the iOS15 ones do not accept pad_type=same_lower while the iOS16 ones do.
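A hedged sketch of the per-opset validation this finding implies (the dictionary and function names are illustrative; the real check lives in coremltools' conv op definitions):

```python
# Allowed pad_type values per opset, per the finding above: iOS16 adds
# 'same_lower' for conv / conv_transpose, iOS15 does not accept it.
ALLOWED_PAD_TYPES = {
    "iOS15": {"valid", "same", "custom"},
    "iOS16": {"valid", "same", "custom", "same_lower"},
}

def check_pad_type(opset, pad_type):
    if pad_type not in ALLOWED_PAD_TYPES[opset]:
        # Mirrors the error message reported earlier in this thread.
        raise ValueError(
            f"{opset} version of conv does not support pad_type = `{pad_type}`"
        )

check_pad_type("iOS16", "same_lower")  # accepted
try:
    check_pad_type("iOS15", "same_lower")
except ValueError as e:
    print(e)
```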

Our documentation should show that there are iOS16 ops for both conv and conv_transpose. We should also remove same_lower from allowed values for pad_type in the iOS15 conv op documentation. Let's use this GitHub issue to track that; I'll update the title of this issue.

I know we have discussed other related possible issues. Please open new GitHub issues for those. Ideally, please use code similar to what I have shared previously to demonstrate the problem.

skottmckay commented 8 months ago

Awesome. Thanks for the update.

Is there a way to replicate things using the python builder code for the scenario where the coremltools python implicitly adds an input like the 'pad' for conv? I'm not sure how to demonstrate the issue where the spec says specific inputs are optional but they appear to be required, when the builder adds them implicitly.

TobyRoseman commented 8 months ago

I'm not sure I understand your question. You should be able to use the Builder API directly.

skottmckay commented 8 months ago

This output was from adding the dumping logic to your builder-based script. There's no 'pad' specified in the script, but it gets added to the generated model. I don't know how to demonstrate an issue where an 'optional' input causes a 'required' error via the builder API, when the API is adding those inputs in the background (pads/strides/dilations in this case).

Is this code responsible for providing default values for all 'optional' inputs?

https://github.com/apple/coremltools/blob/c19283b4f00f36da3807644ff30957d386c75227/coremltools/converters/mil/mil/builder.py#L170-L171

https://github.com/apple/coremltools/blob/c19283b4f00f36da3807644ff30957d386c75227/coremltools/converters/mil/mil/ops/defs/iOS15/conv.py#L141-L150
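If it is, the behavior would match a default-materialization pass along these lines (a simplified sketch of that idea, not the actual coremltools code): every optional input that was not provided gets a named const op holding its default value, which would explain why `conv_0_pad_0` shows up in the serialized program even though the script never passed `pad`. (In the real serialized format, provided immediate values also become consts; this sketch only materializes the missing ones.)

```python
# Simplified sketch: materialize defaults for optional inputs as const ops.
def materialize_defaults(op_name, provided, defaults):
    """Return (inputs, const_ops): 'inputs' maps each input name to its
    value source; 'const_ops' holds the generated default constants."""
    inputs, const_ops = {}, {}
    for name, default in defaults.items():
        if name not in provided:
            const_name = f"{op_name}_{name}_0"   # e.g. 'conv_0_pad_0'
            const_ops[const_name] = default
            inputs[name] = const_name
    inputs.update(provided)
    return inputs, const_ops

# Hypothetical defaults for a 2D conv, matching the dumped values above.
conv_defaults = {
    "strides": [1, 1],
    "pad_type": "valid",
    "pad": [0, 0, 0, 0],
    "dilations": [1, 1],
    "groups": 1,
}
inputs, consts = materialize_defaults(
    "conv_0", {"x": "x", "weight": "weight", "pad_type": "same"}, conv_defaults
)
print(consts)   # includes 'conv_0_pad_0': [0, 0, 0, 0]
```

Under this reading, the compiler can treat 'pad', 'strides' and 'dilations' as effectively required because the builder guarantees they are always present in serialized models, even though the op spec marks them optional.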