apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Broadcasable tensor index support #1596

Closed. Typiqally closed this issue 2 years ago.

Typiqally commented 2 years ago

When converting the Faster R-CNN RegNetX-3.2GF-FPN model from TorchScript to CoreML, I get the following error:

Traceback (most recent call last):
  File "mmdeploy/tools/deploy.py", line 461, in <module>
    main()
  File "mmdeploy/tools/deploy.py", line 407, in main
    from_torchscript(torchscript_path, output_file_prefix,
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/backend/coreml/torchscript2coreml.py", line 106, in from_torchscript
    mlmodel = ct.convert(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
    mlmodel = mil_convert(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 283, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
    return load(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
    return _perform_torch_convert(converter, debug)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 100, in _perform_torch_convert
    raise e
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
    prog = converter.convert()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
    convert_nodes(self.context, self.graph)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 3199, in index
    raise NotImplementedError("Broadcasable tensor index not supported.")
NotImplementedError: Broadcasable tensor index not supported.
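
For reference, the check that fails is in the torch frontend's translation of aten::index (the index function in ops.py shown in the traceback). Below is a minimal sketch of the indexing pattern I believe triggers it; this is illustrative only and not extracted from the actual model:

import torch
import coremltools as ct


class BroadcastIndex(torch.nn.Module):

    def forward(self, x):
        # Two index tensors with different but broadcastable shapes,
        # (4, 1) and (1,); tracing lowers this to aten::index with
        # broadcastable tensor indices.
        rows = torch.tensor([[0], [1], [2], [3]])
        cols = torch.tensor([1])
        return x[rows, cols]


traced = torch.jit.trace(BroadcastIndex(), torch.rand(4, 3))
# Expected to raise NotImplementedError("Broadcasable tensor index not
# supported.") on coremltools builds without this support.
mlmodel = ct.convert(traced, inputs=[ct.TensorType(name='x', shape=(4, 3))])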

Are there any plans to implement this in the near future? If so, what is the expected time frame?

TobyRoseman commented 2 years ago

Looking at the code, it's not clear to me what exactly the issue is.

@Typiqally - can you give us a minimal example that reproduces this issue?

Typiqally commented 2 years ago

I appreciate your prompt response. I thought this was simply a known issue that lacked support, which is why I did not initially send a snippet. Here are the steps:

I'm using MMDeploy, a utility for exporting models based on the OpenMMLab framework. This utility has an option to convert OpenMMLab models to CoreML. The conversion takes the following steps:

  1. The MMLab model is converted to TorchScript.
  2. The resulting TorchScript binary is converted to CoreML using the CoreML tools utility.

During this conversion, the aforementioned exception is thrown, which might be due to an incompatibility with the model. The MMDeploy tool internally calls the following function:

from typing import Dict, Sequence, Union

import coremltools as ct
import torch

# Note: create_shape and get_model_suffix are helpers defined alongside
# this function in mmdeploy/backend/coreml/torchscript2coreml.py;
# get_root_logger comes from mmdeploy.utils.
from mmdeploy.utils import get_root_logger


def from_torchscript(torchscript_model: Union[str,
                                              torch.jit.RecursiveScriptModule],
                     output_file_prefix: str,
                     input_names: Sequence[str],
                     output_names: Sequence[str],
                     input_shapes: Dict,
                     convert_to: str = 'neuralnetwork',
                     fp16_mode: bool = False,
                     skip_model_load: bool = True,
                     **kwargs):
    """Create a coreml engine from torchscript.
    Args:
        torchscript_model (Union[str, torch.jit.RecursiveScriptModule]):
            The torchscript model to be converted.
        output_file_prefix (str): The output file prefix.
        input_names (Sequence[str]): The input names of the model.
        output_names (Sequence[str]): The output names of the model.
        input_shapes (Dict): The input shapes, including max_shape, min_shape
            and default_shape.
        convert_to (str, optional): The converted model type, can be
            'neuralnetwork' or 'mlprogram'. Defaults to 'neuralnetwork'.
        fp16_mode (bool, optional): Convert to fp16 model. Defaults to False.
        skip_model_load (bool, optional): Skip model load. Defaults to True.
    """

    try:
        from mmdeploy.backend.torchscript import get_ops_path
        torch.ops.load_library(get_ops_path())
    except Exception as e:
        get_root_logger().warning(
            'Can not load custom ops because:\n'
            f'{e}\n'
            'Some model might not be able to be converted.')

    if isinstance(torchscript_model, str):
        torchscript_model = torch.jit.load(torchscript_model)

    inputs = []
    outputs = []

    for name in input_names:
        shape = create_shape(name, input_shapes[name])
        inputs.append(shape)

    for name in output_names:
        outputs.append(ct.TensorType(name=name))

    if convert_to == 'neuralnetwork':
        compute_precision = None
    else:
        if fp16_mode:
            compute_precision = ct.precision.FLOAT16
        else:
            compute_precision = ct.precision.FLOAT32

    mlmodel = ct.convert(
        model=torchscript_model,
        inputs=inputs,  # In my case, [ImageType[name=input, shape=[1, 3, 608, 608], scale=0.00392156862745098, bias=[0, 0, 0], color_layout=ColorLayout.RGB, channel_first=None]]
        outputs=outputs,  # In my case, [TensorType[name=dets, shape=None, dtype=None], TensorType[name=labels, shape=None, dtype=None]]
        compute_precision=compute_precision,  # In my case, ComputePrecision.FLOAT32
        convert_to=convert_to,  # In my case, mlprogram
        skip_model_load=False)

    suffix = get_model_suffix(convert_to)
    output_path = output_file_prefix + suffix
    mlmodel.save(output_path)

See https://github.com/open-mmlab/mmdeploy/blob/b87afb9ebb24fe46bf8ae4728a354bdcc1afee50/mmdeploy/backend/coreml/torchscript2coreml.py#L50-L116
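
For completeness, a call to this function would look roughly like the following. This is a hypothetical sketch: the file paths are placeholders, the input_shapes layout is inferred from the docstring, and the names and shapes match the values noted in the comments above:

from_torchscript(
    'work_dir/faster_rcnn_regnetx/end2end.pt',  # placeholder path
    output_file_prefix='work_dir/faster_rcnn_regnetx/end2end',
    input_names=['input'],
    output_names=['dets', 'labels'],
    input_shapes={
        'input': {
            'min_shape': [1, 3, 608, 608],
            'default_shape': [1, 3, 608, 608],
            'max_shape': [1, 3, 608, 608],
        }
    },
    convert_to='mlprogram',
    fp16_mode=False)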

In my case, I'm trying to convert the faster_rcnn_regnetx-3.2GF_fpn_1x_coco model to CoreML using the following command:

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_coreml_static-800x1344.py \
    mmdetection/configs/regnet/faster_rcnn_regnetx-3.2GF_fpn_1x_coco.py \
    checkpoints/faster_rcnn_regnetx-3.2GF_fpn_1x_coco_20200517_175927-126fd9bf.pth \
    mmdetection/demo/demo.jpg \
    --work-dir work_dir/faster_rcnn_regnetx \
    --device cpu

I understand that this issue is mostly related to the MMDeploy utility and not CoreML; however, I believe the reason the model fails to convert might lie within this repository. Debugging further, I found that the operation where the issue occurs is topk. When I remove the check for the multiple index axes case, I get the following stack trace, which introduces new information:

Traceback (most recent call last):
  File "mmdeploy/tools/deploy.py", line 461, in <module>
    main()
  File "mmdeploy/tools/deploy.py", line 407, in main
    from_torchscript(torchscript_path, output_file_prefix,
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/Users/typically/Workspace/vbti-plant-morphology/mmdeploy/mmdeploy/backend/coreml/torchscript2coreml.py", line 106, in from_torchscript
    mlmodel = ct.convert(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
    mlmodel = mil_convert(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 283, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
    return load(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
    return _perform_torch_convert(converter, debug)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
    prog = converter.convert()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
    convert_nodes(self.context, self.graph)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 3203, in index
    indices = mb.stack(values=valid_indices, axis=indices_rank)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 172, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/mil/builder.py", line 191, in _add_op
    new_op.type_value_inference()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/mil/operation.py", line 241, in type_value_inference
    output_types = self.type_inference()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mmlabs/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/tensor_operation.py", line 1236, in type_inference
    raise ValueError(msg.format(t.name, t.shape, t_shape))
ValueError: Component tensor topk_inds0.1 has shape (1, 1000), others have (1, 1)

As I said before, I believe this is not currently supported; I would appreciate any information about when support for this will be available.

Typiqally commented 2 years ago

Looking further into it, I found that the previously mentioned model uses multi-class non-maximum suppression, which might be causing the issue.
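
A pattern along the following lines would explain the shapes in the error: the top-k indices have shape (1, 1000) while the batch indices have shape (1, 1). This is a hypothetical sketch of such a gather, not code taken from the model:

import torch

scores = torch.rand(1, 1000, 80)             # (batch, candidates, classes)
max_scores, _ = scores.max(dim=-1)           # (1, 1000)
_, topk_inds = max_scores.topk(1000, dim=1)  # (1, 1000)
batch_inds = torch.arange(1).view(-1, 1)     # (1, 1)
# Advanced indexing with broadcastable index tensors of unequal shapes:
# PyTorch broadcasts (1, 1) against (1, 1000), but the converter tries to
# stack the index tensors as-is, hence the ValueError above.
selected = scores[batch_inds, topk_inds]     # (1, 1000, 80)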

TobyRoseman commented 2 years ago

It's still not clear what specifically the issue is here. The next step toward getting this resolved is a simple, standalone example that reproduces the problem, i.e., something that could become a unit test and doesn't require downloading an external model.

Typiqally commented 2 years ago

I'm sorry, but I believe the issue was actually with MMDeploy after all. Their two-stage detector was broken after a merge, causing the issue mentioned above; see https://github.com/open-mmlab/mmdeploy/issues/1038.

I'm still not completely certain why this happens, and unfortunately, given how tightly integrated their framework is, I'm unable to provide a sufficient code snippet that doesn't require external code or models.