apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Please add support for `torch.tensor_split` #2226

Open mallman opened 3 months ago

mallman commented 3 months ago

This layer/op is used by EVA-02, a model for image classification, segmentation and object detection. Personally, I'm interested in using it for image classification in a Mac app.

As of this writing (May 21st, 2024), various sizes of pre-trained EVA and EVA-02 models dominate the leaderboard for image classification on ImageNet 1k among the models curated by the Pytorch Image Models Hugging Face org. See https://huggingface.co/collections/timm/timm-top-20-imagenet-1k-models-655d78909af37bae32381f61

FYI, it looks like this is (essentially) the same op as tf.split from TensorFlow.
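
For anyone unfamiliar with the op, here's a quick illustration of its behavior; the uneven-section case is the part a plain fixed-size split doesn't cover:

import torch

x = torch.arange(8)

# Split into a requested number of sections; the leading sections absorb
# the remainder when the length isn't evenly divisible.
print([t.tolist() for t in torch.tensor_split(x, 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7]]

# It can also split at explicit indices along the dimension.
print([t.tolist() for t in torch.tensor_split(x, (2, 5))])
# [[0, 1], [2, 3, 4], [5, 6, 7]]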

mallman commented 3 months ago

Oh, and here's an example of a failing conversion. This is from a script I've written for converting timm models:

import coremltools as ct
import timm
import torch

model_name = "eva02_tiny_patch14_224.mim_in22k"
print(f"Creating model {model_name}")
timm_model = timm.create_model(
  model_name,
  pretrained=True,
  scriptable=False,
  exportable=True)

# Append a softmax so the converted classifier outputs probabilities
model = torch.nn.Sequential(
  timm_model,
  torch.nn.Softmax(1)
).eval()

input_size = timm_model.default_cfg.get("input_size")
input_shape = (1,) + input_size

print("Tracing model")
example_input = torch.randn(input_shape)
jit_model = torch.jit.trace(model, example_input)

labels_filename = "imagenet21k_wordnet_lemmas.txt"

with open(labels_filename, "r") as labels_file:
  labels = [line.strip() for line in labels_file.readlines()]

classifier_config = ct.ClassifierConfig(labels)

print("Converting model")
# Scale and bias calculations taken from Core ML Tools documentation on
# preprocessing for PyTorch
mean = list(timm_model.default_cfg.get("mean"))
std = list(timm_model.default_cfg.get("std"))
import statistics
mean_std = statistics.mean(std)
scale = 1 / (mean_std * 255)
bias = [-m / s for m, s in zip(mean, std)]
input_type = ct.ImageType(
  name="image",
  shape=input_shape,
  scale=scale,
  bias=bias)

coreml_model = ct.convert(
  jit_model,
  convert_to="mlprogram",
  inputs=[input_type],
  classifier_config=classifier_config,
  skip_model_load=True
)

# Lets Xcode show its image classifier preview UI for this model
coreml_model.user_defined_metadata["com.apple.coreml.model.preview.type"] = "imageClassifier"

coreml_model_file_name = f"{model_name}.mlpackage"
print(f"Saving model to {coreml_model_file_name}")

coreml_model.save(coreml_model_file_name)
print("Done!")

I believe a pip install of the timm, torch and coremltools packages will give you the right environment for running this.

You will also need a labels file, imagenet21k_wordnet_lemmas.txt, in your working directory. I'm attaching that file.

TobyRoseman commented 3 months ago

Here is a more concise way to reproduce the issue:

import torch
import coremltools as ct

class M(torch.nn.Module):
    def forward(self, x):
        return torch.tensor_split(x, 3)

x = torch.arange(8)
traced_model = torch.jit.trace(M(), x)
ct.convert(traced_model, inputs=[ct.TensorType(shape=x.shape)])

I think we should be able to use the split MIL ops at least for simple cases.
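
For anyone who wants to experiment before official support lands, a composite-operator registration along these lines might work for the simple "number of sections" form. This is an untested sketch: it assumes the traced aten::tensor_split node carries (input, sections, dim) with constant sections/dim and a statically known input shape, and the import paths follow the composite-operators example in the Core ML Tools docs (they may differ across versions).

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def tensor_split(context, node):
    # Only handles tensor_split(x, sections, dim) with constant arguments;
    # splitting at explicit indices is not covered by this sketch.
    x, sections, dim = _get_inputs(context, node, expected=3)
    n = int(sections.val)
    axis = int(dim.val)
    length = x.shape[axis]  # requires a static (non-symbolic) size

    # tensor_split gives the first (length % n) sections one extra element.
    base, rem = divmod(length, n)
    split_sizes = [base + 1] * rem + [base] * (n - rem)

    res = mb.split(x=x, split_sizes=split_sizes, axis=axis, name=node.name)
    context.add(res, torch_name=node.name)

Registering this before calling ct.convert on the repro above should exercise MIL's split directly, though the multi-output handling (context.add with the tuple of results) would need to be checked against how the existing split op is implemented in the torch frontend.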

teelrabbit commented 2 months ago

Looks like this can be worked around by combining torch.split with torch.unbind.

An example of this is shown below, and in this paste: https://pastes.dev/kkaPViedJ7

import torch
import coremltools as ct

class M(torch.nn.Module):
    def forward(self, x):
        # Split into three equal-sized chunks (assumes the length divides evenly),
        # then stack and unbind so the result is a tuple of tensors.
        splits = torch.split(x, x.size(0) // 3)
        return torch.unbind(torch.stack(splits))

x = torch.arange(9)
traced_model = torch.jit.trace(M(), x)
ct.convert(traced_model, inputs=[ct.TensorType(shape=x.shape)])
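
One caveat: torch.stack needs equal-sized chunks, so this only mirrors tensor_split when the length divides evenly. For the uneven case, computing the section sizes the way tensor_split does and passing them to torch.split should give the same outputs. A rough sketch, untested, and assuming the converter handles torch.split with an explicit list of sizes:

import torch
import coremltools as ct

class M(torch.nn.Module):
    def forward(self, x):
        # Reproduce tensor_split's sizing rule: the first (length % n)
        # sections get one extra element. With jit.trace these sizes are
        # baked in for the traced input shape.
        n = 3
        base, rem = divmod(x.size(0), n)
        sizes = [base + 1] * rem + [base] * (n - rem)
        return torch.split(x, sizes)

x = torch.arange(8)
traced_model = torch.jit.trace(M(), x)
ct.convert(traced_model, inputs=[ct.TensorType(shape=x.shape)])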