apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

PyTorch unified converter does not work with flexible input shapes #992

Closed 3DTOPO closed 3 years ago

3DTOPO commented 3 years ago

🐞Describe the bug

I made changes to my model so I could use the recommended unified converter. Conversion succeeds without issue and shows that flexible shapes are supported (in both Python and Xcode).

Running prediction with a shape in the supported ranges (any shape other than the fixed shape) fails with an error. The fixed-shape input works as expected. I've tried both GPU and CPU-only execution.

Trace

[espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Not implemented": axis -4 not implemented status=-9
[coreml] Failure dynamically resizing for sequence length.
[coreml] Failure in resetSizes.
prediction error: Error Domain=com.apple.CoreML Code=0 "Failure dynamically resizing for sequence length." UserInfo={NSLocalizedDescription=Failure dynamically resizing for sequence length.}
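
For reference, the prediction call that triggers this trace looks roughly like the following (a sketch, assuming the converted model has been saved as TransformerNet.mlmodel and prediction is run from Python on macOS; any size other than 1024x1024 within the declared range fails):

import coremltools as ct
from PIL import Image

# Sketch: load the converted model and predict at a non-fixed size
# inside the declared 256-3072 range.
mlmodel = ct.models.MLModel("TransformerNet.mlmodel")
img = Image.new("RGB", (512, 512))       # anything other than 1024x1024
out = mlmodel.predict({"input_1": img})  # raises the error above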

To Reproduce

The source code and model are in the attached archive.

import torch
import torch.nn as nn
import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft
from coremltools.models.neural_network import flexible_shape_utils
from model import TransformerNet

channels = 3
width = 1024
height = 1024

torch_model = TransformerNet()
#torch_model.load_state_dict(torch.load('TrainedModel.pth', map_location=torch.device('cpu')))
torch_model.eval()

example_input = torch.rand(1, channels, width, height)
traced_model = torch.jit.trace(torch_model, example_input)

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=example_input.shape)], 
    minimum_ios_deployment_target='13'
)

#note if "input" is used for the name it creates a name collision

spec = mlmodel.get_spec()

# needed because documentation states:
# outputs must not be specified for PyTorch
output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = height
output.type.imageType.width = width

ct.utils.rename_feature(spec, '782', 'output')

img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(height_range=(256, 3072), width_range=(256, 3072))
flexible_shape_utils.update_image_size_range(spec, feature_name='input_1', size_range=img_size_ranges)
flexible_shape_utils.update_image_size_range(spec, feature_name='output', size_range=img_size_ranges)

ct.utils.save_spec(spec, "TransformerNet.mlmodel")

model.py:

import torch
import torch.nn as nn

class TransformerNet(torch.nn.Module):

    def __init__(self):
        super(TransformerNet, self).__init__()
        # Initial convolution layers
        self.conv1 = ConvLayer(3, 8, kernel_size=9, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(8, affine=True)
        self.conv2 = ConvLayer(8, 16, kernel_size=3, stride=2)
        self.in2 = torch.nn.InstanceNorm2d(16, affine=True)
        self.conv3 = ConvLayer(16, 32, kernel_size=3, stride=2)
        self.in3 = torch.nn.InstanceNorm2d(32, affine=True)
        # Residual layers
        self.res1 = ResidualBlock(32)
        self.res2 = ResidualBlock(32)
        self.res3 = ResidualBlock(32)
        self.res4 = ResidualBlock(32)
        self.res5 = ResidualBlock(32)
        # Upsampling Layers
        self.deconv1 = UpsampleConvLayer(32, 16, kernel_size=3, stride=1, upsample=2)
        self.in4 = torch.nn.InstanceNorm2d(16, affine=True)
        self.deconv2 = UpsampleConvLayer(16, 8, kernel_size=3, stride=1, upsample=2)
        self.in5 = torch.nn.InstanceNorm2d(8, affine=True)
        self.deconv3 = ConvLayer(8, 3, kernel_size=9, stride=1)
        # Non-linearities
        self.relu = torch.nn.ReLU()

    def forward(self, X):
        y = self.relu(self.in1(self.conv1(X)))
        y = self.relu(self.in2(self.conv2(y)))
        y = self.relu(self.in3(self.conv3(y)))
        y = self.res1(y)
        y = self.res2(y)
        y = self.res3(y)
        y = self.res4(y)
        y = self.res5(y)
        y = self.relu(self.in4(self.deconv1(y)))
        y = self.relu(self.in5(self.deconv2(y)))
        y = self.deconv3(y)
        return y

class ConvLayer(torch.nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super(ConvLayer, self).__init__()
        reflection_padding = kernel_size // 2
        #self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.reflection_pad = ReflectPad2d_rev(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        out = self.reflection_pad(x)
        out = self.conv2d(out)
        return out

class ResidualBlock(torch.nn.Module):

    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in2 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        residual = x
        out = self.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = out + residual
        return out

class UpsampleConvLayer(torch.nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None):
        super(UpsampleConvLayer, self).__init__()
        self.upsample = upsample
        reflection_padding = kernel_size // 2
        #self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.reflection_pad = ReflectPad2d_rev(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        x_in = x
        if self.upsample:
            x_in = torch.nn.functional.interpolate(x_in, mode='nearest', scale_factor=self.upsample)
        out = self.reflection_pad(x_in)
        out = self.conv2d(out)
        return out

class ReflectPad2d_rev(nn.Module):
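    # Reflection padding built from slicing and concatenation; a drop-in
    # replacement for torch.nn.ReflectionPad2d, which the converter does
    # not handle (see issue #855).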

    def __init__(self, size):
        super().__init__()
        self.size = size

    def forward(self, x):
        a = self.size
        L_list, R_list = [], []
        U_list, D_list = [], []
        for i in range(a):  # i = 0, 1, ..., a-1
            l = x[:, :, :, (a-i):(a-i+1)]
            L_list.append(l)
            r = x[:, :, :, (i-a-1):(i-a)]
            R_list.append(r)
        L_list.append(x)
        x = torch.cat(L_list+R_list[::-1], dim=3)
        for i in range(a):
            u = x[:, :, (a-i):(a-i+1), :]
            U_list.append(u)
            d = x[:, :, (i-a-1):(i-a), :]
            D_list.append(d)
        U_list.append(x)
        x = torch.cat(U_list+D_list[::-1], dim=2)
        return x
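
For what it's worth, the custom padding can be sanity-checked against the built-in op with something like this (a quick sketch; pads 1 and 4 are the sizes TransformerNet actually uses):

import torch

# Sketch: ReflectPad2d_rev should match torch.nn.ReflectionPad2d
# for the padding sizes used in TransformerNet (1 and 4).
x = torch.rand(1, 3, 16, 16)
for pad in (1, 4):
    assert torch.allclose(torch.nn.ReflectionPad2d(pad)(x),
                          ReflectPad2d_rev(pad)(x))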

System environment (please complete the following information):

Additional context

This issue severely restricts deploying MLModels across my workflow.

repo-pythorch-conversion.zip

RahulBhalley commented 3 years ago

It's been a long time. Any progress on this?

RahulBhalley commented 3 years ago

I am getting the following error when converting the traced PyTorch model to Core ML with coremltools:

RuntimeError: PyTorch convert function for op 'reflection_pad2d' not implemented.
3DTOPO commented 3 years ago

I really can't believe this hasn't been addressed yet. It is one of the most important pieces of my iOS development toolchain.

Anyhow, there is a workaround for reflection_pad2d; see https://github.com/apple/coremltools/issues/855

RahulBhalley commented 3 years ago

@3DTOPO It looks to me like you weren't able to write a MIL operator for reflection padding 2D, right?

3DTOPO commented 3 years ago

No, mushipand's solution works. The coremltools team said they would add the op, and I was just bummed that it still hasn't been added.
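
If you did want to go the MIL route instead, a rough, unverified sketch of a composite op (assuming coremltools 4+'s composite-operator mechanism; _get_inputs is an internal helper and may differ between versions) would look something like:

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs

@register_torch_op
def reflection_pad2d(context, node):
    # Unverified sketch: map torch's (left, right, top, bottom) padding to
    # MIL's pad op, which pads the trailing dims as (top, bottom, left, right).
    x, pad = _get_inputs(context, node, expected=2)
    left, right, top, bottom = pad.val
    res = mb.pad(x=x, pad=[top, bottom, left, right], mode="reflect", name=node.name)
    context.add(res)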

RahulBhalley commented 3 years ago

@3DTOPO I ran your code and got the following dimension-mismatch error:

ValueError: Dimension mismatch in concat ("x.28"): shapes [1, 32, 0, 0] vs. (1, 32, -128, -128)

Which PyTorch code works for converting the TransformerNet model to CoreML?

3DTOPO commented 3 years ago

I was able to get it working. Sorry, I can't recall the details or I would share them. Try asking mushipand, since it is his solution.

3DTOPO commented 3 years ago

Looks like your conversion script isn't right.

RahulBhalley commented 3 years ago

This is irritating. Why can't Apple just do it for us!

3DTOPO commented 3 years ago

They provide the tools, but it's up to us to use them properly. There is documentation, and there are places to find help.

3DTOPO commented 3 years ago

I'm trying to wrap up development of an update that I've spent 2 years working on. Is this glaring bug ever going to be addressed?

Otherwise I am facing shipping a product with a horrendous workaround for a feature that is supposed to be supported. I can't express how frustrating this issue is; it affects one of the most critical toolchains for my app development.

3DTOPO commented 3 years ago

Thanks for the tip, but it used to be possible and, according to the docs, should still be possible; the docs include examples showing how to do it.

I've tried that method (and so have others, as reported in this thread) and it doesn't work for me; it was the first thing I tried.

3DTOPO commented 3 years ago

Are you using the model.py I have defined in the original post?

jakesabathia2 commented 3 years ago

@3DTOPO Yeah, I just hit the error as well. It turns out there is likely a bug with flexible shapes for the image input type. A quick workaround is to use a TensorType input instead:

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input_1", shape=(1, channels, ct.RangeDim(256, 3072), ct.RangeDim(256, 3072)))],
    # outputs must not be specified for PyTorch,
)
import numpy as np
np_input = np.random.rand(1, 3, 2500, 2500)
output = mlmodel.predict({"input_1": np_input})
print(output)

This code snippet works fine on my local machine.

jakesabathia2 commented 3 years ago

But for the image input type with flexible shapes, we need to investigate the issue.

3DTOPO commented 3 years ago

Yeah, that is the whole point: flexible image inputs are not possible. My workaround is to use a flexible array input and convert the image to an array using the Accelerate framework. It works, but it is a ridiculous workaround compared to using an image input, which is supposed to be supported.

aseemw commented 3 years ago

There was a bug in the Core ML framework on macOS Big Sur when using image inputs with RangeDim shapes. It has been fixed in macOS Monterey. Please see #1263 for unit tests.
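
For reference, declaring the flexible image input at conversion time (a minimal sketch, assuming coremltools 5.0b2 and the same 256-3072 range as above) looks like this:

import coremltools as ct

# Sketch: declare the flexible image shape directly at conversion time
# instead of editing the spec with flexible_shape_utils afterwards.
input_shape = ct.Shape(shape=(1, 3,
                              ct.RangeDim(256, 3072),
                              ct.RangeDim(256, 3072)))
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=input_shape)],
)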

3DTOPO commented 3 years ago

But it's not working for me on Monterey. In fact, it is now much worse for me: I used to be able to use a flexible array input with a flexible image output, but now even that doesn't work: https://github.com/apple/coremltools/issues/1244

3DTOPO commented 3 years ago

Just so it's clearer, what specifically needs to happen on Monterey?

Does the model have to be compiled on Monterey? What about Linux?

Or does the app have to be compiled in Xcode on Monterey?

Can you provide specific versions (Xcode, macOS, coremltools, Python, PyTorch, etc.) for the complete environment where you have verified this to work?

aseemw commented 3 years ago

Are you able to run TestFlexibleInputShapes from PR #1263 and see if the tests pass for you? That is, run: pytest -v coremltools/converters/mil/test_flexible_shape_inputs.py::TestFlexibleInputShapes

python: 3.7 or 3.8
coremltools: 5.0b2
macOS: Monterey
Xcode: Xcode 13
pytorch: 1.9

muhammetguler commented 1 year ago

Hello, I encountered the same problem while configuring a different model. Did you manage to resolve it?