apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io

Adaptive_pooling returns wrong result when flexible_shape_utils is applied #977

Open starsky opened 3 years ago

starsky commented 3 years ago

🐞Describe the bug

To Reproduce

import torch
import coremltools as ct
import numpy as np
from PIL import Image
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

class SomeModel(nn.Module):

    def forward(self, inp):
        return F.adaptive_avg_pool2d(inp, output_size=(2,2))

py_model = SomeModel()

# This part is just to create PyTorch traced model
img_size = 500
example_input = np.random.randint(0, 255, (img_size, img_size, 3), dtype=np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
example_input_tensor = transforms.ToTensor()(example_input_img).view(1, 3, img_size, img_size)
traced_model = torch.jit.trace(py_model, example_input_tensor)

# Here conversion starts
from coremltools import EnumeratedShapes
from coremltools.models.neural_network import flexible_shape_utils

model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", channel_first=True, shape=example_input_tensor.shape)])

model.save('aa_fixed.mlmodel')
# This converts correctly and now I can add flexible input sizes like this
spec = ct.utils.load_spec('./aa_fixed.mlmodel')

#Here I am adding possible input sizes
image_sizes = [flexible_shape_utils.NeuralNetworkImageSize(200, 200)]
image_sizes.append(flexible_shape_utils.NeuralNetworkImageSize(500, 500))
image_sizes.append(flexible_shape_utils.NeuralNetworkImageSize(1000, 1000))
flexible_shape_utils.add_enumerated_image_sizes(spec, feature_name='input_1', sizes=image_sizes)
ct.utils.save_spec(spec, 'aa_flex.mlmodel')
flex_model = ct.models.MLModel('./aa_flex.mlmodel')

#This works fine. The model is converted.
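For reference, below is a minimal sketch of the other route I mention further down: declaring the allowed sizes with EnumeratedShapes at conversion time instead of patching the saved spec afterwards. The exact arguments are my reconstruction of that attempt (see https://github.com/apple/coremltools/issues/976 for the real report), and as noted below it also fails.

enumerated = ct.EnumeratedShapes(
    shapes=[(1, 3, 200, 200), (1, 3, 500, 500), (1, 3, 1000, 1000)],
    default=(1, 3, 500, 500))
flex_model_v2 = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=enumerated)])
flex_model_v2.save('aa_flex_v2.mlmodel')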

Trace

Now I run predictions on the model.

example_input = np.random.randint(0, 255, (500, 500, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 2, 2)

This is correct: as you can see, I requested output_size=(2, 2) in the PyTorch model.

But when I use an input size of (200, 200):

example_input = np.random.randint(0, 255, (200, 200, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 1, 1)

The output is wrong: I get a 1x1 tensor instead of 2x2.

The same happens for the bigger size (1000, 1000):

example_input = np.random.randint(0, 255, (1000, 1000, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 4, 4)

The output is wrong: I get a 4x4 tensor instead of 2x2.

The problem is that during conversion the adaptive pooling layer is lowered to a regular pooling layer, so the kernel size is frozen from the traced input instead of being computed dynamically from the current input. I also tried dynamic conversion with EnumeratedShapes, but that fails as well (see https://github.com/apple/coremltools/issues/976).
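To make the failure mode concrete, here is a small stand-alone sketch (plain Python, numbers only, no coremltools) of what I believe happens. PyTorch's adaptive_avg_pool2d derives the kernel and stride from the current input size, roughly stride = in // out and kernel = in - (out - 1) * stride; the converted model instead keeps the values derived from the 500x500 trace input (kernel = stride = 250). The clamp to one window below is my assumption about how the fixed-kernel layer behaves when the kernel is larger than the input.

def adaptive_params(in_size, out_size):
    # How adaptive pooling picks its window for a given input size (roughly)
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride

def fixed_pool_out(in_size, kernel, stride):
    # Output length of a regular, non-adaptive pooling layer with no padding,
    # clamped to at least one window (assumption for the kernel > input case)
    return max(1, (in_size - kernel) // stride + 1)

kernel, stride = adaptive_params(500, 2)   # 250, 250 -- frozen into the converted model
for size in (200, 500, 1000):
    print(size, '->', fixed_pool_out(size, kernel, stride))
# 200 -> 1    matches the 1x1 output above
# 500 -> 2    only the traced size gives the requested 2x2
# 1000 -> 4   matches the 4x4 output above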

System environment (please complete the following information):

TobyRoseman commented 1 year ago

I think this is an issue with your PyTorch model. Each of the following lines causes a cast error:

traced_model(np.random.randint(0, 255, (500, 500, 3)))
traced_model(np.random.randint(0, 255, (200, 200, 3)))
traced_model(np.random.randint(0, 255, (1000, 1000, 3)))
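A sketch of the point above, on the assumption that the failure is simply a type mismatch: the traced module expects a float NCHW torch.Tensor, not a raw (H, W, 3) integer numpy array, so converting the array first makes the call go through. The preprocessing here is illustrative and not from the original report.

import numpy as np
import torch

img = np.random.randint(0, 255, (500, 500, 3), dtype=np.uint8)
tensor = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0  # (1, 3, 500, 500)
print(traced_model(tensor).shape)  # traced_model from the reproduction script above;
                                   # prints torch.Size([1, 3, 2, 2]) because PyTorch
                                   # recomputes the adaptive window per input size,
                                   # unlike the converted Core ML model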