apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io

Adaptive_pooling returns wrong result when flexible_shape_utils is applied #977

Open starsky opened 3 years ago

starsky commented 3 years ago

🐞Describe the bug

To Reproduce

import torch
import coremltools as ct
import numpy as np
from PIL import Image
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

class SomeModel(nn.Module):

    def forward(self, inp):
        return F.adaptive_avg_pool2d(inp, output_size=(2,2))

py_model = SomeModel()

# This part is just to create PyTorch traced model
img_size = 500
example_input = np.random.randint(0, 255, (img_size, img_size, 3), dtype=np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
example_input_tensor = transforms.ToTensor()(example_input_img).view(1, 3, img_size, img_size)
traced_model = torch.jit.trace(py_model, example_input_tensor)

# Here conversion starts
from coremltools import EnumeratedShapes
from coremltools.models.neural_network import flexible_shape_utils

model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", channel_first=True, shape=example_input_tensor.shape)])

model.save('aa_fixed.mlmodel')
# This converts correctly and now I can add flexible input sizes like this
spec = ct.utils.load_spec('./aa_fixed.mlmodel')

#Here I am adding possible input sizes
image_sizes = [flexible_shape_utils.NeuralNetworkImageSize(200, 200)]
image_sizes.append(flexible_shape_utils.NeuralNetworkImageSize(500, 500))
image_sizes.append(flexible_shape_utils.NeuralNetworkImageSize(1000, 1000))
flexible_shape_utils.add_enumerated_image_sizes(spec, feature_name='input_1', sizes=image_sizes)
ct.utils.save_spec(spec, 'aa_flex.mlmodel')
flex_model = ct.models.MLModel('./aa_flex.mlmodel')

#This works fine. The model is converted.
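For reference, below is a minimal sketch of the other route I mention further down: declaring the allowed sizes with EnumeratedShapes at conversion time instead of patching the saved spec afterwards. The exact arguments are my reconstruction of that attempt (see https://github.com/apple/coremltools/issues/976 for the real report), and as noted below it also fails.

enumerated = ct.EnumeratedShapes(
    shapes=[(1, 3, 200, 200), (1, 3, 500, 500), (1, 3, 1000, 1000)],
    default=(1, 3, 500, 500))
flex_model_v2 = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=enumerated)])
flex_model_v2.save('aa_flex_v2.mlmodel')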

Trace

Now I run predictions on the model.

example_input = np.random.randint(0, 255, (500, 500, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 2, 2)

This is correct: as you can see, I requested output_size=(2, 2) in the PyTorch model.

But when I use an input size of (200, 200):

example_input = np.random.randint(0, 255, (200, 200, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 1, 1)

The output is wrong: I get a 1x1 tensor instead of 2x2.

The same happens for the bigger size (1000, 1000):

example_input = np.random.randint(0, 255, (1000, 1000, 3))
example_input = example_input.astype(np.uint8)  # uint8 so Image.fromarray builds a valid RGB image
example_input_img = Image.fromarray(example_input, 'RGB')
output = flex_model.predict({flex_model._spec.description.input[0].name: example_input_img})
print(output[list(output.keys())[0]].shape)

Output: (1, 3, 4, 4)

The output is wrong: I get a 4x4 tensor instead of 2x2.

The problem is that during conversion the adaptive pooling layer is lowered to a regular pooling layer, so the kernel size is frozen from the traced input instead of being computed dynamically from the current input. I also tried dynamic conversion with EnumeratedShapes, but that fails as well (see https://github.com/apple/coremltools/issues/976).
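To make the failure mode concrete, here is a small stand-alone sketch (plain Python, numbers only, no coremltools) of what I believe happens. PyTorch's adaptive_avg_pool2d derives the kernel and stride from the current input size, roughly stride = in // out and kernel = in - (out - 1) * stride; the converted model instead keeps the values derived from the 500x500 trace input (kernel = stride = 250). The clamp to one window below is my assumption about how the fixed-kernel layer behaves when the kernel is larger than the input.

def adaptive_params(in_size, out_size):
    # How adaptive pooling picks its window for a given input size (roughly)
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride

def fixed_pool_out(in_size, kernel, stride):
    # Output length of a regular, non-adaptive pooling layer with no padding,
    # clamped to at least one window (assumption for the kernel > input case)
    return max(1, (in_size - kernel) // stride + 1)

kernel, stride = adaptive_params(500, 2)   # 250, 250 -- frozen into the converted model
for size in (200, 500, 1000):
    print(size, '->', fixed_pool_out(size, kernel, stride))
# 200 -> 1    matches the 1x1 output above
# 500 -> 2    only the traced size gives the requested 2x2
# 1000 -> 4   matches the 4x4 output above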

System environment (please complete the following information):

TobyRoseman commented 1 year ago

I think this is an issue with your PyTorch model. Each of the following lines causes a cast error:

traced_model(np.random.randint(0, 255, (500, 500, 3)))
traced_model(np.random.randint(0, 255, (200, 200, 3)))
traced_model(np.random.randint(0, 255, (1000, 1000, 3)))
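A sketch of the point above, on the assumption that the failure is simply a type mismatch: the traced module expects a float NCHW torch.Tensor, not a raw (H, W, 3) integer numpy array, so converting the array first makes the call go through. The preprocessing here is illustrative and not from the original report.

import numpy as np
import torch

img = np.random.randint(0, 255, (500, 500, 3), dtype=np.uint8)
tensor = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0  # (1, 3, 500, 500)
print(traced_model(tensor).shape)  # traced_model from the reproduction script above;
                                   # prints torch.Size([1, 3, 2, 2]) because PyTorch
                                   # recomputes the adaptive window per input size,
                                   # unlike the converted Core ML model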