apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Support upsample_trilinear3d #1782

Open likesum opened 1 year ago

likesum commented 1 year ago

Example to reproduce:

import torch
import coremltools as ct
import torch.nn.functional as F

class Net(torch.nn.Module):

  def forward(self, x):
    return F.interpolate(x, scale_factor=2.0, mode="trilinear")

class Net2(torch.nn.Module):

  def __init__(self) -> None:
    super().__init__()
    self.upsample3d = torch.nn.Upsample(scale_factor=2.0, mode="trilinear")

  def forward(self, x):
    return self.upsample3d(x)

input_tensor = torch.zeros([1, 8, 16, 16, 16], dtype=torch.float32)

# Check torch.nn.functional.interpolate
torch_model = Net()
traced_model = torch.jit.trace(torch_model, input_tensor)

model_ct = ct.convert(traced_model,
                      inputs=[ct.TensorType(shape=input_tensor.shape)])

# Check torch.nn.Upsample
torch_model = Net2()
traced_model = torch.jit.trace(torch_model, input_tensor)

model_ct = ct.convert(traced_model,
                      inputs=[ct.TensorType(shape=input_tensor.shape)])

Error message:

RuntimeError: PyTorch convert function for op 'upsample_trilinear3d' not implemented.

mlaves commented 11 months ago

Since coremltools already supports Conv3D, support for 3D upsampling layers would be a logical addition. It is much needed for medical image analysis, video analysis, and other volumetric applications. I tried to implement it myself in coremltools, but it appears that the Core ML framework itself does not support 3D upsampling. I got stuck here:

/Users/laves/projects/coremltools/coremltools/models/model.py:154: 
RuntimeWarning: You will not be able to run predict() on this Core ML model. 
Underlying exception message was: Error compiling model: "Failed to parse the 
model specification. Error: Unable to parse ML Program: in operation op_5_cast: 
For operation of type 'upsample_nearest_neighbor' number of inputs must be 
within the range (inclusive): 3 : 3. Provided 4".

As a workaround, at least upsample_nearest3d can be emulated for integer scale factors using mb.conv_transpose with an all-ones kernel of size SxSxS and strides (S, S, S), where S is the scale factor:

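import numpy as np
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def upsample_nearest3d(context, node):
    # inputs[0] is the tensor; inputs[2] is expected to hold the scale factors.
    inputs = _get_inputs(context, node, expected=3)
    x = inputs[0]
    s = inputs[2]

    c = x.shape[1]
    # Scale factors follow the (depth, height, width) axis order.
    # int() truncates, so this is only valid for integer scales.
    s_d, s_h, s_w = map(int, s.val)

    # Depthwise transposed convolution with an all-ones kernel: each input
    # voxel is copied into an s_d x s_h x s_w block of the output.
    x = mb.conv_transpose(
        x=x,
        weight=np.ones((c, 1, s_d, s_h, s_w), dtype=np.float32),
        strides=[s_d, s_h, s_w],
        groups=c,
        name=node.name
    )

    context.add(x)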
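
To see why this trick reproduces nearest-neighbor upsampling for integer scales, here is a minimal pure-Python sketch on a single channel. It is an illustration only and does not use coremltools or PyTorch: the helper names conv_transpose3d_ones and nearest_upsample3d are hypothetical, and the volume is stored as a dict from (d, h, w) index tuples to values for brevity.

```python
from itertools import product


def conv_transpose3d_ones(x, shape, scale):
    """Transposed convolution of a single-channel volume with an all-ones
    kernel whose size and stride both equal `scale`.

    `x` maps (d, h, w) index tuples to values; `shape` is (D, H, W).
    """
    out_shape = tuple(n * s for n, s in zip(shape, scale))
    out = {idx: 0.0 for idx in product(*(range(n) for n in out_shape))}
    for idx in product(*(range(n) for n in shape)):
        for k in product(*(range(s) for s in scale)):
            # Stride == kernel size, so the output blocks never overlap.
            target = tuple(i * s + j for i, s, j in zip(idx, scale, k))
            out[target] += x[idx] * 1.0  # every kernel weight is 1
    return out


def nearest_upsample3d(x, shape, scale):
    """Reference nearest-neighbor upsampling for integer scales:
    output index i reads input index i // s along each axis."""
    out_shape = tuple(n * s for n, s in zip(shape, scale))
    return {
        idx: x[tuple(i // s for i, s in zip(idx, scale))]
        for idx in product(*(range(n) for n in out_shape))
    }


# Small volume with distinct values and a different scale per axis.
shape, scale = (2, 3, 2), (2, 2, 3)
volume = {
    idx: float(n)
    for n, idx in enumerate(product(*(range(n) for n in shape)))
}
assert conv_transpose3d_ones(volume, shape, scale) == nearest_upsample3d(volume, shape, scale)
```

Because the stride equals the kernel size, every output voxel receives exactly one contribution, so the all-ones kernel simply replicates each input value. With groups equal to the channel count, the same argument applies to each channel independently. The equivalence does not carry over to non-integer scales or to trilinear interpolation.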

TobyRoseman commented 11 months ago

@mlaves - to request changes to the Core ML Framework, please use the Feedback Assistant.