Open cvv-student opened 4 months ago
Besides, I am very curious about the source code of the Metal implementation.
@cvv-student what are the shapes of your input, offset, weight and bias tensors for deform_conv2d? Or did you only change the channels for the input tensors in the demo script?
@dneprDroid
Based on your script, I only modified one line of code in 'convert.py' from
self.weight = torch.rand(1, 3, kwh, kwh)
to
self.weight = torch.rand(3, 3, kwh, kwh)
which triggered this assertion:
File "DeformConv2d-Metal/converter/ops.py", line 41, in type_inference
assert self.p1.shape[-1] == self.p2.shape[-2]
In addition, my 'offset' comes from the 'input', similar to
offset = self.offset_conv(input)
result = deform_conv2d(input, offset, self.weight, mask=self.mask)
But it also triggered:
File "DeformConv2d-Metal/converter/mil.py", line 46, in torchvision_deform_conv2d
assert offset.op.op_type == 'const', 'the offset param should be stored in the weights'
AssertionError: the offset param should be stored in the weights
I would like to know why 'offset' has to be static (a constant stored in the weights) when using Metal, rather than being allowed to change with the 'input'.
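For reference, the tensor shapes that torchvision's deform_conv2d expects can be worked out by hand. The helper below is only a sketch of that arithmetic in plain Python. The function name deform_conv2d_shapes is hypothetical and is not part of torchvision or of this converter. It shows why an offset produced by offset_conv(input) has a shape that depends on the input's spatial size:

```python
def deform_conv2d_shapes(in_shape, out_channels, kernel_size,
                         stride=1, padding=0, dilation=1,
                         groups=1, offset_groups=1):
    """Hypothetical helper: shapes expected by torchvision.ops.deform_conv2d."""
    b, c_in, h, w = in_shape
    kh, kw = kernel_size
    # Standard convolution output-size arithmetic.
    out_h = (h + 2 * padding - dilation * (kh - 1) - 1) // stride + 1
    out_w = (w + 2 * padding - dilation * (kw - 1) - 1) // stride + 1
    return {
        "weight": (out_channels, c_in // groups, kh, kw),
        # Two values (one per spatial axis) for every kernel tap.
        "offset": (b, 2 * offset_groups * kh * kw, out_h, out_w),
        # One modulation scalar for every kernel tap.
        "mask": (b, offset_groups * kh * kw, out_h, out_w),
        "output": (b, out_channels, out_h, out_w),
    }


shapes = deform_conv2d_shapes((1, 3, 10, 10), out_channels=3,
                              kernel_size=(3, 3), padding=1)
for name, shape in shapes.items():
    print(name, shape)
```

With a 3-channel 10x10 input, a 3x3 kernel and padding 1, this gives a weight of (3, 3, 3, 3) and an offset of (1, 18, 10, 10).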
self.weight = torch.rand(3, 3, kwh, kwh)
The current version doesn't support multi-batch mode (shapes like [b, c, w, h], where b > 1), but I'll fix this later.
AssertionError: the offset param should be stored in the weights
It converts the mask and offset tensors from the weights into 1-channel MTLTextures, because in the shader it's easier to represent tensors with flexible shapes using 1-channel textures than 4-channel ones (like the texture used for the input tensor). So the number of channels is configurable only for the weights. For input tensors in this method, CoreML almost always uses 4-channel textures, so currently only the input tensor is represented as a 4-channel MTLTexture. I planned to support 4-channel textures for the mask and offset tensors later, when I have time to implement this.
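To illustrate the difference in bookkeeping between the two layouts, here is a small sketch. It assumes the common convention of packing four channels into each RGBA texel of a texture array; the actual layout used by these shaders is not published, so the function names and the packing rule are assumptions, not the project's code:

```python
def rgba_slices(channels):
    """Assumed 4-channel packing: slices needed when 4 channels share one RGBA texel."""
    return (channels + 3) // 4


def rgba_location(channel):
    """Assumed 4-channel packing: (slice index, component index) of a channel."""
    return channel // 4, channel % 4


def single_channel_slices(channels):
    """1-channel textures: one slice per channel, so the index is the channel itself."""
    return channels


for c in (3, 4, 5, 18):
    print(c, rgba_slices(c), single_channel_slices(c))
```

With 1-channel textures a shader can index a tensor by channel directly, whereas the packed layout needs the extra slice/component split shown in rgba_location.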
Thank you for your explanation. I still have some doubts. I only modified the output channels of the weight, which doesn't seem to be related to the b in [b, c, h, w]. Additionally, having the Metal source code would be even better, as I'm already thinking about how to implement the operator.
Sorry, I’ve implemented this operator for my personal project and didn’t plan to publish the sources of those Metal shaders (dneprDroid::deform_conv2d and dneprDroid::addmm) for free. If you’re interested, we can discuss that - you can find my email in the profile description.
I tried setting the output channels to 3 in convert.py, but it triggered an assertion assert self.p1.shape[-1] == self.p2.shape[-2] in the addmm_op. I look forward to your help. Thank you.
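The failing assertion is the usual inner-dimension check for a matrix product. The sketch below reproduces only that check in plain Python. The helper name matmul_result_shape and the example shapes are hypothetical, since the thread does not show what p1 and p2 actually are inside the converter:

```python
def matmul_result_shape(p1_shape, p2_shape):
    """Hypothetical helper: result shape of p1 @ p2 (no batch broadcasting)."""
    if p1_shape[-1] != p2_shape[-2]:
        raise ValueError(
            f"inner dimensions differ: {p1_shape[-1]} vs {p2_shape[-2]}"
        )
    # Rows come from p1, columns from p2.
    return tuple(p1_shape[:-1]) + (p2_shape[-1],)


# Illustrative shapes only, not taken from the converter.
print(matmul_result_shape((1, 27), (27, 100)))

try:
    matmul_result_shape((3, 27), (9, 100))
except ValueError as err:
    print("mismatch:", err)
```

Printing the shapes of p1 and p2 just before the assertion in ops.py would show which of the two operands changed when the weight went from (1, 3, kwh, kwh) to (3, 3, kwh, kwh).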