dneprDroid / DeformConv2d-Metal

CoreML custom layer (GPU-accelerated) and converter for torchvision.ops.deform_conv2d

addmm_op error #2

Open cvv-student opened 4 months ago

cvv-student commented 4 months ago

I tried setting the number of output channels to 3 in convert.py, but this triggered the assertion assert self.p1.shape[-1] == self.p2.shape[-2] in addmm_op. I look forward to your help. Thank you.
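For readers unfamiliar with the check: the quoted assertion is the usual inner-dimension rule for a matrix product, where the last dimension of the first operand must equal the second-to-last dimension of the second. A minimal sketch in plain Python follows; `addmm_result_shape` is a hypothetical helper written for illustration, not the converter's code, and it assumes a 2-D second operand.

```python
def addmm_result_shape(p1_shape, p2_shape):
    """Return the shape of p1 @ p2, or raise if the inner dimensions differ.

    Hypothetical helper: it mirrors the assertion quoted in the issue,
    not the actual implementation in converter/ops.py.
    """
    if len(p1_shape) < 2 or len(p2_shape) != 2:
        raise ValueError("expected p1 with at least 2 dims and a 2-D p2")
    if p1_shape[-1] != p2_shape[-2]:
        raise ValueError(
            f"inner dimensions differ: {p1_shape[-1]} vs {p2_shape[-2]}"
        )
    # Leading dims of p1 are kept; the shared inner dim is contracted away.
    return tuple(p1_shape[:-1]) + (p2_shape[-1],)
```

For example, `addmm_result_shape((16, 27), (27, 3))` gives `(16, 3)`, while `addmm_result_shape((16, 27), (9, 3))` raises, which corresponds to the failure reported here. How the converter builds its two operands from the weight is not shown in this thread.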

cvv-student commented 4 months ago

Besides, I am very curious about the source code of the Metal implementation.

dneprDroid commented 4 months ago

@cvv-student what are the shapes of your input, offset, weight and bias tensors for deform_conv2d?

dneprDroid commented 4 months ago

Or did you only change the channels for the input tensors in the demo script?

cvv-student commented 4 months ago

@dneprDroid Based on your script, I only modified one line of code in 'convert.py', from

    self.weight = torch.rand(1, 3, kwh, kwh)

to

    self.weight = torch.rand(3, 3, kwh, kwh)

which triggered:

    File "DeformConv2d-Metal/converter/ops.py", line 41, in type_inference
    assert self.p1.shape[-1] == self.p2.shape[-2]

In addition, my 'offset' comes from the 'input', similar to:

    offset = self.offset_conv(input)
    result = deform_conv2d(input, offset, self.weight, mask=self.mask)

But it also triggered:

    File "DeformConv2d-Metal/converter/mil.py", line 46, in torchvision_deform_conv2d
    assert offset.op.op_type == 'const', 'the offset param should be stored in the weights'
    AssertionError: the offset param should be stored in the weights

I would like to know why 'offset' has to be static when using Metal, rather than being allowed to change with the 'input'.
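For reference, the shape conventions documented for torchvision.ops.deform_conv2d can be sketched in plain Python. The helper names below are hypothetical and the code only does shape bookkeeping; it does not call torchvision or the converter. It assumes square stride, padding and dilation values and a single weight group.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deform_conv2d_shapes(input_shape, weight_shape, stride=1, padding=0,
                         dilation=1, offset_groups=1):
    """Expected tensor shapes for deform_conv2d (hypothetical helper).

    input:  (batch, in_channels, in_h, in_w)
    weight: (out_channels, in_channels, kernel_h, kernel_w), groups == 1 assumed
    """
    b, in_c, in_h, in_w = input_shape
    out_c, w_in_c, kh, kw = weight_shape
    if w_in_c != in_c:
        raise ValueError("weight in_channels must match input (groups == 1)")
    out_h = conv_out_size(in_h, kh, stride, padding, dilation)
    out_w = conv_out_size(in_w, kw, stride, padding, dilation)
    return {
        # Two coordinates (dy, dx) per kernel tap and per offset group.
        "offset": (b, 2 * offset_groups * kh * kw, out_h, out_w),
        # One modulation scalar per kernel tap and per offset group.
        "mask": (b, offset_groups * kh * kw, out_h, out_w),
        "bias": (out_c,),
        "output": (b, out_c, out_h, out_w),
    }
```

With a 1x3x10x10 input, changing the weight from (1, 3, 3, 3) to (3, 3, 3, 3) changes only the bias and output channel counts; the offset and mask shapes and the batch size stay the same.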

dneprDroid commented 4 months ago

> self.weight = torch.rand(3, 3, kwh, kwh)

The current version doesn't support multi-batch mode (shapes like [b, c, h, w] where b > 1), but I'll fix this later.

> AssertionError: the offset param should be stored in the weights

The converter turns the mask and offset tensors from the weights into 1-channel MTLTextures, because in the shader it is easier to represent tensors with flexible shapes using 1-channel textures than 4-channel ones (like the texture for the input tensor). So we can configure the number of channels only for the weights. For input tensors, CoreML almost always uses 4-channel textures, so currently only the input tensor is represented as a 4-channel MTLTexture. I plan to support 4-channel textures for the mask and offset tensors later, when I have time to implement this.
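To illustrate why a 4-channel layout is more awkward to index than a 1-channel one, here is a minimal sketch of the index arithmetic. It assumes the common convention of packing four consecutive tensor channels into the RGBA components of one texture slice; the shader's actual layout is not published in this thread, and both function names are hypothetical.

```python
def rgba_slice_count(channels):
    """Number of 4-channel (RGBA) slices needed to hold `channels` channels.

    Assumption: four consecutive tensor channels share one slice, and the
    last slice is padded when `channels` is not a multiple of 4.
    """
    return (channels + 3) // 4


def rgba_location(channel):
    """Map a tensor channel index to (slice index, component index).

    Component 0..3 corresponds to r, g, b, a under the assumed packing.
    """
    return divmod(channel, 4)
```

Under this assumption a 3-channel input fits in one slice with one unused component, and channel 6 of a larger tensor lives in slice 1, component 2. A 1-channel texture needs no such mapping, since each texel holds a single value.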

cvv-student commented 4 months ago

Thank you for your explanation. I still have some doubts. I only modified the output channels of the weight, which doesn't seem to be related to the b in [b, c, h, w]. Additionally, having the Metal source code would be even better, as I'm already thinking about how to implement the operator.

dneprDroid commented 4 months ago

Sorry, I’ve implemented this operator for my personal project and didn’t plan to publish the sources of those Metal shaders (dneprDroid::deform_conv2d and dneprDroid::addmm) for free. If you’re interested, we can discuss that - you can find my email in the profile description.