Closed: cmpute closed this issue 3 years ago
The engine does not support padding as it does not make much sense for convolution on a sparse tensor.
We use a generalized sparse tensor that spans the space infinitely. For example, a sparse tensor spans an N-dimensional space with no boundary. Adding padding to an infinitely spanned space does not make sense.
What is a good use case of padding on sparse tensors?
For example, for a traditional convolution with an input dimension of 5, kernel size 3, stride 2, and padding 1, the output dimension is (5 + 2*1 - 3)/2 + 1 = 3. Without padding, the output dimension is (5 - 3)/2 + 1 = 2. However, I'm not quite clear about the sparse convolution case... is the output just floor(5 / 2) = 2?
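The arithmetic above can be checked with a tiny helper (plain Python, not engine code; `conv_out_dim` is a hypothetical name, not a MinkowskiEngine function):

```python
# Classical dense-convolution output-size formula:
#   out = (in + 2*pad - kernel) // stride + 1
def conv_out_dim(n, k, s, p=0):
    return (n + 2 * p - k) // s + 1

print(conv_out_dim(5, 3, 2, p=1))  # padding 1 -> 3
print(conv_out_dim(5, 3, 2))       # no padding -> 2
```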
So basically, sparse convolution assumes that all locations without an element are zero? Then how are the input and output coordinates mapped? Sorry, I don't know much about the implementation details; I just wish I could have a drop-in replacement for existing 3D convolution code.
Yes, all the other elements are 0. A strided convolution or pooling with stride s floors a coordinate to the nearest multiple of s, mapping a coordinate (x, y) to (s * floor(x / s), s * floor(y / s)).
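The coordinate mapping described above can be sketched in a few lines (a plain-Python illustration, not engine code; `floor_coord` is a hypothetical name):

```python
# A strided sparse convolution/pooling with stride s floors each
# coordinate to the nearest multiple of s.
def floor_coord(coord, s):
    return tuple(s * (c // s) for c in coord)

print(floor_coord((3, 5), 2))  # -> (2, 4)
print(floor_coord((4, 4), 2))  # -> (4, 4)
```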
Okay, I understand. Then let me give a small example: suppose I have a vector of dimension 5, i.e., T = [T0, T1, T2, T3, T4]. If I use a classical convolution with kernel size 3, stride 2, and no padding, the convolution operates over [T0, T1, T2] and [T2, T3, T4] (output dimension 2).
But the sparse convolution, since it assumes no boundary, actually behaves like a convolution with padding 1: it operates on [0, T0, T1], [T1, T2, T3], and [T3, T4, 0] (output dimension 3).
Is there a way to achieve the same behavior as the first case? I thought about using stride=3, but then the convolution operates on [0, T0, T1] and [T2, T3, T4], which is not the same.
So the feature I requested is essentially an offset in the coordinate calculation, to shift the convolution map a little.
Also, I want to mention another problem I ran into: when I try to densify the tensor with specific dimensions, the function requires that "The minimum coordinates must be divisible by the tensor stride." Why is this required for the conversion? It's also quite odd that you need to specify the dimensions of the tensor before convolution (accounting for the tensor_strides) rather than directly providing the final dimensions.
You are using it wrong. The dimension argument is the spatial dimension. As I mentioned, the size of a sparse tensor is infinite and there is no boundary. In your example, the dimension is 1. If you are using 3D data, it is 3. Please read the documentation first.
There is a way to specify the convolution kernel to be offset from the current center by modifying the kernel generator: https://stanfordvl.github.io/MinkowskiEngine/convolution.html#id1. But I don't think this would make any difference in the final performance; I would be very surprised if it did.
Sorry, maybe I used the wrong term. I'm actually using a 3D tensor; I took a 1D vector as an example, where the length (the "dimension" I mentioned above) of the vector is 5. Since the output tensor shape is calculated per dimension, a 1D vector suffices as an example.
Also, could you kindly answer my second question above your reply?
Can you post self-contained code that reproduces that error here?
Maybe I can directly post my structure here:
```python
import torch.nn as nn
import MinkowskiEngine as me

def MEConvGen(cin, cout, k, s=1):
    return [
        me.MinkowskiConvolution(cin, cout, kernel_size=k, stride=s, dimension=3),
        me.MinkowskiBatchNorm(cout),
        me.MinkowskiReLU(),
    ]

# Inside the module's __init__:
self.conv_1 = [
    MEConvGen(cin, 16, 3),
    MEConvGen(16, 16, 3),
    MEConvGen(16, 32, 3, 2),
]
self.conv_1 = nn.Sequential(*sum(self.conv_1, []))
self.conv_2 = [
    MEConvGen(32, 32, 3),
    MEConvGen(32, 32, 3),
    MEConvGen(32, 64, 3, 2),
]
self.conv_2 = nn.Sequential(*sum(self.conv_2, []))
self.conv_3 = [
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3, 2),
]
self.conv_3 = nn.Sequential(*sum(self.conv_3, []))
self.conv_4 = [
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, (3, 1, 1), (2, 1, 1)),
]
self.conv_4 = nn.Sequential(*sum(self.conv_4, []))
```
The input tensor shape (before sparsification) is (40, 1600, 1408). The theoretical output shape is (2, 200, 176), or (3, 200, 176) if padding is added. Then I want to convert the tensor back to a dense grid:
```python
x, _, _ = x.dense(
    min_coords=torch.tensor([0, 0, 0], dtype=torch.int32),
    # max_coords=...
)
```
Here I guess I need to feed (3*16, 200*8, 176*8) as the max_coords? The parameters here are quite confusing. It would be great if I could just input (3, 200, 176).
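The max_coords arithmetic can be sketched as follows (a hypothetical sketch, assuming the coordinates passed to `.dense()` live at input resolution and that the per-axis tensor stride after three isotropic stride-2 blocks plus one stride-(2, 1, 1) block is (16, 8, 8)):

```python
# Tensor stride accumulated along each axis:
# axis 0: 2 * 2 * 2 * 2 = 16; axes 1, 2: 2 * 2 * 2 = 8.
tensor_stride = (16, 8, 8)

# Desired dense output shape, expressed in output cells.
dense_shape = (3, 200, 176)

# Scale each axis back to input-coordinate resolution.
max_coords = tuple(d * s for d, s in zip(dense_shape, tensor_stride))
print(max_coords)  # (48, 1600, 1408)
```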
The new version of the engine takes the shape of the output dense tensor as an argument.
In the current implementation of sparse convolution, there's no control over the padding size. How does the engine deal with padding right now? Is this intended behavior or a missing feature?