NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Is there any way to specify padding for convolution? #110

Closed cmpute closed 3 years ago

cmpute commented 4 years ago

In the current implementation of sparse convolution, there is no control over the padding size. How does the engine deal with padding right now? Is this intentional or a missing feature?

chrischoy commented 4 years ago

The engine does not support padding as it does not make much sense for convolution on a sparse tensor.

We use a generalized sparse tensor that spans the space infinitely; that is, a sparse tensor spans an N-dimensional space with no boundary. Adding padding in an infinitely spanned space does not make sense.

What is a good use case of padding on sparse tensors?

cmpute commented 4 years ago

For example, for a traditional convolution with an input dimension of 5, kernel size 3, stride 2, and padding 1, the output dimension is (5 + 2*1 - 3)/2 + 1 = 3. Without padding, the output dimension is (5 - 3)/2 + 1 = 2. However, I'm not quite clear about the sparse convolution case... is the output just floor(5 / 2) = 2?
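The dense-convolution sizes above can be checked with the standard output-size formula (a plain-Python sketch, not MinkowskiEngine code):

```python
def conv_out_size(n, k, s, p):
    """Output length of a 1-D dense convolution over n inputs with
    kernel size k, stride s, and padding p: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

print(conv_out_size(5, 3, 2, 1))  # padding 1 -> 3
print(conv_out_size(5, 3, 2, 0))  # no padding -> 2
```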

cmpute commented 4 years ago

So basically, sparse convolution assumes that all locations without an element are zero? Then how are the input and output coordinates mapped? Sorry, I don't know much about the implementation details; I just wish I could have a drop-in replacement for existing 3D convolution code.

chrischoy commented 4 years ago

Yes, all the other elements are 0. A strided convolution or pooling with stride s floors a coordinate to the nearest lower multiple of s: a coordinate (x, y) maps to (s floor(x / s), s floor(y / s)).
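The flooring rule can be sketched in plain Python (an illustration of the mapping, not the actual engine code):

```python
def downsample_coord(coord, s):
    """Map a coordinate to the nearest lower multiple of stride s,
    as a strided sparse convolution or pooling does."""
    # Python's // floors toward negative infinity, matching floor(x / s)
    return tuple(s * (c // s) for c in coord)

print(downsample_coord((5, 7), 2))  # -> (4, 6)
print(downsample_coord((3, 4), 2))  # -> (2, 4)
```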

cmpute commented 4 years ago

Okay, I understand. Then let me give a small example. Suppose I have a vector of dimension 5, i.e. T=[T0, T1, T2, T3, T4]. If I use a classical convolution with kernel size 3, stride 2, and no padding, the convolution operates over [T0, T1, T2] and [T2, T3, T4] (output dimension is 2). But since sparse convolution assumes no boundary, it actually behaves like a convolution with padding 1: it operates on [0, T0, T1], [T1, T2, T3], [T3, T4, 0] (output dimension is 3). Is there a way to achieve the same behavior as the classical example? I thought about using stride=3, but in that case the convolution operates on [0, T0, T1] and [T2, T3, T4], which is not the same.
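The two behaviors described above can be enumerated explicitly (a plain-Python illustration with a centered size-3 kernel, not MinkowskiEngine code):

```python
T = ["T0", "T1", "T2", "T3", "T4"]

def window(center, pad="0"):
    """Size-3 window centered at `center`; out-of-range positions read as 0."""
    return [T[i] if 0 <= i < len(T) else pad
            for i in (center - 1, center, center + 1)]

# Boundary-free (sparse-style): output coordinates are multiples of the stride.
sparse_windows = [window(c) for c in range(0, len(T), 2)]
print(sparse_windows)
# [['0', 'T0', 'T1'], ['T1', 'T2', 'T3'], ['T3', 'T4', '0']]

# Classical unpadded convolution: windows must lie fully inside the input.
dense_windows = [T[i:i + 3] for i in range(0, len(T) - 2, 2)]
print(dense_windows)
# [['T0', 'T1', 'T2'], ['T2', 'T3', 'T4']]
```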

cmpute commented 4 years ago

So the feature I requested is essentially an offset in the coordinate calculation, to shift the convolution map slightly.

cmpute commented 4 years ago

Also, I want to mention another problem I ran into. When I try to densify the tensor with specific dimensions, the function requires that "The minimum coordinates must be divisible by the tensor stride." Why is this required for the conversion? It is also quite odd that you need to specify the dimensions of the tensor before convolution (accounting for the tensor_stride) rather than directly providing the final dimensions.

chrischoy commented 4 years ago

You are using it wrong. The dimension is the spatial dimension. As I mentioned, the size of a sparse tensor is infinite and there's no boundary. In your example, the dimension is 1. If you are using 3D data, it is 3. Please read the documentation first.

There is a way to offset the convolution kernel from the current center by modifying the kernel generator: https://stanfordvl.github.io/MinkowskiEngine/convolution.html#id1. But I don't think this would make any difference in the final performance. I would be very surprised if it did.

cmpute commented 4 years ago

Sorry, maybe I used the wrong term. I'm using a 3D tensor; above, I took a 1D vector as an example, whose length (the "dimension" I mentioned) is 5. Since the output tensor shape is calculated per dimension, a 1D vector suffices for the example.

Also, could you kindly address my second question above your reply?

chrischoy commented 4 years ago

Can you post a self-contained snippet that reproduces that error here?

cmpute commented 4 years ago

Maybe I can just post my model structure directly here:

import torch.nn as nn
import MinkowskiEngine as me

def MEConvGen(cin, cout, k, s=1):
    # Convolution -> BatchNorm -> ReLU block
    return [
        me.MinkowskiConvolution(cin, cout, kernel_size=k, stride=s, dimension=3),
        me.MinkowskiBatchNorm(cout),
        me.MinkowskiReLU()
    ]

self.conv_1 = [
    MEConvGen(cin, 16, 3),
    MEConvGen(16, 16, 3),
    MEConvGen(16, 32, 3, 2)
]
self.conv_1 = nn.Sequential(*sum(self.conv_1, []))
self.conv_2 = [
    MEConvGen(32, 32, 3),
    MEConvGen(32, 32, 3),
    MEConvGen(32, 64, 3, 2)
]
self.conv_2 = nn.Sequential(*sum(self.conv_2, []))
self.conv_3 = [
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3, 2)
]
self.conv_3 = nn.Sequential(*sum(self.conv_3, []))
self.conv_4 = [
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, 3),
    MEConvGen(64, 64, (3, 1, 1), (2, 1, 1))
]
self.conv_4 = nn.Sequential(*sum(self.conv_4, []))

The input tensor shape (before sparsification) is (40, 1600, 1408). The theoretical output shape is (2, 200, 176), or (3, 200, 176) if padding is added. Then I want to convert the tensor back to a dense grid:

x, _, _ = x.dense(
    min_coords=torch.tensor([0,0,0], dtype=torch.int32),
    # max_coords=...
)

Here I guess I need to feed (316, 2008, 176*8) as max_coords? The parameter input here is quite confusing. It would be great if I could just input (3, 200, 176).
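For what it's worth, the expected output shape of the network above can be traced per dimension (a sketch under the assumption that each boundary-free strided stage keeps ceil(n/s) coordinates per dimension, matching the padded-style behavior discussed earlier):

```python
import math

def through_strides(n, strides):
    """Trace one spatial dimension through a sequence of strided stages."""
    for s in strides:
        n = math.ceil(n / s)  # boundary-free sparse conv keeps the edge windows
    return n

shape = (40, 1600, 1408)
# conv_1..conv_3 each downsample all dims by 2; conv_4 strides only the first dim.
strides_per_dim = [(2, 2, 2, 2), (2, 2, 2), (2, 2, 2)]
out = tuple(through_strides(n, s) for n, s in zip(shape, strides_per_dim))
print(out)  # (3, 200, 176)
```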

chrischoy commented 3 years ago

The new version of the engine takes the shape of the output dense tensor as an argument.

https://github.com/NVIDIA/MinkowskiEngine/blob/858b856896fc55f2bca6e8a8c7829268269bdf61/tests/python/dense.py#L120