MPI-IS / bilateralNN

Learning Sparse High Dimensional Filters with Neural Networks
http://bilateralnn.is.tue.mpg.de
BSD 3-Clause "New" or "Revised" License

Dilation parameter for Permutohedral Layers #7

Closed · codepujan closed this 7 years ago

codepujan commented 7 years ago

A normal CNN convolution has parameters for dilation and padding. Is there any way for the convolution applied in the Bilateral Convolutional Neural Network to take dilation and padding parameters as well? If not, any pointers/guidance on where to add those parameters would be great. Thank you.

ghost commented 7 years ago

Shouldn't there also be parameters like kernel size and padding, as you would expect on a normal convolution layer?

varunjampani commented 7 years ago

We can control the 'dilation' factor of a Bilateral Convolution Layer (BCL) with the lattice feature scales. The lattice feature scales define the discretization of the lattice; you can think of them as the 'voxel size' if the lattice is 3-dimensional. To implement this, you need to pass the scaled features as 'in_features' and 'out_features' to the BCL ('Permutohedral') layer.

As an example, let's say we want to do the bilateral convolution on a 3D lattice space defined by RGB pixel features. In other words, 'in_features' and 'out_features' are the RGB values of the image pixels. Let's say the RGB values are in the range [0, 255]. If we scale these features by 0.1, the feature range changes to [0, 25.5]. If you pass these scaled features to the BCL layer, the constructed lattice will have approximately 25 bins (simplices) along each of the RGB directions. With the lattice feature scaling, you can therefore control how long-range the connectivity is across the points.
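To make the scaling arithmetic concrete, here is a minimal NumPy sketch (the variable names are illustrative only, not the layer's API):

```python
import numpy as np

# Toy RGB features for 100 pixels, values in [0, 255].
rgb = np.random.randint(0, 256, size=(100, 3)).astype(np.float32)

# Scaling the features controls the lattice discretization:
# a smaller scale gives a coarser lattice and longer-range connectivity.
feature_scale = 0.1
scaled = feature_scale * rgb  # values now lie in [0, 25.5]

# Roughly how many lattice bins span each feature dimension.
approx_bins = scaled.max(axis=0) - scaled.min(axis=0)
print(approx_bins)  # ~25 bins per RGB direction
```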

You can easily change the size of the filter kernel with the 'neighborhood' parameter. A neighborhood of 2 means that the filter extends 2 neighbors along each direction. By analogy, in standard 2D spatial filtering a 3x3 filter has neighborhood 1 and a 5x5 filter has neighborhood 2.
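For reference, on a regular d-dimensional grid a neighborhood of n corresponds to a (2n+1)^d stencil; on a d-dimensional permutohedral lattice the paper reports (n+1)^(d+1) - n^(d+1) filter elements. A small sketch of that arithmetic (the helper names here are mine):

```python
def square_grid_taps(neighborhood, dims):
    """Taps in a standard (2n+1)^d convolution stencil."""
    return (2 * neighborhood + 1) ** dims

def permutohedral_taps(neighborhood, dims):
    """Filter elements on a d-dim permutohedral lattice:
    (n+1)^(d+1) - n^(d+1), as reported in the BCL paper."""
    return (neighborhood + 1) ** (dims + 1) - neighborhood ** (dims + 1)

assert square_grid_taps(1, 2) == 9     # 3x3 filter
assert square_grid_taps(2, 2) == 25    # 5x5 filter
assert permutohedral_taps(1, 2) == 7   # hexagonal: center + 6 neighbors
```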

There is no 'padding' in the BCL layer because BCL does not do dense convolutions. It uses a hash table to store the populated lattice locations and performs convolutions only where data is present in the high-dimensional space. It is also not clear why padding would be useful in a BCL layer. Do you have a use case for it?
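To illustrate why no padding is needed, here is a toy sketch of the sparse-convolution idea, with a plain Python dict standing in for the hash table (this is not the actual C++/CUDA implementation):

```python
# Toy 1D "lattice": a hash table from lattice coordinate to value;
# only populated locations are stored.
lattice = {3: 1.0, 4: 2.0, 10: 0.5}
kernel = {-1: 0.25, 0: 0.5, 1: 0.25}  # neighborhood-1 filter

# Convolve only where data exists; absent neighbors contribute
# nothing, so there is no boundary to pad.
output = {}
for coord in lattice:
    output[coord] = sum(w * lattice.get(coord + off, 0.0)
                        for off, w in kernel.items())

print(output)  # {3: 1.0, 4: 1.25, 10: 0.25}
```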

In BCL, you can also control the grouping of input channels with the 'group' parameter, and whether to add a Gaussian 'offset' to the filter kernel with the 'offset' parameter. Refer to http://bilateralnn.is.tue.mpg.de for a brief description of the different parameters of a permutohedral layer.

Let me know if something is not clear.

codepujan commented 7 years ago

Yeah, so you mean we should apply the 'atrous'-style scaling to the PixelFeature layer's output itself before it reaches the Permutohedral layer (unlike Caffe's normal convolution dilation parameter)?

varunjampani commented 7 years ago

Yes, you need to do the feature scaling before passing the features on to the Permutohedral layer. If you use standard XY, RGB, or XYRGB features, you can use the 'PixelFeature' layer to generate them.
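A rough sketch of where the scaling could slot into a Caffe net definition, using Caffe's standard Power layer (which computes (shift + scale * x)^power) for the scaling step. The PixelFeature and Permutohedral calls below are assumptions based on this thread: their exact bottoms and parameters may differ, so check the released prototxts in this repo.

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=[dict(dim=[1, 3, 128, 128])])

# Hypothetical use of the repo's PixelFeature layer to produce
# e.g. XY / RGB / XYRGB features (its parameters are omitted here).
n.feats = L.PixelFeature(n.data)

# Scale the features BEFORE the Permutohedral layer -- this is the
# "dilation" control discussed above.
n.scaled = L.Power(n.feats, scale=0.1)

# Scaled features serve as both in_features and out_features
# (the bottom order is assumed; verify against the layer's code).
n.out = L.Permutohedral(n.data, n.scaled, n.scaled)

print(n.to_proto())
```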