Neighborhood in Upsampling

Hi,

Thanks for sharing. I implemented the SegNet structure you describe for segmentation in Keras, using your custom tf operators for convolution and max-pooling, and I am getting good results on a custom dataset. I am using it on smaller local point clouds of up to 32k points with only 4 residual stages. If I increase the number of stages, performance gets much worse, which is counter-intuitive for residual blocks. In fact, I get the best results with only one residual block per stage instead of two; the more blocks I add, the worse the accuracy becomes. I suspect that I did not correctly understand how your paper upsamples point clouds.

In your paper you write:

"Upsampling (flex-upsampling) is done by copying the features of the selected points into the larger-sized layer, initializing all other points with zero, like zero-padding in images and performing the flex-max-pooling operation."

Can you elaborate on that a little? By "selected points" I assume you mean the points kept by the downsampling step relative to the previous stage. I gather their features into a zero-initialized tensor with the shape of the larger-sized layer and then apply your max-pooling operator to that tensor. Which neighborhood should I use for that operation? Currently I am using the all-to-all neighborhood of the larger-sized tensor (the same one I use for downsampling after pooling), but I think this can produce all-zero features when every neighbor of a point in a dense area was removed during downsampling. Alternatively, I could use only the nearest neighbor among the downsampled points, but that would make max-pooling over the neighborhood pointless, since the result would always be the value of that nearest neighbor. What were your thoughts on the upsampling?
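To make the two options concrete, here is a minimal NumPy sketch of what I mean. It is not your flex-max-pooling operator or my actual Keras code: `knn_indices`, `zero_fill` and `max_pool` are hypothetical stand-ins, the neighborhoods are brute-force kNN, and the toy cloud is made up for illustration.

```python
import numpy as np

def knn_indices(queries, points, k):
    """Brute-force kNN: for each query, indices of its k nearest `points`."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def zero_fill(coarse_feats, selected, n_dense):
    """Copy coarse features to their slots in the larger layer, zeros elsewhere."""
    dense = np.zeros((n_dense, coarse_feats.shape[1]), dtype=coarse_feats.dtype)
    dense[selected] = coarse_feats
    return dense

def max_pool(feats, neighbors):
    """Max over each neighborhood: feats [N, C], neighbors [M, k] -> [M, C]."""
    return feats[neighbors].max(axis=1)

# Toy cloud (assumed data): six points on a line, only points 0 and 5 survive downsampling.
dense_xyz = np.array([[0.0], [1.0], [2.1], [3.3], [4.6], [10.0]])
selected = np.array([0, 5])
coarse_feats = np.array([[1.0], [2.0]])

filled = zero_fill(coarse_feats, selected, len(dense_xyz))

# Option A: neighborhoods taken from the larger layer (here k=2: self + nearest point).
nbrs_dense = knn_indices(dense_xyz, dense_xyz, k=2)
up_a = max_pool(filled, nbrs_dense)

# Option B: neighborhoods restricted to the selected (downsampled) points, k=1.
nbrs_coarse = selected[knn_indices(dense_xyz, dense_xyz[selected], k=1)]
up_b = max_pool(filled, nbrs_coarse)

print(up_a.ravel())
print(up_b.ravel())
```

In this toy case, option A leaves points 2 to 4 at zero because none of their neighbors was selected, while option B gives every point a feature but reduces the max-pooling to a plain copy of the nearest selected point.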

cgtuebingen / Flex-Convolution