erictuanle / GoingDeeperwPointNetworks

This project releases the code for the paper "Going Deeper with Point Networks".

Two questions about sec 3.2 and sec 3.3 in your paper #1

Open BigDeviltjj opened 5 years ago

BigDeviltjj commented 5 years ago

Hi, I've read the paper and I am confused by Section 3.2 and Section 3.3. Could you please answer the following two questions?

In Section 3.2, you propose multi-resolution processing as a substitute for the multi-scale version originally proposed in PointNet++. However, a multi-resolution approach (MRG) was already proposed in PointNet++, and it is also more efficient than MSG. Have you compared memory efficiency against it? Moreover, why is the downsampling approach you propose better than the multi-resolution approach in PointNet++?

In Section 3.3, Algorithm 2, line 2, the function BackwardMaxPooling computes the gradient of max-pooling from the output gradient. But in a conventional implementation, the backward pass of max-pooling needs the input data, the output data and the output gradient, so that the output can be compared with the input and the input gradient updated accordingly (see the sketch below). In your method, however, the input of MaxPooling is freed during the forward pass. So how can you back-propagate through the max-pooling operator without knowing its input, whose memory was freed in the forward pass?
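To be concrete, this is a minimal sketch of the conventional backward pass I have in mind (the shapes are only an example; the point is that `input_tensor` must still be in memory so that the argmax positions can be recovered):

```python
import torch

# Conventional max-pooling backward over the neighbour axis (dim=2).
# The input tensor must still be available to recover the argmax positions.
def conventional_maxpool_backward(input_tensor, grad_output):
    _, indices_max = torch.max(input_tensor, dim=2)   # needs the input to find the max positions
    grad_input = torch.zeros_like(input_tensor)       # (batch, points, neighbors, channels)
    grad_input.scatter_(2, indices_max.unsqueeze(2), grad_output.unsqueeze(2))
    return grad_input
```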

erictuanle commented 5 years ago

Hello, thank you for your interest in our paper. Here are some indications regarding your questions:

1] Thank you for your suggestion, we will add the comparison in the next revision. From our understanding, PointNet++ MRG does not perform as well as PointNet++ MSG on ScanNet. However, we expect PointNet++ MRG to be leaner than the MSG version (though not as drastically as ours). Our approach outperforms PointNet++ MRG because it wins on all counts: performance, speed and memory efficiency. This can be explained by the fact that our different resolutions are generated recursively, by progressively decreasing the resolution and thus building a hierarchical architecture. In PointNet++, the multi-resolution features are only generated from the current resolution plus the raw point cloud, as you can see below: "In Fig. 3 (b), features of a region at some level L_i is a concatenation of two vectors. One vector (left in figure) is obtained by summarizing the features at each subregion from the lower level L_{i−1} using the set abstraction level. The other vector (right) is the feature that is obtained by directly processing all raw points in the local region using a single PointNet."
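To make the recursive idea more concrete, here is a purely schematic sketch, not our actual code; `pointnet` and `downsample` are toy stand-ins for a PointNet block and farthest-point sampling:

```python
import torch

def pointnet(points):                          # toy stand-in for a PointNet feature extractor
    return points.mean(dim=1)

def downsample(points, ratio=0.5):             # toy stand-in for farthest-point sampling
    return points[:, : max(1, int(points.shape[1] * ratio))]

def hierarchical_multires_features(points, n_levels=3):
    # Each resolution is derived from the previous one by progressively
    # downsampling, so the features form a hierarchy rather than being built
    # from "current resolution + raw point cloud" as in PointNet++ MRG.
    feats, current = [], points
    for _ in range(n_levels):
        feats.append(pointnet(current))
        current = downsample(current)
    return torch.cat(feats, dim=-1)

# Example: 8 point clouds with 1024 points in 3-D.
x = torch.randn(8, 1024, 3)
print(hierarchical_multires_features(x).shape)   # torch.Size([8, 9])
```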

2] In our implementation, we try hard to avoid storing the neighbourhood feature tensors in order to limit GPU memory usage, so the input tensor cannot be saved on the GPU. In fact, to compute the backward pass through the max-pooling layer, we only need to know which indices survive the max-pooling. The short PyTorch snippet below gives more details on how we do this.

Forward MaxPooling:

    # keep track of the argmax indices in indices_max
    output_tensor, indices_max = torch.max(input_tensor, dim=2)

Backward MaxPooling:

    # zero_tensor is a scalar zero tensor, e.g. torch.zeros(1, device=grad_output.device)
    tensor = zero_tensor.view(1, 1, 1, 1).repeat(batch_size, nb_subsampled_points, num_neighbors, output_dim_features)
    # route each output gradient back to the neighbour that achieved the max
    grad_input = tensor.scatter(2, indices_max.unsqueeze(2), grad_output.unsqueeze(2))
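For reference, the same trick can be wrapped in a custom torch.autograd.Function so that only the argmax indices (and not the neighbourhood feature tensor) are kept for the backward pass. The sketch below is only illustrative, not the code we will release; the class name and shapes are placeholders.

```python
import torch

class MemoryEfficientMaxPool(torch.autograd.Function):
    """Max-pooling over the neighbour axis (dim=2) that saves only the
    argmax indices for backward, not the full input tensor."""

    @staticmethod
    def forward(ctx, input_tensor):
        # input_tensor: (batch, points, neighbors, channels)
        output, indices_max = torch.max(input_tensor, dim=2)
        ctx.save_for_backward(indices_max)
        ctx.input_shape = input_tensor.shape
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (indices_max,) = ctx.saved_tensors
        grad_input = torch.zeros(ctx.input_shape, dtype=grad_output.dtype,
                                 device=grad_output.device)
        # Route each output gradient back to the neighbour that achieved the max.
        grad_input.scatter_(2, indices_max.unsqueeze(2), grad_output.unsqueeze(2))
        return grad_input

# Example usage
x = torch.randn(4, 128, 32, 64, requires_grad=True)   # (batch, points, neighbors, channels)
y = MemoryEfficientMaxPool.apply(x)
y.sum().backward()
print(x.grad.shape)                                    # torch.Size([4, 128, 32, 64])
```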

We will release the code soon.

BigDeviltjj commented 5 years ago

I really appreciate your attention to my questions!

About question 1, I get your idea: the MRG you propose is more like dilated convolution in a standard CNN. However, I am still wondering why multi-resolution brings better performance. I understand that it is hard to explain the reasoning behind most learning approaches. Still, we ran an experiment in which we changed the PointNet++ backbone of a detection model to PointNet, and found that PointNet's performance was better. That made me start to think about which factors really matter for feature learning.

About question 2, it is clear to me now, thanks for your reply!