XuyangBai / D3Feat

[TensorFlow] Official implementation of CVPR'20 oral paper - D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features https://arxiv.org/abs/2003.03164
MIT License
261 stars, 38 forks

Increasing the receptive field on ETH dataset #17

Closed: zgojcic closed this issue 4 years ago

zgojcic commented 4 years ago

Hi Xuyang,

In one of the issues you wrote that the receptive field can be increased without increasing the voxel size. Could you briefly elaborate on that? Do you just increase the conv_radius, or how exactly do you test the generalizability to the ETH dataset?

Best Zan

And for our method, we are also able to increase the receptive field of each point without changing the voxel size (by scaling up the grid size of each layer). Originally posted by @XuyangBai in https://github.com/XuyangBai/D3Feat/issues/1#issuecomment-600957694

XuyangBai commented 4 years ago

Hi Zan,

Thanks a lot for your interest. To test the generalization ability on the ETH dataset, I change two hyper-parameters as follows:

  1. Increase the first_subsampling_dl, which will lead to a larger conv_radius
  2. Change the scale of the kernel points accordingly. The related code is here.

In the original KPConv (the convolution operation my model is based on), first_subsampling_dl is the same as voxel_size, but I disentangled these two parameters so that our descriptor network can generalize to other scenes with a different scale without affecting the voxel size. You can find some discussion of the ETH experiments in https://github.com/XuyangBai/D3Feat/issues/14.

Best, Xuyang

zgojcic commented 4 years ago

Hi Xuyang,

Thanks for the swift response. So if I understand correctly, on 3DMatch you use both voxel_grid (downsampling) = 0.03 and first_subsampling_dl = 0.03.

On the ETH dataset you then increase first_subsampling_dl to 0.0625, which increases the conv radius by a factor of approximately 2, but you leave voxel_grid = 0.03.

How do you then handle the additional subsampling that happens in the strided convolutions? There, first_subsampling_dl is used to subsample the point cloud, so after the first strided convolution you have actually downsampled with 0.0625 * 2 = 0.125 m, right?

Could you also elaborate on why you decided to disentangle voxel_size and first_subsampling_dl rather than just increasing the conv_radius parameter by a factor of 2?

Best Zan

XuyangBai commented 4 years ago

Hi Zan,

Yes, for 3DMatch I use voxel_grid = 0.03 and first_subsampling_dl = 0.03, and for ETH I use voxel_grid = 0.0625 and first_subsampling_dl = 0.11 (or 0.10, sorry, I don't have the exact number right now). I would have tried a larger first_subsampling_dl to further enlarge the receptive field, but I didn't have access to a larger GPU.

I didn't change the subsampling strategy of the original KPConv design: each strided convolution layer subsamples the points by 2 * first_subsampling_dl. So after the first strided convolution I have actually downsampled with 2 * 0.11 m. This differs slightly from the original KPConv design, where each layer has double the voxel size of the previous layer, whereas we have voxel sizes of 0.0625 and 0.11 * 2 for the first and second layers.

As for conv_radius, increasing the conv_radius parameter by a factor of 2 has the same effect as increasing first_subsampling_dl by a factor of 2, since the convolution radius is calculated as conv_r = config.first_subsampling_dl * config.KP_extent * 2.5. That might be a more concise implementation, which I didn't realize when I was testing the generalization ability of the model. Thanks for pointing it out!

Best, Xuyang.
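
Editor's note: the arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration, not code from the D3Feat repository. The function names and the default KP_extent = 1.0 are assumptions made for the example; the radius formula and the 0.03 / 0.0625 / 0.11 values are the ones quoted in the thread. The doubling of the cell size at each layer after the second follows the description of the original KPConv design and is likewise an assumption here.

```python
import math


def conv_radius(first_subsampling_dl, kp_extent=1.0, factor=2.5):
    """Radius from the formula quoted in the thread:
    conv_r = first_subsampling_dl * KP_extent * 2.5
    (kp_extent=1.0 is an assumed value, used only for illustration)."""
    return first_subsampling_dl * kp_extent * factor


def layer_cell_sizes(voxel_size, first_subsampling_dl, num_layers):
    """Cell size seen by each layer when voxel_size and
    first_subsampling_dl are disentangled.

    Layer 0 uses the input voxel size. Each later layer i is assumed
    to subsample at first_subsampling_dl * 2**i, so layer 1 uses
    2 * first_subsampling_dl as stated in the thread.
    """
    sizes = [voxel_size]
    for i in range(1, num_layers):
        sizes.append(first_subsampling_dl * 2 ** i)
    return sizes


# Settings mentioned in the thread.
r_3dmatch = conv_radius(0.03)
r_eth = conv_radius(0.11)
print("radius ratio ETH / 3DMatch:", r_eth / r_3dmatch)

# 3DMatch: both parameters equal, so every layer doubles the previous one.
print("3DMatch cell sizes:", layer_cell_sizes(0.03, 0.03, 3))

# ETH: the first layer uses 0.0625, the second 2 * 0.11, which is not a doubling.
print("ETH cell sizes:", layer_cell_sizes(0.0625, 0.11, 3))
```

Because the radius is linear in first_subsampling_dl, multiplying either that parameter or the constant factor by 2 gives the same radius, which is the equivalence acknowledged in the reply.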

zgojcic commented 4 years ago

Hi Xuyang,

Thanks again for your answers. Almost everything makes sense to me now; I just have one last question. You mention in the paper that you do not need to change the voxel size in order to increase the receptive field, but you actually do change it, right? So you directly use the model trained on 3DMatch with voxel_grid = first_subsampling_dl = 0.03 on ETH with voxel_grid = 0.0625 and first_subsampling_dl = 0.11, and the only thing you change is that you scale the kernel points by (0.11/0.03), right?

Best Zan

XuyangBai commented 4 years ago

Hi Zan,

Exactly. To evaluate on ETH, I change voxel_grid to 0.0625 and first_subsampling_dl to 0.11 and scale the kernel points; the model weights are reused without finetuning. I didn't say that I use the same voxel size for 3DMatch and ETH. A more accurate statement is that the receptive field can be enlarged without affecting the voxel size, but we may still need to change the voxel size because of memory issues or for a fair comparison.

Best, Xuyang
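
Editor's note: the kernel-point scaling confirmed above can be illustrated with a small Python sketch. It is not taken from the D3Feat code base: `scale_kernel_points` and the example coordinates are hypothetical, and the only things drawn from the thread are the two first_subsampling_dl values (0.03 for training on 3DMatch, 0.11 for testing on ETH) and the idea of scaling the kernel points by their ratio while leaving the learned weights untouched.

```python
import math


def scale_kernel_points(kernel_points, train_dl, test_dl):
    """Scale kernel point coordinates by test_dl / train_dl.

    kernel_points: list of [x, y, z] positions laid out for train_dl.
    Only the geometry changes; network weights are not involved here.
    """
    ratio = test_dl / train_dl
    return [[coord * ratio for coord in point] for point in kernel_points]


# Hypothetical kernel points (a centre point plus two offsets),
# used only to show the effect of the scaling.
example_points = [
    [0.0, 0.0, 0.0],
    [0.03, 0.0, 0.0],
    [0.0, -0.03, 0.015],
]

scaled = scale_kernel_points(example_points, train_dl=0.03, test_dl=0.11)

# Distances from the centre point grow by the same ratio.
print(math.dist(example_points[0], example_points[1]))
print(math.dist(scaled[0], scaled[1]))
```

The centre point stays at the origin and every other point moves radially outward, so the kernel covers a larger neighbourhood with the same number of points.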

zgojcic commented 4 years ago

Hi,

Thanks for the quick answers and for sharing your great work.

Good luck with the future submissions :)

Best Zan

XuyangBai commented 4 years ago

@zgojcic Thanks a lot. Actually, you helped me a lot when I was first getting familiar with the point cloud registration task, and I learned a lot from your PerfectMatch and Multiview registration work. Looking forward to your new work, too : )

Best, Xuyang