XuyangBai / D3Feat

[TensorFlow] Official implementation of CVPR'20 oral paper - D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features https://arxiv.org/abs/2003.03164
MIT License

Models from different sensors #33

Closed maradanovic closed 2 years ago

maradanovic commented 3 years ago

Hi,

First, thanks for this amazing work and making it open source!

There is something I wanted to ask your advice on.

I would like to use D3Feat for registration of indoor models from different sensors. Specifically, I'm trying to register a dense and detailed TLS (Terrestrial Laser Scanner) point cloud with a mesh created from a depth camera (the mesh is converted to a point cloud either by taking the vertices or by randomly sampling the mesh, but in both cases the result is not as detailed as a TLS point cloud). I've had some success using the 3DMatch pretrained network, and improved generalisation by increasing the voxel size, the scale of kernel points/receptive field, and the first subsampling to 10 cm, but I'm trying to figure out whether there is a better angle of approach.
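
(For completeness, the mesh-to-point-cloud conversion is nothing special; roughly the following with Open3D, where the file path and sample count are just placeholders.)

```python
import open3d as o3d

# How I turn the depth-camera mesh into a point cloud (placeholder path/count).
mesh = o3d.io.read_triangle_mesh("depth_camera_mesh.ply")

# Option 1: just take the vertices.
pcd_vertices = o3d.geometry.PointCloud(mesh.vertices)

# Option 2: sample points uniformly over the mesh surface.
pcd_sampled = mesh.sample_points_uniformly(number_of_points=500_000)
```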

I was thinking about (i) training the network with two different sets of data at once, by combining 3DMatch (depth) and TLS data, and (ii) training two different networks on two different datasets and dealing with the domain adaptation later (with a third network), but neither looks like a good solution to me (of course, I might be wrong here).

I know this is a broad question, but how would you approach this problem?

It would be great to hear your opinion!

XuyangBai commented 3 years ago

Replied through email.

maradanovic commented 3 years ago

Thanks!

maradanovic commented 3 years ago

Hi,

I was hoping you could help me again with a short question.

Through testing with the pre-trained 3DMatch model and modifying the voxel size, the scale of kernel points/receptive field, and the first subsampling, I've found that a voxel size of 9 cm performs best when registering low-quality models from different sensors.

However, this is still with the network trained with a voxel size of 3 cm, and I would like to test the performance of a network trained with a voxel size of 9 cm on 3DMatch. Could you please tell me if there is anything else to adjust other than:

XuyangBai commented 3 years ago

Hi, since you have tested with different combinations of hyperparameters (e.g. voxel size, first_subsampling_dl), you can just use the best one for training (so in your case it seems you only need to set first_subsampling_dl). BTW, I set voxel_size equal to first_subsampling_dl by default for 3DMatch, but you can use different values for them depending on your application. Generally, voxel_size controls how sparse the input point cloud is (and also removes points that are too close together), while first_subsampling_dl affects the receptive field of the descriptor.
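
To make the distinction concrete, here is a rough sketch; the Open3D call and the way the config attribute is set are illustrative, not the repo's exact pipeline:

```python
import open3d as o3d

# Illustrative only.
# voxel_size: how sparse the *input* cloud is (also removes near-duplicate points).
# first_subsampling_dl: grid size of the network's first layer, which scales the
# receptive field of the descriptor.

VOXEL_SIZE = 0.09            # e.g. the 9 cm that worked best in your tests
FIRST_SUBSAMPLING_DL = 0.09  # the config attribute used in training_3DMatch.py

pcd = o3d.io.read_point_cloud("fragment.ply")        # placeholder path
pcd = pcd.voxel_down_sample(voxel_size=VOXEL_SIZE)   # input sparsification

# In the training config you would then set:
# config.first_subsampling_dl = FIRST_SUBSAMPLING_DL
```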

maradanovic commented 3 years ago

Thanks again!

maradanovic commented 3 years ago

Hi,

I've been having trouble running training_3DMatch.py with a modified first_subsampling_dl. If left at the original value (0.03), training runs smoothly, but as soon as it's modified this occurs:

```
Dataset Preparation

Preparing ply files
PKL file not found.
Preparing ply files
PKL file not found.

Initiating input pipelines
Traceback (most recent call last):
  File "training_3DMatch.py", line 175, in <module>
    dataset.init_input_pipeline(config)
  File "/home/dell/pcr/02_d3feat_train/datasets/common.py", line 706, in init_input_pipeline
    self.batch_limit = self.calibrate_batches(config)
  File "/home/dell/pcr/02_d3feat_train/datasets/common.py", line 546, in calibrate_batches
    lim = sizes[-1] * config.batch_num
IndexError: index -1 is out of bounds for axis 0 with size 0
```

Any idea why this is happening?

XuyangBai commented 3 years ago

You should prepare points.pkl and keypts.pkl for your own dataset; see https://github.com/XuyangBai/D3Feat/blob/master/datasets/ThreeDMatch.py#L108-L110. The error is raised because your self.anc_points is an empty list.

You can look at this file to see how to prepare these two files: points.pkl saves the point clouds and keypts.pkl saves the pre-computed correspondences between pairs using the ground-truth poses.
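
Roughly, the two files look like the following; the keys and exact layout here are only an assumption for illustration, ThreeDMatch.py is the authoritative reference:

```python
import pickle
import numpy as np

# points.pkl: the (downsampled) point cloud of every fragment, as (N, 3) arrays.
points = {
    "sceneA/cloud_bin_0": np.zeros((5000, 3), dtype=np.float32),  # placeholder data
    "sceneA/cloud_bin_1": np.zeros((5000, 3), dtype=np.float32),
}

# keypts.pkl: for each overlapping pair, the pre-computed correspondences,
# i.e. (K, 2) index pairs obtained by aligning the pair with the GT pose and
# matching points within a distance threshold.
keypts = {
    "sceneA/cloud_bin_0@sceneA/cloud_bin_1": np.zeros((2000, 2), dtype=np.int64),
}

with open("points.pkl", "wb") as f:
    pickle.dump(points, f)
with open("keypts.pkl", "wb") as f:
    pickle.dump(keypts, f)
```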

maradanovic commented 3 years ago

Makes sense, thanks! It seems I'll need 350 GB or more to download and unzip the 54 3DMatch datasets, which (of course!) I don't have on the current machine, so I'll need some time to set up.

If you don't mind, let me keep this open for now.

XuyangBai commented 3 years ago

I do not understand why you need the 3DMatch dataset. As you mentioned before, you are trying to register a dense and detailed TLS (Terrestrial Laser Scanner) point cloud with a mesh created from a depth camera, so why not prepare your own data and train directly on it? In that case you would not have the domain gap between your training and test sets.

maradanovic commented 3 years ago

You're correct, it would probably be best to train two networks on my own data. However, as far as I know, no public dataset exists with both TLS and depth camera data of the same indoor environments, and I only have data from one indoor environment. Gathering enough data for training could take up to a month, which is time I don't have.

I'll try to explain my reasoning here - please correct me if you believe I'm wrong on any point.

The pretrained 3DMatch 3 cm network appears to generalize well when the cloud downsampling and the receptive field are both modified to be about 9-11 cm (which speaks highly of the generalization ability of D3Feat!). The reason this works in my case is probably that I'm registering large models, e.g. a model of a whole room to a model of a whole floor, so I don't really need density or details.

I don't think there's a big difference between point clouds from different sensors when the points are 10 cm apart, be it a TLS, depth, or any other type of point cloud of an interior. The main difference comes from the patterns specific sensors leave on specific surfaces, depending on the sensor type and the sensor platform used (stationary, mobile, etc.). So, with voxel sizes of 10 cm, point clouds of a room should look very similar no matter what sensor was used to gather the data.

With this in mind, I'd like to downsample 3DMatch, which appears to be the best public indoor dataset, train D3Feat on it, and see if I get better results compared to the pretrained network with modified voxels.

XuyangBai commented 3 years ago

Hi, I see your points, and I agree with your opinion on the differences between point clouds captured by different sensors; choosing a proper down-sampling strategy could reduce such a domain gap to some extent. I have a few other suggestions for your case.

  1. Some works show that patch-based descriptors (e.g. SpinNet, DIP) have better generalization ability than fully convolutional ones (e.g. D3Feat). I suspect the reason is that fully convolutional descriptors have a larger receptive field than patch-based ones, so the learned descriptor might contain some semantic features, which usually do not transfer from dataset to dataset. But if your application scenario is also indoor scenes, this problem may not be serious. You can try other descriptors if you want.
  2. Although collecting your own dataset is time-consuming, you may collect a small one and finetune on it after pre-training on 3DMatch; generally the network is better at overfitting than at generalization. For 3DMatch, I think you do not need to download all the original files: you can just voxel-downsample with 9 cm the 3 cm-downsampled point clouds provided by this repo and re-compute the correspondences (rough sketch after this list).
  3. Since you are trying to register a room with the whole floor, there will be a large ratio of outliers even if your descriptor is well trained, because there are many repetitive patterns in the whole floor. So a carefully tuned RANSAC or other outlier rejection methods (e.g. TEASER, PointDSC) are also critical for your case.
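
For point 2, a minimal sketch of what I mean, assuming Open3D; the 9 cm values are placeholders and the ground-truth poses come from the original 3DMatch pose files:

```python
import numpy as np
import open3d as o3d

def downsample(pcd, voxel=0.09):
    # Re-downsample a 3 cm fragment from this repo to 9 cm.
    return pcd.voxel_down_sample(voxel_size=voxel)

def compute_correspondences(src, tgt, gt_pose, dist_thresh=0.09):
    """(K, 2) index pairs of points that overlap after aligning src to tgt with the GT pose."""
    src_pts = np.asarray(src.points)
    # Apply the 4x4 ground-truth pose to the source points.
    src_aligned = (gt_pose[:3, :3] @ src_pts.T).T + gt_pose[:3, 3]
    tgt_tree = o3d.geometry.KDTreeFlann(tgt)
    pairs = []
    for i, p in enumerate(src_aligned):
        _, idx, dist2 = tgt_tree.search_knn_vector_3d(p, 1)  # dist2 is squared distance
        if len(idx) > 0 and dist2[0] < dist_thresh ** 2:
            pairs.append([i, idx[0]])
    return np.array(pairs)
```
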
maradanovic commented 3 years ago

Thanks for taking the time for these suggestions!

  1. My application is currently indoor only. However, I'll give the suggested SpinNet a try (the code for DIP is unpublished at the moment).
  2. I'm not sure how to do this. Do you mean stopping the training on 3DMatch at a specific epoch and then continuing on my small dataset for a while? Good idea - I'll try to figure out how to downsample the 3 cm cloud to 9 cm (although, for the 10 and 11 cm options this wouldn't work, as the result would be inconsistent and different from downsampling the original cloud). As you can probably guess, I'm new to deep learning and it can be challenging to decipher and modify the code (but I'm slowly getting by, especially with your great help).
  3. You're correct - because there is a size difference between the point clouds, I've been tracking the inlier ratio (what Open3D calls "fitness") for both. Not surprisingly, there is a big difference: for a room it can get above 70%, and for a floor it can drop below 5%. RANSAC only works with the max iterations ramped up to 10+ million (there's no need to work in real time, so time isn't an issue; the call I use is sketched below). I've found that max validation is useless, which is probably why it was removed in recent versions of Open3D.
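
For reference, my registration call is roughly the following (recent Open3D, where RANSACConvergenceCriteria takes a confidence instead of max validation; the distance threshold and the way the D3Feat descriptors are wrapped into Feature objects are placeholders):

```python
import open3d as o3d

def register(src_pcd, tgt_pcd, src_feat, tgt_feat, dist_thresh=0.09):
    # src_feat / tgt_feat: o3d.pipelines.registration.Feature holding D3Feat descriptors.
    result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src_pcd, tgt_pcd, src_feat, tgt_feat,
        mutual_filter=False,
        max_correspondence_distance=dist_thresh,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        ransac_n=3,
        checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(dist_thresh)],
        criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(
            max_iteration=10_000_000, confidence=0.999),
    )
    print("fitness (inlier ratio):", result.fitness)
    return result.transformation
```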