Lilac-Lee / PointNetLK_Revisited

Implementation for our CVPR 2021 oral paper "PointNetLK Revisited".
MIT License

Questions about training #1

Open akloss opened 2 years ago

akloss commented 2 years ago

Hi,

first of all, thanks a lot for sharing your code! I have some questions about the training process:

1) Do you also use 10 iterations of the IC-LK algorithm during training?

2) In line 222 of trainer.py (quoted below), you seem to define two different variants of the feature loss. If I understand correctly, the variant where pr is None is the one described in the paper, whereas the second variant seems to compare the feature difference between the last and the previous iteration of the LK algorithm. Can you explain the second variant and give an intuition for when to use which?

        pr = ptnetlk.prev_r
        if pr is not None:
            loss_r = model.AnalyticalPointNetLK.rsq(r - pr)
        else:
            loss_r = model.AnalyticalPointNetLK.rsq(r)
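
To make sure I read questions 1 and 2 correctly, here is my current mental model of one training step (a sketch only; the forward signature, gt_g, pose_loss, and the optimizer are my assumptions, not code from the repo):

    # my mental model of one training step; names/signatures are assumptions
    r, est_g = ptnetlk(p0, p1, max_iter=10)  # 10 IC-LK iterations during training too?
    pr = ptnetlk.prev_r                      # feature residual from the previous iteration

    if pr is not None:
        loss_r = model.AnalyticalPointNetLK.rsq(r - pr)  # variant 2: change in residual
    else:
        loss_r = model.AnalyticalPointNetLK.rsq(r)       # variant 1 (paper): the residual itself

    loss_pose = pose_loss(est_g, gt_g)  # placeholder for the rigid transformation loss
    (loss_pose + loss_r).backward()
    optimizer.step()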

3) I'm a bit confused about how feature aggregation / random features is implemented. The code snippet below seems to implement the splitting strategy for feature computation described in the supplementary material. However, the computed features are overwritten for each new split of the point cloud, so I take it this corresponds to the random feature selection approach? Furthermore, f1 is never used in the following code, and f0 is overwritten when the Jacobian is computed, so I'm wondering whether this computation actually serves any purpose (other than initializing the batch norm layers) or whether it is just there for reference.

        # create a data sampler
        if mode != 'test':
            data_sampler = np.random.choice(num_points, (num_points//num_random_points, num_random_points), replace=False)
        # input through entire pointnet
        if training:
            # first, update BatchNorm modules
            f0 = self.ptnet(p0[:, data_sampler[0], :], 0)
            f1 = self.ptnet(p1[:, data_sampler[0], :], 0)
        self.ptnet.eval()

        if mode != 'test':
            for i in range(1, num_points//num_random_points-1):
                f0 = self.ptnet(p0[:, data_sampler[i], :], i)
                f1 = self.ptnet(p1[:, data_sampler[i], :], i)
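
For concreteness, here is my understanding of what the sampler produces (toy numbers, my own example, not repo code):

    import numpy as np

    num_points, num_random_points = 1000, 100
    # num_points // num_random_points disjoint rows of num_random_points indices
    # each, i.e. a random partition of the cloud into equally sized splits
    data_sampler = np.random.choice(
        num_points, (num_points // num_random_points, num_random_points),
        replace=False)
    print(data_sampler.shape)  # (10, 100)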

4) Random point selection for computing the Jacobian: I was wondering whether it is important to compute the feature vector from the same subset of the point cloud that is used to compute the Jacobian, or whether it would also be possible to, e.g., compute the Jacobian on a random subset but the feature vector on the full point cloud?

5) In general, what would be the recommended setup for training? From the supplementary material it seems that random features + random Jacobian gives the best results (and this also seems to be what is implemented), but my initial tests loading the pretrained model with this setup give relatively poor results (even when I sample more than 100 points), unless I turn on voxelization, which is not practical during training since the number of voxels containing points is not constant. Any guidance on this?

Thanks again for sharing your code and sorry about the wall of text. I would be super grateful for your help!

Lilac-Lee commented 2 years ago

Hey, thanks for reaching out. Here are answers to your questions.

  1. Yes, as stated in the paper, we used 10 iterations during training on the ModelNet dataset.
  2. As described in the paper, we use both the "rigid transformation loss" and the "feature loss"; these correspond to "loss_pose" and "loss_r" in the code, respectively.
  3. We have found that random feature selection is both efficient and gives better results. For the code snippet: yes, the first few lines update the batch norm layers; after that we fix the network weights (except for the batch norm parts, since we use "eval" mode rather than "test" mode) and extract "A, M, BN" to analytically compute the feature Jacobian. Because we use inverse composition, the Jacobian only needs to be computed once, outside the iteration loop, on p0; that is why p1 is not used there. However, each time we update the pose estimate during the iterations, we need to recompute the feature f for p1 warped by the previously predicted pose (see the sketch after this list).
  4. We have found that random point selection is a good strategy; the supplementary material (https://openaccess.thecvf.com/content/CVPR2021/supplemental/Li_PointNetLK_Revisited_CVPR_2021_supplemental.pdf) covers this in more detail. One thing to note is that the feature is a per-point feature, which must stay in correspondence with the feature Jacobian: you can compute the feature vector on the full point cloud, but when computing the Jacobian on a random subset you still need to match the corresponding points (see the sketch after this list). Using the feature vector of the full point cloud can be a good strategy, since the network weights are only updated through the feature. In fact, you could use an off-the-shelf pre-trained model, as we did for the voxelization part.
  5. Voxelization is not used during training; it should only be applied at test time, and this is one of the method's advantages (a minimal voxel-partition sketch follows below). If you are using your own data, you might want to pre-train on a small subset or a small synthetic dataset without voxelization. Note that you could also try aggregated features with a random Jacobian, as you suggested in 4.
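
To make points 3 and 4 concrete, here is a minimal, self-contained sketch of the inverse-composition structure. It is not the code from this repo: the feature is a toy stand-in for PointNet, the Jacobian is a finite-difference stand-in for the analytical one in the paper, and the sign/composition convention is simplified.

    import torch

    def se3_exp(w):
        # twist (6,) -> 4x4 rigid transform via the matrix exponential
        X = torch.zeros(4, 4)
        X[0, 1], X[0, 2], X[1, 2] = -w[2], w[1], -w[0]
        X[1, 0], X[2, 0], X[2, 1] = w[2], -w[1], w[0]
        X[:3, 3] = w[3:]
        return torch.matrix_exp(X)

    def transform(g, p):
        # apply a 4x4 rigid transform to an (N, 3) cloud
        return p @ g[:3, :3].T + g[:3, 3]

    def feature_jacobian(feat_fn, p0, eps=1e-4):
        # finite-difference stand-in for the analytical feature Jacobian:
        # d feat(exp(w) . p0) / d w at w = 0, shape (K, 6)
        f = feat_fn(p0)
        cols = [(feat_fn(transform(se3_exp(eps * torch.eye(6)[i]), p0)) - f) / eps
                for i in range(6)]
        return torch.stack(cols, dim=1)

    def ic_lk(feat_fn, p0, p1, num_iter=10):
        f0 = feat_fn(p0)
        J = feature_jacobian(feat_fn, p0)  # computed ONCE, on p0 only (point 3)
        pinv = torch.linalg.pinv(J)
        g = torch.eye(4)
        for _ in range(num_iter):
            # the feature of the warped p1 is recomputed at every iteration
            r = feat_fn(transform(g, p1)) - f0
            g = se3_exp(-pinv @ r) @ g
        return g

    # toy global feature standing in for PointNet: max-pool over a fixed lift
    torch.manual_seed(0)
    W = torch.randn(3, 32)
    feat = lambda p: torch.tanh(p @ W).max(dim=0).values

If you build the Jacobian from a random subset of p0 (point 4), the per-point features feeding it should come from the same subset, so that features and Jacobian stay in correspondence.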
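
And to illustrate the voxel-count issue raised in question 5, a minimal voxel-partition sketch (numpy only; not the repo's test-time code):

    import numpy as np

    def voxel_partition(points, voxel_size):
        # group an (N, 3) cloud by voxel index; the number of occupied voxels
        # depends on the cloud's geometry, hence the variable shapes at test time
        keys = np.floor(points / voxel_size).astype(np.int64)
        _, inverse = np.unique(keys, axis=0, return_inverse=True)
        return [points[inverse == v] for v in range(inverse.max() + 1)]

    # at test time each occupied voxel can be registered and the results
    # aggregated; training skips this step entirely
    voxels = voxel_partition(np.random.rand(1000, 3), voxel_size=0.25)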

Cheers! Let me know if you have further questions.