Lilac-Lee / PointNetLK_Revisited

Implementation for our CVPR 2021 oral paper "PointNetLK Revisited".
MIT License

Voxel_zero_mean does produce the "merely solves rotation; translation is solved during data preprocessing" problem #3

Closed: Jarrome closed this issue 2 years ago

Jarrome commented 2 years ago

My question:

In the synthetic experiments there is no problem, since the zero-mean uses the point cloud's own mean, which only requires the point cloud itself. But in the 3DMatch test, the zero-mean uses the voxel mean, and this voxel mean is computed from the two already-aligned point clouds. This is the difference, and that is why I say the bug only affects the voxelization path.

"then p1 is transformed back with a pose x for registration": for coarse-to-fine registration, the pipeline is P ---T1---> P^{1} ---T2---> P^{2}. But in the 3DMatch test it is P ---T1---> P^{1} ---x---> P ---PointNetLK_Revisited---> P^{2}. T1 is effectively a global registration result, so P^{1} is an already well-aligned point cloud. Then, with x, you move it back to the non-aligned state P. >> This happens in the data preprocessing part.

I also checked: with voxel_zero_mean=True and voxel=1 on the 3DMatch test, it looks like PointNetLK_Revisited merely solves the rotation during registration. Please check!

Lilac-Lee's reply:

zero_mean is required by our method. The voxelization part is analogous to the synthetic part. voxel_mean is required when computing the global Jacobian. Our method solves both rotation and translation, and the estimated translation compensates for the subtracted mean.

Jarrome commented 2 years ago

Hi, @Lilac-Lee,

"I also checked: with voxel_zero_mean=True and voxel=1 on the 3DMatch test, it looks like PointNetLK_Revisited merely solves the rotation during registration. Please check!" The voxel_mean cancels the real translation, which should not be available during registration. If the point cloud mean were used, as in the synthetic data, there would be no problem. But in the data processing, the voxel_mean for p0 and p1 (before p1 is de-aligned with the ground-truth transform x) is the same!

I think this should be true...

Lilac-Lee commented 2 years ago

Hi, zero_mean=True will not cancel the real translation; we compensate for the translation later. Imagine that you subtract the mean first: the translation to be predicted will surely be smaller, and this is what we want. However, the subtracted mean is not the "true translation", so we still need to predict a translation and compose it with the subtracted mean to get the final translation.
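
A minimal sketch of this compose-back mechanism, assuming per-cloud means as in the synthetic setting; the helper `se3` and all variable names are illustrative, not code from this repository:

```python
import numpy as np

def se3(Rm, tv):
    """Build a 4x4 homogeneous transform [Rm | tv]."""
    g = np.eye(4); g[:3, :3] = Rm; g[:3, 3] = tv
    return g

rng = np.random.default_rng(0)
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])                 # 90-degree rotation about z
t = np.array([5., -3., 2.])                   # large ground-truth translation

src = rng.uniform(-1, 1, (100, 3))
tgt = src @ R.T + t                           # tgt = R * src + t

m_src, m_tgt = src.mean(0), tgt.mean(0)       # each cloud's own mean
src0, tgt0 = src - m_src, tgt - m_tgt         # zero-meaned clouds fed to the solver

# Between the zero-meaned clouds only a small residual translation is left
# (exactly zero here because the correspondence is exact and noise-free).
g_zm = se3(R, np.zeros(3))

# Compose the subtracted means back to recover the full transform.
g_full = se3(np.eye(3), m_tgt) @ g_zm @ se3(np.eye(3), -m_src)
print(np.allclose(g_full[:3, :3], R), np.allclose(g_full[:3, 3], t))  # True True
```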

Jarrome commented 2 years ago

Hi,

I do not mean zero_mean (there is no problem in the synthetic case); I mean voxel_zero_mean in the real-data test. The voxel mean of the two point clouds is exactly the same before they are de-aligned with x in the data preprocessing (data_utils.py, `ThreeDMatch_Testing.__getitem__`).

Lilac-Lee commented 2 years ago

Hmm, I think you're referring to the "calculate voxel medium" step. This is all based on the experiment settings we're using. Yes, the voxel mean is the same for both point clouds, because we find the overlapped areas first. When voxel=2 there are 8 voxels; we solve a transformation for each voxel, then transform it back to the global frame. In a real-world setting, you can try with two raw (unaligned) point clouds: subtract the mean for both of them, find the overlapped areas, then do the voxelization. But then you need global registration first. Cheers.
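
A rough sketch of the voxel partitioning described here, assuming axis-aligned splitting into voxel^3 cells; the `voxelize` function and its details are illustrative and not taken from the repository's data_utils.py:

```python
import numpy as np

def voxelize(points, voxel=2):
    """Split a point cloud into voxel**3 axis-aligned cells.

    Returns a dict mapping cell index -> (points in cell, cell mean).
    Illustrative only; empty cells are skipped.
    """
    lo, hi = points.min(0), points.max(0)
    edge = (hi - lo) / voxel
    idx = np.clip(((points - lo) / edge).astype(int), 0, voxel - 1)
    flat = idx[:, 0] * voxel * voxel + idx[:, 1] * voxel + idx[:, 2]
    cells = {}
    for k in range(voxel ** 3):
        pts = points[flat == k]
        if len(pts) > 0:
            cells[k] = (pts, pts.mean(0))    # per-voxel points and voxel mean
    return cells

# With voxel=2 the overlap region yields up to 8 voxels, each registered locally
# and then related back to the global frame via its recorded mean.
cells = voxelize(np.random.default_rng(0).uniform(-1, 1, (1000, 3)), voxel=2)
print(len(cells))  # up to 8
```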

Jarrome commented 2 years ago

Hi, Xueqian,

Yes, and I think it is exactly the fact that "the voxel_zero_mean is the same for both point clouds" that produces my concern in the 3DMatch data processing:

By subtracting the voxel_mean, voxels at the same position in the two point clouds differ only by a rotation. Let's consider voxel=1; in that case it is clearer.

That's why I got a little confused when I got bad results with voxel_zero_mean=False...

Lilac-Lee commented 2 years ago

When you said "voxel_zero_mean=False", did you mean that you set voxel=1 and then, during testing, set zero_mean=False in the trainer.py file? If that is the case, then it falls back to our earlier discussion: you should always set zero_mean=True; it is required. Also, when you subtract the voxel mean, you still need to apply a transformation, so the point cloud will have both rotation and translation. Although in the voxel=1 case, the translation should be the voxel mean.

Jarrome commented 2 years ago

Hi, Xueqian,

Emmm... For example, we have P and Q, which are well aligned at the very beginning of the 3DMatch data processing. Then we have m, the voxel mean for both P and Q. After that we apply a ground-truth transformation x = [R | t].

So now we have RP + t with its voxel mean Rm + t, and Q with its voxel mean m.

Then we subtract the voxel means: (RP + t) - (Rm + t) = R(P - m), and Q - m.

So during registration, the difference between R(P - m) and Q - m is merely the rotation R.
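
A minimal numeric check of this algebra, with illustrative variables rather than code from the repository:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, (50, 3))
Q = P.copy()                                 # P and Q are well aligned (identical here)
m = P.mean(0)                                # voxel mean, computed while still aligned

# De-align P with the ground-truth x = [R | t].
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t = np.array([3., 1., -2.])
P_x = P @ R.T + t                            # R*P + t
m_x = R @ m + t                              # its voxel mean, R*m + t

# Subtract the voxel means: the translation t drops out.
print(np.allclose(P_x - m_x, (P - m) @ R.T))     # True: (RP + t) - (Rm + t) = R(P - m)
print(np.allclose(Q - m, P - m))                 # True: so only the rotation R remains
```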

Lilac-Lee commented 2 years ago

Hi Yijun, the example you demonstrated is correct when setting "voxel=1" with "well-aligned" data. Please bear in mind that voxelization here only happens when you set "voxel>1". When solving the transformation with voxelization, we first solve the transformation in the local voxel frame, then transform it back to the global frame. You can view the point cloud in the global frame as one big voxel (rectangle). The "voxel difference between the local frame and the global frame" is known, and it is exactly what you need when mapping the estimated local transformation back to the global transformation. Like I said before, ideally, when adapting the method to applications, you would first use global registration to get "well-aligned" data, then process the data; and when you deduct the voxel mean in the global frame, you might see a larger estimated translation. Also, our algorithm needs to deduct the mean to keep the estimated translation bounded by a small value. Cheers.

Jarrome commented 2 years ago

Hi, Xueqian,

From my understanding, "voxel>1" would be the same... Please correct me if I get some step wrong... :)

  1. When "voxel>1", the well-aligned P and Q are divided into (let's say K) voxels {P_i}, {Q_i} with i \in {1,...,K}, where P_i and Q_i occupy the same voxel i.

  2. Accordingly, they share the same voxel means m_i. After transforming {P_i} with x = [R | t], we have {RP_i + t} with voxel means {Rm_i + t}.

  3. So after voxel_zero_mean, we get {R(P_i - m_i)} and {Q_i - m_i} for registration, i.e., K voxels centered at the origin that differ only by a rotation. (Label: 1)

Note: p1_m = mean{Rm_i + t} = R * mean{m_i} + t and p0_m = mean{m_i} are only used for the Jacobian and for the post-processing of g.

Thus, since g is the well-predicted [R^{-1} | 0] (from Label: 1):

  4. On top of the registration result g = [R^{-1} | 0], the post-processing gives g_final = a0 * g * a1 = [eye(3) | p0_m] [R^{-1} | 0] [eye(3) | -p1_m] = [R^{-1} | p0_m] [eye(3) | -p1_m] = [R^{-1} | R^{-1}(-R * mean{m_i} - t) + mean{m_i}] = [R^{-1} | -R^{-1} t] as the final result. (Check: [R^{-1} | -R^{-1} t] [R | t] = Identity, so g_final is correct.) This is how the post-processing gets the correct final result g_final from the registration result g = [R^{-1} | 0].
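
A quick numeric verification of this composition, with illustrative values; the helper `se3` and `m_bar` (standing for mean{m_i}) are hypothetical names, not code from the repository:

```python
import numpy as np

def se3(Rm, tv):
    """Build a 4x4 homogeneous transform [Rm | tv]."""
    g = np.eye(4); g[:3, :3] = Rm; g[:3, 3] = tv
    return g

R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t = np.array([3., 1., -2.])
m_bar = np.array([0.4, -0.2, 0.7])            # mean{m_i}, an arbitrary example value

g  = se3(R.T, np.zeros(3))                    # registration output: [R^{-1} | 0]
a0 = se3(np.eye(3),  m_bar)                   # [eye(3) |  p0_m],  p0_m = mean{m_i}
a1 = se3(np.eye(3), -(R @ m_bar + t))         # [eye(3) | -p1_m],  p1_m = R*mean{m_i} + t
g_final = a0 @ g @ a1

x = se3(R, t)
print(np.allclose(g_final, np.linalg.inv(x)))  # True: g_final = [R^{-1} | -R^{-1} t]
print(np.allclose(g_final @ x, np.eye(4)))     # True: g_final * x = Identity
```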

This is the whole story of why I think the bug exists: g is merely a rotation, otherwise g_final would be wrong. Maybe my derivation is mistaken... please check.

And I think if you use a per-voxel point_zero_mean instead of the voxel_zero_mean, it will not introduce this problem while still being consistent with your conditional Jacobian theory (with the Jacobian conditioned on the point mean). But the performance may not be as good as with this voxel_mean...
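
A small sketch of the contrast being suggested here, under illustrative assumptions (two partially overlapping samplings of one voxel, centred either with the shared aligned-time voxel mean or with each cloud's own point mean); all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.uniform(-1, 1, (60, 3))           # points of one voxel, aligned frame
P_i, Q_i = base[:45], base[15:]              # different subsets -> different point means
m_i = base.mean(0)                           # shared voxel mean taken from the aligned pair

R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t = np.array([3., 1., -2.])
P_i_x = P_i @ R.T + t                        # source voxel after de-aligning with x = [R | t]

# Residual translation left for the solver after centring the source with m_src and
# the target with m_tgt (same algebra as in the derivation above): R^{-1}(m_src - t) - m_tgt.
def residual_t(m_src, m_tgt):
    return R.T @ (m_src - t) - m_tgt

print(np.linalg.norm(residual_t(R @ m_i + t, m_i)))            # shared voxel mean: ~0
print(np.linalg.norm(residual_t(P_i_x.mean(0), Q_i.mean(0))))  # per-cloud point means: > 0
```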

Lilac-Lee commented 2 years ago

Hey Yijun, thanks for the explanation. Yeah, I just wanted to remind you that we should always set "voxel>1". Ideally, if we apply the method to a globally estimated result, you won't see this situation (i.e., one where no translation needs to be solved). And in real applications, I would always recommend using the voxel mean instead of the point cloud mean for each voxel. Cheers.

Jarrome commented 2 years ago

Hi, Xueqian,

Regarding my previous post: it is about the 3DMatch test in your paper.

From my own understanding, the experimental result with voxelization is weird. Would you please check whether the 3DMatch voxel>1 setting only solves the rotation during registration? If so, it is not fair to compare it with non-voxel methods, since they solve both rotation and translation.

At least, that is what the previous derivation suggests...

Lilac-Lee commented 2 years ago

Hi Yijun, like I said in my previous post, your explanation is right. In the experiment, even if the transformation only has a "translation" in this situation, our method does not know that and still estimates both the rotation and the translation. You can try using the voxelized data with non-voxelization methods to test. Cheers.

Jarrome commented 2 years ago

Hi, Xueqian,

I tried a translation-only case in the 3DMatch test with voxel=2 (i.e., x contains only a translation).

Before postprocess with a0, a1:
![Screenshot from 2021-12-07 10-28-14](https://user-images.githubusercontent.com/13412144/145003430-97619406-b441-4e45-bf5f-fdd7ec9f4749.png)

after a0 * g * a1:
![Screenshot from 2021-12-07 10-28-29](https://user-images.githubusercontent.com/13412144/145003522-f0fa4af5-0eb0-40fd-a19a-f30865441894.png)

GT is: 
![Screenshot from 2021-12-07 10-39-28](https://user-images.githubusercontent.com/13412144/145004949-1ebcaa3a-0180-4e4d-931e-edc7cab7d7b6.png)

**Then I scaled the translation of x by 10:**
Before postprocess with a0, a1:
![Screenshot from 2021-12-07 10-28-49](https://user-images.githubusercontent.com/13412144/145003577-c11f8d10-a990-453c-8656-58d53a77ea4c.png)

after a0 * g * a1:
![Screenshot from 2021-12-07 10-29-01](https://user-images.githubusercontent.com/13412144/145003600-3c51e900-2d5e-4be1-9901-12791e52fa9d.png)

GT is:
![Screenshot from 2021-12-07 10-38-09](https://user-images.githubusercontent.com/13412144/145004979-5e5a56c8-084b-4d7e-8401-f4fe6ea9c8bc.png)

So images 1 and 4 give exactly the same result, no matter whether the translation is scaled by 1 or by 10.

Though it seems weird, the 6 images above alone cannot prove anything.

I do not intend to prove you wrong; initially I just wanted to compare. I simply found a theoretical problem in the 3DMatch test of your paper. Since you also agree that my previous derivation is correct and do not think this bug is a serious problem for the research,

I will simply not compare against your voxelization-based registration in my project :)

Lilac-Lee commented 2 years ago

Hi Yijun, I want to point out that our method does not have a theoretical bug, and I never said that I "do not think this bug is a serious problem for the research". In a real-world application, you should still use the voxel mean, not the point cloud mean. Thanks for your efforts throughout these comments! They made me "revisit" our method thoroughly. And this is my final comment.

Jarrome commented 2 years ago

Come on, I'm saying that your voxelization-based registration "merely solves rotation", so the 3DMatch experiment in the paper is not valid.

I mentioned the "point mean" just to suggest a possible fix. Fine, I think you already get the point; I will not keep asking...