BoChenYS / BPnP

Back-propagatable PnP
MIT License

Could you please elaborate on BPnP_fast? #8

Closed · qiyan98 closed this 3 years ago

qiyan98 commented 3 years ago

Hi Bo,

I was wondering if you could elaborate on the BPnP_fast method implemented in your code, which is not explained in your paper. Its comments say that higher-order derivatives are dismissed for efficiency, and that this sacrifice in gradient accuracy is 'negligible'.

https://github.com/BoChenYS/BPnP/blob/114f731dbf2b287f818b8afdce0ac479068fa4f1/BPnP.py#L211-L225
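
To make the question concrete, here is a one-dimensional sketch of what "dismissing higher-order derivatives" could mean, using the implicit-function-theorem view of PnP (toy code; g, theta, and x are stand-ins for the PnP objective and solver, not BPnP's API):

```python
import torch

# 1-D toy, not BPnP itself: theta*(x) minimizes g(theta, x), and the
# implicit function theorem gives dtheta/dx = -g_tt^{-1} * g_tx.
# The second derivatives g_tt, g_tx play the role of the coefficients
# BPnP computes in get_coefs.
def g(theta, x):
    return (theta - x.sin()) ** 2  # hypothetical objective; theta* = sin(x)

x = torch.tensor(0.7, requires_grad=True)
theta = x.sin().detach().requires_grad_(True)  # pretend a solver returned theta*

g_t = torch.autograd.grad(g(theta, x), theta, create_graph=True)[0]
g_tt = torch.autograd.grad(g_t, theta, create_graph=True)[0]
g_tx = torch.autograd.grad(g_t, x, create_graph=True)[0]

# full version: keep the coefficients in the graph, so backward also
# differentiates through g_tt and g_tx (the higher-order terms)
dtheta_dx = -g_tx / g_tt
# fast version: detach the coefficients; same first-order value, but the
# higher-order dependence on x is dismissed
dtheta_dx_fast = (-g_tx / g_tt).detach()
print(dtheta_dx.item(), dtheta_dx_fast.item())  # both ~cos(0.7)
```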

Could you provide some insight into the application of this function? For example, what practical consequences would you expect from using BPnP_fast? In our task, I find that the regular BPnP is really slow and thus hardly applicable to joint training with other modules, while BPnP_fast is much better.

Best, Qi Yan

qiyan98 commented 3 years ago

BTW, according to https://github.com/BoChenYS/BPnP/issues/5, create_graph=False should not be used. However, this option significantly reduces GPU memory usage. Do you have any suggestions for reducing the memory usage, as well as the runtime, of BPnP_fast by other means?

BoChenYS commented 3 years ago

Hi Qi,

I didn't notice any difference in practical results between the fast version and the original version, so my hypothesis is that the approximate gradients given by the fast version are accurate enough for practical use.

For your second question, thank you for your suggestion. I will optimize the code to get rid of the unnecessary memory usage in the fast version. To further reduce memory usage, you could modify the get_coefs function to loop over every 3D point instead of processing all n 3D points together. But that would significantly increase the runtime, so I wouldn't recommend it.

Cheers
Bo
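
For reference, the loop-over-points idea could look roughly like this (coef_fn is a hypothetical per-point stand-in for get_coefs, not the repo's actual signature):

```python
import torch

# Sketch of the memory/runtime trade-off: processing the n 3D points in
# small chunks lowers peak memory at the cost of Python-loop overhead.
def chunked_coefs(coef_fn, pts3d, chunk=1):
    out = [coef_fn(pts3d[i:i + chunk]) for i in range(0, pts3d.shape[0], chunk)]
    return torch.cat(out, dim=0)

# usage with a dummy coef_fn, just to show the mechanics
pts3d = torch.randn(100, 3)
coefs = chunked_coefs(lambda p: 2.0 * p, pts3d, chunk=8)
assert coefs.shape == (100, 3)
```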

qiyan98 commented 3 years ago

Hi Bo,

Thanks so much for your message, and wishing you a happy holiday season! Sorry to bother you with another question, this time about training stability. In our task we are interested in grad_z, but training with BPnP is highly unstable because of frequent gradient explosions (see below).

[image: grad_info_bpnp_joint]

For a fair comparison, I tried the central-difference-based PnP gradient estimator implemented in DSAC* and kept everything else the same. It was much more stable (at least the training was able to proceed).

[image: grad_info_dsacstar_joint]

I played with the forward PnP solver configurations a bit, but it didn't solve the problem. I wonder if you could provide any insights/suggestions on this empirical result? I guess a naive clamp over grad_z would help (e.g., something like the sketch below), but I am not sure how much it would hurt optimality.
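
A minimal version of that clamp, assuming grad_z is the gradient w.r.t. the predicted 2D keypoints (names and shapes here are placeholders, not BPnP's):

```python
import torch

# A tensor hook that clips the gradient flowing into the 2D keypoints z
# before it reaches the upstream network.
z = torch.randn(1, 8, 2, requires_grad=True)   # predicted 2D keypoints
z.register_hook(lambda g: g.clamp(-1.0, 1.0))  # element-wise clamp on grad_z

loss = (100.0 * z).sum()    # stand-in loss with large gradients
loss.backward()
print(z.grad.abs().max())   # tensor(1.) after clamping
```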

Many thanks, Qi Yan

qiyan98 commented 3 years ago

After checking the logging, I confirm that the training instability comes from nan grad_z values, caused by a division by zero in the batch_project function (pts2d_pro = pts2d_proj[:, :, 0:2].div(S)). I just made a pull request trying to fix this numerical issue. BTW, this error was caught by torch.autograd.set_detect_anomaly(True); a minimal reproduction is sketched below. It seems that the fix also significantly alleviates the gradient explosion problem.
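
For anyone debugging something similar, a toy reproduction of how anomaly mode pinpoints the division (illustrative tensors, not BPnP's actual ones):

```python
import torch

# With a zero in S, the forward division produces non-finite values and
# anomaly mode makes backward raise at the offending op.
torch.autograd.set_detect_anomaly(True)
S = torch.tensor([[1.0], [0.0]], requires_grad=True)
pts2d_proj = torch.tensor([[2.0], [0.0]])
pts2d_pro = pts2d_proj.div(S)  # 0/0 -> nan in the forward pass
pts2d_pro.sum().backward()     # RuntimeError pointing at 'DivBackward0'
```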

[image: grad_info]

BoChenYS commented 3 years ago

Hi Qi,

Thanks for your analysis and the pull request. However, I don't think clamping S to min=0.1 is a good idea in general. S contains the z coordinates of pts3d_cam, i.e., the 3D points expressed in the camera coordinate system, so S should be able to take any real value, including negative ones (the object-behind-camera scenario). While S=0 is extremely unlikely in practice, applications that somehow produce many S=0 values can apply an ad hoc treatment such as S[S==0] = S[S==0]+1e-8 or something similar.
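
In code, that guard could look like this (toy shapes, not batch_project itself; torch.where is the out-of-place equivalent of the in-place indexing):

```python
import torch

# Unlike clamp(min=0.1), this guard preserves the sign of S (points
# behind the camera stay negative) and only nudges exact zeros.
S = torch.tensor([[2.0], [-1.5], [0.0], [4.0]])
S = torch.where(S == 0, S + 1e-8, S)  # same effect as S[S==0] += 1e-8
pts2d_proj = torch.randn(4, 2)
pts2d_pro = pts2d_proj.div(S)         # safe: no exact zeros left
```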

Cheers Bo

qiyan98 commented 3 years ago

Yes, you're right, this is definitely application-specific. Thanks for your advice.