BTW, according to this post, https://github.com/BoChenYS/BPnP/issues/5, `create_graph=False` should not be used. However, this option significantly reduces GPU memory usage. Do you have any suggestions on reducing the memory usage, as well as the runtime of `BPnP_fast`, by other means?
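For context, the memory difference here is the usual first-order vs. higher-order autograd trade-off in PyTorch: with `create_graph=True` the backward pass itself is recorded so that gradients of gradients can be taken, which keeps considerably more intermediate buffers alive. A generic sketch, not BPnP-specific code:

```python
import torch

x = torch.randn(1000, requires_grad=True)
y = (x ** 3).sum()

# First-order only: the graph of the backward pass is not recorded,
# so intermediate buffers can be freed right away (lower memory).
g1, = torch.autograd.grad(y, x, create_graph=False)

# Higher-order: the backward pass is itself differentiable, so its graph
# (and the buffers it needs) must be retained (higher memory).
g2, = torch.autograd.grad(y, x, create_graph=True)
hess_diag, = torch.autograd.grad(g2.sum(), x)  # second derivative; needs create_graph=True above
```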
Hi Qi,

I didn't notice any difference in practical results between the fast version and the original version, so my hypothesis is that the approximate gradients given by the fast version are accurate enough for practical use.

For your second question, thank you for your suggestion. I will optimize the code to get rid of the unnecessary memory usage for the fast version. To further reduce memory usage, you may modify the `get_coefs` function to loop over every 3D point instead of processing all n 3D points together (see the sketch below). But that would significantly increase the runtime, so I wouldn't recommend it.

Cheers
Bo
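To make the trade-off Bo describes concrete, here is a rough, generic sketch of batched vs. per-point coefficient computation. The projection and coefficient functions below are simplified placeholders, not the actual `get_coefs` from BPnP.py:

```python
import torch

def project(pts3d, pose6, K):
    # Toy pinhole projection: rotation is ignored for brevity and only the
    # translation part (pose6[3:6]) is applied. NOT the actual BPnP projection.
    pts_cam = pts3d + pose6[3:6]
    uv = pts_cam[:, :2] / pts_cam[:, 2:3]
    return uv @ K[:2, :2].T + K[:2, 2]

def coefs_batched(pts3d, pose6, K):
    # Jacobian of all 2n residuals w.r.t. the 6-DoF pose in one autograd call:
    # fast, but the forward intermediates for all n points stay in memory at once.
    def f(p):
        return project(pts3d, p, K).reshape(-1)
    return torch.autograd.functional.jacobian(f, pose6)

def coefs_looped(pts3d, pose6, K):
    # Same Jacobian built one point at a time: peak memory no longer grows
    # with n, but n separate autograd calls make it much slower.
    rows = []
    for i in range(pts3d.shape[0]):
        def f(p, i=i):
            return project(pts3d[i:i + 1], p, K).reshape(-1)
        rows.append(torch.autograd.functional.jacobian(f, pose6))
    return torch.cat(rows, dim=0)

# Example shapes: n 3D points, a 6-DoF pose vector, 3x3 intrinsics.
pts3d = torch.randn(100, 3) + torch.tensor([0.0, 0.0, 5.0])
pose6 = torch.zeros(6)
K = torch.eye(3)
assert torch.allclose(coefs_batched(pts3d, pose6, K), coefs_looped(pts3d, pose6, K))
```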
Hi Bo,

Thanks so much for your message, and wish you a happy holiday season! Sorry to bother you with another question, this time about training stability. In our task we are interested in `grad_z`, but training with `BPnP` is highly unstable because of frequent gradient explosions (see below). For a fair comparison, I tried the central-difference-based PnP gradient estimator implemented in DSAC* and kept everything else the same; it was much more stable (at least the training was able to proceed). I also played with the forward PnP solver configurations a bit, and it didn't help to solve the problem. I wonder if you could provide any insights/suggestions on this empirical result? I guess a naive `clamp` over `grad_z` would be helpful (see the sketch below), but I am not sure how it would hurt the optimality.

Many thanks,
Qi Yan
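For concreteness, the naive clamp mentioned above could be realized with a backward hook on the tensor whose gradient is `grad_z`. The tensor, shapes, and threshold below are hypothetical, and this only caps the symptom rather than fixing the cause:

```python
import torch

def clamp_grad(t, limit=1.0):
    # Clamp the incoming gradient element-wise during backward.
    # 'limit' is an arbitrary illustrative threshold, not a recommended value.
    t.register_hook(lambda g: g.clamp(min=-limit, max=limit))
    return t

# Hypothetical usage: z stands in for the tensor whose gradient (grad_z) explodes.
z = torch.randn(100, 3, requires_grad=True)
clamp_grad(z)
loss = (z ** 4).sum() * 1e6   # stand-in for a loss that produces huge gradients
loss.backward()
print(z.grad.abs().max())     # <= 1.0 after clamping
```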
After checking the logging, I can confirm that the training instability comes from nan `grad_z` values, which are caused by a division by zero in the `batch_project` function, at `pts2d_pro = pts2d_proj[:, :, 0:2].div(S)`. I just made a pull request trying to fix this numerical issue.

BTW, this error was caught by `torch.autograd.set_detect_anomaly(True)`. It seems that the fix also significantly alleviates the gradient explosion problem.
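For anyone hitting the same symptom, here is a minimal, self-contained reproduction of how a zero divisor turns into nan gradients and how anomaly detection pinpoints the offending op. The tensors are toy stand-ins, not the actual `batch_project` inputs:

```python
import torch

# Toy stand-ins, not the actual batch_project inputs.
S = torch.zeros(4, 1, requires_grad=True)   # depths that hit exactly 0
pts2d_proj = torch.zeros(4, 2)              # homogeneous xy components

# torch.autograd.set_detect_anomaly(True) enables this globally;
# the context manager scopes it to a single forward/backward pass.
with torch.autograd.detect_anomaly():
    pts2d_pro = pts2d_proj.div(S)           # 0/0 -> nan already in the forward pass
    loss = pts2d_pro.sum()
    loss.backward()                         # DivBackward emits nan grads; anomaly mode raises here
```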
Hi Qi,

Thanks for your analysis and the pull request. However, I don't think clamping S to min=0.1 is a good idea generically. S is the z coordinate of `pts3d_cam`, i.e., the 3D points expressed in the camera coordinate system, so S should be able to take any real value, including negative ones (the object-behind-camera scenario). While S=0 is extremely unlikely in practice, applications that somehow produce a lot of S=0 can apply ad hoc treatments such as `S[S==0] = S[S==0] + 1e-8` or something similar (see the sketch below).

Cheers
Bo
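A minimal sketch of the ad hoc treatment Bo suggests, written with `torch.where` so it stays autograd-friendly; the placement and shapes are assumptions, not the actual `batch_project` code:

```python
import torch

def guard_depths(S, eps=1e-8):
    # Autograd-friendly variant of S[S==0] = S[S==0] + 1e-8: only the
    # exactly-zero depths are nudged, so negative (behind-camera) and
    # positive depths keep their values and signs.
    return torch.where(S == 0, torch.full_like(S, eps), S)

# Illustrative shapes: batch of b poses, n points, homogeneous coordinates.
pts2d_proj = torch.randn(2, 5, 3, requires_grad=True)
S = guard_depths(pts2d_proj[:, :, 2:3])
pts2d_pro = pts2d_proj[:, :, 0:2].div(S)   # no 0/0, hence no nan gradients
```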
Yes, you're right, and this is indeed application-specific. Thanks for your advice.
Hi Bo,

I was wondering if you could elaborate more on the `BPnP_fast` method implemented in your code, which is not explained in your paper. Its comments say that higher-order derivatives are dismissed for efficiency purposes and that this sacrifice is 'negligible' for gradient accuracy: https://github.com/BoChenYS/BPnP/blob/114f731dbf2b287f818b8afdce0ac479068fa4f1/BPnP.py#L211-L225

Could you provide some insights into the application of this function? For example, what kind of practical consequences would you expect from using `BPnP_fast`? In our task, I find that regular `BPnP` is really slow and thus hardly applicable to joint training with other modules, while `BPnP_fast` is much better.

Best,
Qi Yan
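For readers wondering what "dismissing higher-order derivatives" amounts to in autograd terms, here is a generic illustration, not the actual BPnP_fast code: a derivative-based coefficient is treated as a constant so that backward does not differentiate through its construction, trading gradient exactness for speed and memory.

```python
import torch

x = torch.randn(10, requires_grad=True)

def coef(x, keep_graph):
    # Stand-in for a derivative-based coefficient: the gradient of an inner
    # objective w.r.t. x. Differentiating through it again requires second
    # derivatives, i.e. the "higher-order" part.
    inner = (x ** 3).sum()
    g, = torch.autograd.grad(inner, x, create_graph=keep_graph)
    return g

# Exact: the coefficient remains differentiable, so backward propagates
# through its construction (second-order terms included).
loss_exact = (coef(x, keep_graph=True) * x).sum()
loss_exact.backward()
g_exact = x.grad.clone()          # 9 * x**2 for this toy objective

# Fast: the coefficient is treated as a constant, dropping those terms and
# saving memory/time at the cost of an approximate gradient.
x.grad = None
loss_fast = (coef(x, keep_graph=False) * x).sum()
loss_fast.backward()
g_fast = x.grad.clone()           # 3 * x**2, the truncated gradient
```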