Guocode closed this issue 4 years ago
That would be an under-constrained problem. As you can imagine, the network could converge to a trivial solution, e.g., all 3D and 2D keypoints collapsing to a single point. You will need to exploit other constraints from your application. It is an interesting direction, though, and one I have been looking into.
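To illustrate why the collapsed configuration is degenerate, here is a minimal NumPy sketch (not BPnP code; the intrinsics `K`, translation `t`, keypoint sets, and helper names are all made-up values for illustration). It projects a collapsed set and a spread-out set of 3D keypoints under two different rotations. The collapsed set lands on the same pixel in both cases, so the 2D keypoints carry no information about rotation.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def project(points_3d, R, t, K):
    """Pinhole projection of (N, 3) object-frame points to (N, 2) pixels."""
    cam = points_3d @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                     # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

# Hypothetical intrinsics and translation, chosen only for this example.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.05, 2.0])

# Trivial solution: every 3D keypoint sits at the object origin.
collapsed = np.zeros((4, 3))
# A non-degenerate, non-coplanar keypoint set for comparison.
spread = np.array([[0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0],
                   [-0.1, 0.0, 0.0],
                   [0.0, -0.1, 0.05]])

R_a, R_b = rotation_z(0.0), rotation_z(1.0)

# Largest pixel difference between the two rotations for each keypoint set.
collapsed_gap = np.abs(project(collapsed, R_a, t, K)
                       - project(collapsed, R_b, t, K)).max()
spread_gap = np.abs(project(spread, R_a, t, K)
                    - project(spread, R_b, t, K)).max()

print("collapsed keypoints, max pixel change:", collapsed_gap)
print("spread keypoints, max pixel change:", spread_gap)
```

With the collapsed set the rotation is unobservable from the projections, which is one reason extra constraints that keep the keypoints spread out are needed.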
In addition to the keypoints, maybe we could define other geometric constraints, such as a circle or an ellipse whose shape is restricted to a certain range, so that the network converges to a realistic map. Thanks for your helpful reply and your great work.
Pleasure!
I have a thought: suppose I only have pose labels and camera intrinsics. For some tasks, such as gaze estimation (the 6DoF pose of the eyeball), it is hard to obtain a canonical model. Can I train a network using BPnP, minimizing only the pose loss, so that it learns the latent rigid 3D keypoints of the object together with the 2D keypoints?