Closed scott89 closed 4 years ago
Hi, thanks for your interest in this repo.
Yes, you are right, good catch! However, this would only be significant for large rotation values, and since the snippets are only 5 frames long, the results we get are still valid IMO. It's also important to remember that KITTI itself warns that the pose measurements are not very accurate over small snippets. But in principle you are completely right that we should convert PoseNet's output to camera_00's coordinate frame.
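For reference, expressing a pose estimated in one camera's frame in another camera's frame is a change of basis with the extrinsic between the two cameras. A minimal NumPy sketch (the helper name and the 0.5 m offset are illustrative, not from the repo or the actual KITTI calibration):

```python
import numpy as np

def pose_in_cam0(pose_cam2, T_cam0_to_cam2):
    """Express a relative pose estimated in cam2's frame in cam0's frame.

    Both arguments are 4x4 homogeneous transforms. Generic change of
    basis: a motion T expressed in frame B becomes C^-1 @ T @ C in
    frame A, where C maps points from frame A to frame B.
    """
    C = T_cam0_to_cam2
    return np.linalg.inv(C) @ pose_cam2 @ C

# Toy example: a pure rotation about cam2's y axis, with cam0 offset
# 0.5 m along x (a made-up value, same order as a stereo baseline).
theta = 0.1
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
pose_cam2 = np.eye(4)
pose_cam2[:3, :3] = R

C = np.eye(4)
C[0, 3] = 0.5  # hypothetical cam0 -> cam2 translation

pose_cam0 = pose_in_cam0(pose_cam2, C)
# The rotation part is unchanged, but the translation picks up a
# rotation-dependent term (R @ t - t): the larger the rotation, the
# larger the induced translation, which is why the error only matters
# for significant rotations.
```

This also illustrates the point above: with the small rotations seen over 5-frame snippets, the induced translation term stays small.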
Not sure what you mean by that. Rotation has no scale ambiguity, contrary to translation, so even if both translation and rotation come from the same network, they have to be treated differently when deducing the right depth. The only way I found to address this problem with PoseNet is to measure the actual translation magnitude from KITTI OXTS, using either the GPS or the wheel speed. More generally, to solve the translational ambiguity you will need, at some point, an anchor to reality, because everything behaves the same way for a camera whether you are driving a real car or, e.g., moving a toy car through a scale model of a city. The original authors chose the LiDAR, using the median of the ground-truth depth map; this repo proposes either that (which is not ideal for a system that tries to estimate depth without supervision) or the car's speed (ubiquitous and far more reasonable).
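The two anchoring options described above can be sketched as simple scale factors. This is an illustrative sketch, not the repo's actual evaluation code; the function names, the OXTS field handling, and all the numbers are assumptions:

```python
import numpy as np

def scale_from_gt_depth(pred_depth, gt_depth):
    """LiDAR anchor: ratio of median ground-truth depth to median
    predicted depth over valid (nonzero) ground-truth pixels, the
    convention used with KITTI depth maps."""
    valid = gt_depth > 0
    return np.median(gt_depth[valid]) / np.median(pred_depth[valid])

def scale_from_speed(pred_translation, speed_mps, dt):
    """Speed anchor: ratio of the real displacement between frames
    (speed from OXTS times the inter-frame interval) to the predicted
    translation magnitude."""
    real_displacement = speed_mps * dt
    return real_displacement / np.linalg.norm(pred_translation)

# Toy numbers (made up): the network predicts depth and translation
# that are both consistent but only defined up to a global scale.
pred_depth = np.full((4, 4), 2.0)
gt_depth = np.full((4, 4), 20.0)
print(scale_from_gt_depth(pred_depth, gt_depth))   # 10.0

pred_t = np.array([0.0, 0.0, 0.1])
# e.g. 10 m/s at 10 fps -> 1 m of real motion per frame
print(scale_from_speed(pred_t, speed_mps=10.0, dt=0.1))  # 10.0
```

Either factor rescales both the predicted depth and the predicted translation at once, which is exactly the "anchor to reality" the ambiguity requires.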
Thanks for your reply. That solves my questions!
Hi Clément,
Really great work! I have the following two questions.
Thanks!