Huangying-Zhan / Depth-VO-Feat

Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction

doubts on the odometry #19

Closed FatEvilCat closed 5 years ago

FatEvilCat commented 5 years ago

Hello Zhan, I have some questions. In your paper, you use the CNN to get two-view odometry. How do you get a result like Figure 3? Do you use the tools at https://vision.in.tum.de/data/datasets/rgbd-dataset/tools? They take tx ty tz qx qy qz qw as input. I am a beginner in this area, so could you share the code for these transformations, such as accumulating the two-view odometry over all timestamps and converting between transformation matrices and quaternions?

Thanks!

Huangying-Zhan commented 5 years ago

Hi @FatEvilCat, we convert our predicted poses to transformation matrices and save them. To get the full trajectory, we basically want a sequence T_0->0, T_1->0, T_2->0, .... To get these transformations, here is one example: T_2->0 = T_1->0 * T_2->1, where T_2->1 is the relative pose between the two views (our prediction). Each pose matrix represents the transformation from a timestamp to the reference frame (the first frame in our case). Also, the last column represents the position of the camera centre, and you can use it to plot the figure.

Here is the code I used to accumulate the poses: https://github.com/Huangying-Zhan/Depth-VO-Feat/blob/master/tools/evaluation_tools.py#L168

Here is the code for plotting: https://github.com/Huangying-Zhan/Depth-VO-Feat/blob/master/tools/evaluation_tools.py#L510
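In case it helps, here is a minimal sketch of that accumulation (not the exact code from evaluation_tools.py; it assumes each two-view prediction is already a 4x4 matrix T_{t->t-1}):

```python
import numpy as np

def accumulate_poses(rel_poses):
    """Chain relative poses T_{t->t-1} into global poses T_{t->0}.

    rel_poses: iterable of 4x4 numpy arrays, where rel_poses[t-1]
               is the predicted two-view pose T_{t->t-1}.
    Returns [T_{0->0}, T_{1->0}, T_{2->0}, ...].
    """
    poses = [np.eye(4)]  # T_{0->0} is the identity
    for T_rel in rel_poses:
        # T_{t->0} = T_{t-1->0} * T_{t->t-1}
        poses.append(poses[-1] @ T_rel)
    return poses

# The camera centre for plotting is the last column of each pose:
# xs = [T[0, 3] for T in poses]; zs = [T[2, 3] for T in poses]
```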

Hope this helps.

FatEvilCat commented 5 years ago

@Huangying-Zhan Thanks for your reply. It is so detailed that I can easily understand it. However, the code for plotting points to the same place as the other link. Also, I see that in Figure 3 of your paper you also draw the trajectory of Zhou. His method takes 5 consecutive images as input, and his output is not a 4x4 matrix. Do you visualize his trajectory the same way?

Huangying-Zhan commented 5 years ago

@FatEvilCat, thanks for pointing out the linking issue (a copy-and-paste mistake...). I have fixed the link. For the result of Zhou et al., similarly, I convert their predictions to 4x4 matrices and accumulate them. However, there is one more step in getting their full trajectory: their predictions have a scale ambiguity. Therefore, I optimize a scaling factor for their predictions and accumulate the scaled results. For the optimization part, please check their released code: https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/pose_evaluation_utils.py#L23
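For reference, that scale optimization reduces to a closed-form least-squares fit between the predicted and ground-truth translations; a minimal sketch (see their pose_evaluation_utils.py for the exact version they use):

```python
import numpy as np

def optimize_scale(pred_xyz, gt_xyz):
    """Closed-form least-squares scale s minimizing ||s*pred - gt||^2.

    pred_xyz, gt_xyz: (N, 3) arrays of predicted and ground-truth
    camera centres over the same frames.
    """
    return np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)

# The scaled trajectory is then s * pred_xyz; only the translations
# are affected, since rotations have no scale ambiguity.
```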

FatEvilCat commented 5 years ago

@Huangying-Zhan 👍 I have already read your paper, but there are some details I don't understand. In Sections 3.3 and 4.3, what exactly is the feature? In Section 4.3 you call it an image descriptor. Is it similar to what ORB-SLAM uses to find correspondences between nearby images?

Huangying-Zhan commented 5 years ago

Hi @FatEvilCat, thank you for your interest in our work. Concerning the features, we tried different features in our experiments, including features from the depth network, low-level ImageNet features, the features proposed in this paper, and an unsupervised version of them. All of these are dense features, since we want to do feature warping. ORB features are sparse, so they don't fit our purpose.
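To make "dense feature warping" concrete, here is a rough numpy sketch of bilinearly sampling a dense feature map at projected sub-pixel coordinates (illustrative only, not the implementation in this repo; the coordinates would come from depth and the relative pose):

```python
import numpy as np

def warp_features(feat, coords):
    """Bilinearly sample a dense feature map at sub-pixel positions.

    feat:   (C, H, W) dense feature map from the source view.
    coords: (2, H, W) projected (x, y) source-view position for
            every target pixel.
    Returns the warped (C, H, W) feature map. Every target pixel
    needs a sample, which is why the features must be dense.
    """
    C, H, W = feat.shape
    x = np.clip(coords[0], 0.0, W - 1.0)
    y = np.clip(coords[1], 0.0, H - 1.0)
    x0 = np.minimum(np.floor(x).astype(int), W - 2)
    y0 = np.minimum(np.floor(y).astype(int), H - 2)
    wx, wy = x - x0, y - y0
    # Interpolate between the four neighbouring feature vectors.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x0 + 1] * wx
    bot = feat[:, y0 + 1, x0] * (1 - wx) + feat[:, y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy
```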

FatEvilCat commented 5 years ago

@Huangying-Zhan So kind of you! So it is somewhat like a feature map? Here is another question: I tried to visualize the path, but mine does not come out as straight and thin as yours in Figure 3. Is there any pre-processing I should do? Here is my result for Zhou on Seq. 09:

[screenshot from 2018-12-13 17-23-48]

Huangying-Zhan commented 5 years ago

Hi @FatEvilCat, I guess that you concatenated all the snippets and plotted the trajectory. However, one thing to notice is that the snippets overlap, e.g.:

snippet-1: frames [0,1,2,3,4]
snippet-2: frames [1,2,3,4,5]
snippet-3: frames [2,3,4,5,6]
...

I am not sure if that is what caused the result you got, but if it is, try to concatenate the pose for each frame no more than once.
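If it helps, here is a rough sketch of that fix, assuming each snippet stores 4x4 poses of its frames relative to its own first frame (the actual output format of Zhou's code may differ):

```python
import numpy as np

def chain_snippets(snippets):
    """Build one trajectory from overlapping stride-1 snippets.

    snippets: list where snippets[i] holds the 4x4 poses of the
              snippet's frames relative to its first frame, and
              snippet i starts at frame i. Only the frame i -> i+1
              step is taken from snippet i, so each frame is
              concatenated exactly once.
    """
    traj = [np.eye(4)]  # pose of frame 0
    for snippet in snippets:
        T_step = snippet[1]             # T_{1->0} within this snippet
        traj.append(traj[-1] @ T_step)  # T_{i+1->0} = T_{i->0} * T_{i+1->i}
    return traj
```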