Closed: Smallha61109 closed this issue 5 years ago
Hello, Thanks for your interest in our work.
Thank you very much for your reply!
Hello, I have two more questions here:
Thank you very much.
Dear,
1. The output is the inverse depth. Invert the output and you will get a depth map in metric scale (see the sketch below).
2. Be careful when using ORB-SLAM, because the relative pose here requires metric scale, and the monocular mode of ORB-SLAM cannot recover the real-world scale. In the example data, the camera pose is given by the dataset; TUM is good to use. Later this week, I will upload an example that uses an online pose estimator such as VINS-Mono to get real-time depth estimation.

I think if you are doing things right you can get reasonable results.
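For point 1, a minimal sketch of the conversion, assuming the network output is a numpy array named `idepth` as in the example code (the `eps` guard against division by zero is my addition, not part of the example code):

```python
import numpy as np

def inverse_depth_to_metric(idepth, eps=1e-6):
    """Convert the network's inverse-depth output (1/meters) to metric depth."""
    # Clamp tiny values so the division cannot blow up on near-zero estimates.
    return 1.0 / np.maximum(idepth, eps)  # depth in meters
```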
Regards,
Kaixuan
Hello,
Sorry, but I don't quite understand: what is a depth map in metric scale? If I use the example model with the example code, what will the scale be?
Thank you for your quick reply.
Dear,
Assume that for pixel x we get the estimate z. The depth of pixel x is d = 1/z, and the unit is meters (e.g. an estimate z = 0.5 corresponds to a depth of 2 meters).
Kaixuan
Hello,
Just want to make sure: in the example code, the output I should be working on is "idepth", right? If I want to save a non-inverse depth map using mm as the unit, I should be doing:

```python
save_depth = (1 / idepth) * 1000
```
Dear,
I think that's the right way to do it.
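A minimal sketch of that save step (using OpenCV and a 16-bit PNG is my assumption here; any container that holds 16-bit values works):

```python
import cv2
import numpy as np

def save_depth_mm(idepth, path="depth.png", eps=1e-6):
    """Save the network's inverse-depth output as a 16-bit PNG in millimeters."""
    depth_mm = 1000.0 / np.maximum(idepth, eps)  # meters -> millimeters
    depth_mm = np.clip(depth_mm, 0, 65535)       # fit into the uint16 range
    cv2.imwrite(path, depth_mm.astype(np.uint16))
```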
Regards, Kaixuan
Hello, Since the output is checked, I've gone back to check my input, but I can't see any problem. I tried to use the TUM fr1_xyz dataset as input, taking the first frame as the reference image and the 4th frame as the measurement image. Their ground-truth poses (tx ty tz qx qy qz qw) are:

First frame: 1.3405 0.6266 1.6575 0.6574 0.6126 -0.2949 -0.3248
4th frame:   1.3066 0.6256 1.6196 0.6621 0.6205 -0.2892 -0.3050

These are then transformed into pose matrices:

First frame:
```
[[ 0.07551046  0.61387944 -0.78567948  1.3405    ]
 [ 0.99701352 -0.03828154  0.06573556  0.6266    ]
 [ 0.01021044 -0.78835852 -0.61490704  1.6575    ]
 [ 0.          0.          0.          1.        ]]
```

4th frame:
```
[[ 0.06268622  0.6452541  -0.76146364  1.3066    ]
 [ 0.9980781  -0.0440261   0.0449838   0.6256    ]
 [-0.00445364 -0.7627782  -0.64679332  1.6196    ]
 [ 0.          0.          0.          1.        ]]
```

The 'left2right' will be pose4.inv() * pose1, therefore:
```
[[ 4.74018205e-03  6.12628855e-01  3.53346909e-03 -9.37094866e-01]
 [ 6.43286514e-01  1.68159404e-03 -5.01337987e-02  2.62950782e-01]
 [-7.77369044e-03 -3.54228786e-02  3.97621668e-01  3.33813858e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]
```

Does this procedure seem correct to you? The output doesn't look as good as your sample, and neither does the sc-inv, so I'm not sure what the problem is. Thank you very much for your patient explanation.
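For reference, a minimal sketch of the conversion described above (scipy's `from_quat` expects the scalar-last order (qx, qy, qz, qw), which matches the TUM groundtruth lines):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def tum_pose_to_matrix(tx, ty, tz, qx, qy, qz, qw):
    """Build a 4x4 pose matrix from a TUM groundtruth line (tx ty tz qx qy qz qw)."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T

pose1 = tum_pose_to_matrix(1.3405, 0.6266, 1.6575, 0.6574, 0.6126, -0.2949, -0.3248)
pose4 = tum_pose_to_matrix(1.3066, 0.6256, 1.6196, 0.6621, 0.6205, -0.2892, -0.3050)

# Relative pose of frame 1 expressed in frame 4. Note that @ is a true
# matrix product; elementwise * on numpy arrays would silently produce a
# non-rigid (and wrong) "pose".
left2right = np.linalg.inv(pose4) @ pose1
```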
Dear,
Actually, fr1_xyz belongs to the training set of the network. I checked the dataset: why is the pose of the 4th frame (stamp 1305031102.275326) not the following?

1305031102.2758 1.3160 0.6254 1.6302 0.6609 0.6199 -0.2893 -0.3086

The pose given by you is

1305031102.3158 1.3066 0.6256 1.6196 0.6621 0.6205 -0.2892 -0.3050

right?
Also, have you normalized the images before feeding them to the network?
Regards, Kaixuan
Hello, Thank you for pointing out the timestamp-association issue; it was caused by a bug in my association code. However, fixing it doesn't seem to have much effect on the output. The input is normalized as you suggest in the README: 81 is subtracted from each RGB image, and the result is divided by 35.
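For reference, a sketch of that normalization step (the function name is just for illustration):

```python
import numpy as np

def normalize_image(image):
    """Normalize an RGB image as the README suggests: subtract 81, divide by 35."""
    return (image.astype(np.float32) - 81.0) / 35.0
```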
Dear,
I uploaded an example2.py to show how to process your own data. As far as I can see, the two images are too close to each other, so the translation is not enough.
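A quick way to sanity-check this is the norm of the relative translation; this sketch continues from the `left2right` matrix computed earlier:

```python
import numpy as np

def baseline_m(left2right):
    """Translation magnitude (in meters) of a 4x4 relative pose matrix."""
    return np.linalg.norm(left2right[:3, 3])

# If this is only a few millimeters, pick a measurement frame further
# away from the reference frame.
print("baseline: %.4f m" % baseline_m(left2right))
```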
Regards, Kaixuan
Hello, you have a great project!
I have a few questions on using it:
Thank you very much!