avisingh599 / mono-vo

An OpenCV based implementation of Monocular Visual Odometry
MIT License

Some errors when running with KITTI dataset #14

Open BorisLerouxFox opened 6 years ago

BorisLerouxFox commented 6 years ago

Dear Avi Singh, thanks a lot for this code, it's a great help for monocular odometry. Could you indicate precisely which KITTI dataset you use in your example?

First question: in the image reading (l86, l87 and l158) you use 6 digits for the filenames, but some of the datasets use 10 digits. Do you know a way to read these 10-digit names properly? "*10d" is not working. I have solved this issue by renaming the image files.

Then I ran into several errors when running the code with several KITTI datasets:

Do you know where these errors could come from? I'm using Ubuntu 14.04 and OpenCV 3.4.3 with the KITTI dataset 2011_09_26_drive_0001. Computer: Intel® Core™ i7-4712MQ CPU @ 2.30GHz × 8, RAM: 16 GB.

Best regards, Boris

BorisLerouxFox commented 6 years ago

Ok, I got a little bit further in my research. The second error seems to come from the FAST algorithm: I changed fast_threshold from 20 to 10, and it now works until the end of the sequence. Actually, I didn't find any documentation about this fast_threshold and what it represents. Is it the number of points?

Now I'm running into an error about the scale, but I know how to solve it: I need to convert geographic coordinates into Cartesian ones, am I right? Best regards, Boris

BJMH commented 5 years ago

Hi Boris, The KITTI dataset that is being used appears to be the colour one from here (KITTI website). It is sequence 00. You also need the 00.txt which is in a different folder.

For your first error it looks like the colour images are not being loaded at all. That is most likely due to a misspelling of the filename. The second error is because you have a grey image but you are trying to convert it from colour to grey (it is already grey). If you want to use greyscale images, you will need to remove all the lines that call cvtColor(..., COLOR_BGR2GRAY);

According to the OpenCV documentation, the threshold parameter is a "threshold on difference between intensity of the central pixel and pixels of a circle around this pixel." I'm not really sure if that's a minimum threshold or a maximum threshold, but I do know that FAST features work best around areas with sharp and distinctive changes in intensity, that is, good gradients around corners.

Scale errors come from the fact that monocular VO cannot determine scene scale. A scene that is twice as large and twice as far away will look the same to a single camera, so this code uses information from the 00.txt ground-truth file to set the scale of the estimates. If you are running this and not getting the correct scale, you might be using a file in a different format. Try to figure out which parts of each line in your ground-truth data relate to the x, y, and z translations, then change getAbsoluteScale(...) to use those instead.

JonnySme commented 5 years ago

@BJMH Hello. Can you explain to me how to use your own dataset?

BJMH commented 5 years ago

@JonnySme If your dataset is a sequence of image files then you need to change lines 85, 86, and 139 to load your images instead. You also need to change getAbsoluteScale() to load your ground-truth data. It currently expects 12 numbers per line, with the 4th, 8th, and 12th being the x, y, and z translation. You can remove the call to that function if you don't have ground truth, but then each frame-to-frame translation will be estimated as 1 unit long.

JonnySme commented 5 years ago

@BJMH Thank you for your answer! How can I get ground-truth data myself? If ground-truth data is missing, will the end result be very different from the truth?

I tried to remove the `scale = getAbsoluteScale(numFrame, 0, t.at<double>(2));` call, but no x-y-z trajectory is built after the deletion.

Thank you for your answer!

BJMH commented 5 years ago

@JonnySme It depends on your dataset. If it's video that you recorded yourself then you will have to measure it. If you downloaded a dataset it should come with its own GT.

This type of VO cannot determine the magnitude of any translation, nor does it have a consistent scale between measurements. If you think about it, any object that the camera sees would appear the same in the video if it were twice as large and twice as far away. Because of that, every frame-to-frame translation is normalised, so you need to scale it by the correct magnitude from a GT.

Without the GT, your rotations and the directions of translation are still correct. Only the magnitudes of the translations are incorrect, and also inconsistent. To make them consistent you would need to add some extra alignment across multiple frames with Structure from Motion and/or Bundle Adjustment. Both of those are pretty deep rabbit holes, and I don't know of any easy-to-digest code examples floating around online.

JonnySme commented 5 years ago

@BJMH Thank you! I have a set of frames taken with a camera at 22 frames per second. How can I get GT for this dataset?

BJMH commented 5 years ago

@JonnySme If it's video you've taken yourself, you'll have to measure it while taking the footage: perhaps with GPS, an IMU, Vive Trackers, a calibration checkerboard present in all frames, or some other method.