You need to have:
- your camera's intrinsics
- your camera's distortion parameters
- video sequences where the camera is actually moving (and not just tilting or panning)
To get the first two, you can see how to do it here: https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html You can also use COLMAP if you have a long video without anything moving in it: https://colmap.github.io/faq.html
As such, even if your camera is distorted at first, you can make the images rectilinear easily after having calibrated it.
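For illustration, here is a minimal sketch of that undistortion step with OpenCV, assuming you already have the camera matrix and distortion coefficients from the calibration above (the numeric values and filenames below are placeholders):

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients; replace them with
# the output of your own calibration.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("frame_000001.jpg")
# Remove lens distortion so the saved image is rectilinear
rectified = cv2.undistort(img, K, dist)
cv2.imwrite("frame_000001_rect.jpg", rectified)
```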
Once all of this is done, you can put all the images of the same sequence in a folder (use ffmpeg to extract the frames), along with a text file containing the intrinsics matrix (just a 3x3 matrix where only fx, fy, cx, cy and the final 1 are non-zero).
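As a rough sketch, something like this could extract the frames and write the intrinsics file; the folder layout, the `cam.txt` filename, and the frame naming pattern are assumptions on my part, so check the data preparation code for the exact format expected:

```python
import subprocess
from pathlib import Path

import numpy as np

# Create the sequence folder and dump the video into JPEG frames
# (folder name and frame pattern are illustrative).
seq = Path("my_dataset/my_sequence")
seq.mkdir(parents=True, exist_ok=True)
subprocess.run(["ffmpeg", "-i", "my_video.mp4", "-qscale:v", "2",
                str(seq / "%06d.jpg")], check=True)

# Write the 3x3 intrinsics matrix as plain text next to the frames;
# only fx, fy, cx, cy (and the trailing 1) are non-zero.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
np.savetxt(seq / "cam.txt", K)
```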
Finally, you need to create train.txt and val.txt files specifying which folders (each containing a video sequence as JPEG pictures) are used for training and which for validation (you can put everything into train if you want, or even list folders in both categories).
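A small sketch of that split; the dataset root name and the ~10% validation share are arbitrary choices for the example:

```python
from pathlib import Path

# One sequence folder name per line; a folder may appear in both files.
root = Path("my_dataset")
scenes = sorted(p.name for p in root.iterdir() if p.is_dir())

val_scenes = scenes[: max(1, len(scenes) // 10)]  # ~10% for validation
train_scenes = scenes  # or exclude val_scenes for a strict split

(root / "train.txt").write_text("\n".join(train_scenes) + "\n")
(root / "val.txt").write_text("\n".join(val_scenes) + "\n")
```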
As for the finer details, you can have a look at the data preparation code here; hopefully, with these indications, you can figure out where exactly to put everything.
Clément
Hi @ClementPinard
Very nice work! I have a few questions regarding "Video sequences where the camera is actually moving (and not just tilting or panning)":
Can I use video sequences where the camera rotates around, say, the Y axis (assuming the camera looks along the Z axis)? Is this rotation equivalent to "tilting or panning"?
If sequences rotating around the Y axis are acceptable, how fast or how slow should the camera rotate, in degrees/sec (assuming my camera captures at 30 fps)?
If they are not acceptable, how about moving the camera along the Z axis? How far and how fast/slow does it need to move to make a difference? Say the camera moves back and forth 1 mm at 7 Hz, would that work? Or is a displacement of 1 mm too small to make any difference for training or inference?
Thank you very much for your insight.
As you can see, it's unfortunately very subjective and will require a lot of tests to get the training set just right. It appears that this was the case for the KITTI dataset (and car videos in general) because the movement is mostly forward, with only occasional turning. Even then, the dataset was filtered to have around 2 meters between frames, for a depth distribution of 2 meters to 80 meters.
For frames 1 mm apart, given a camera similar to KITTI in terms of intrinsics, you can hope to have a converging optimization if the scene has objects at around 2 mm to 10 cm from the camera. Now if your camera runs at 30 Hz, you may have a lower displacement between two frames, and then you just apply the rule of three the same way I did: for a displacement of 0.1 mm, the ideal distance is 0.2 mm to 1 cm, and so on.
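To make the rule of three concrete, here is a tiny helper that scales the workable depth range linearly with inter-frame displacement, using the 2x / 100x factors read off the 1 mm example above (a rough heuristic, not an exact bound):

```python
# Linear scaling of the workable depth range with inter-frame displacement,
# using the 2x / 100x factors from the 1 mm example (rough heuristic).
def workable_depth_range(displacement_m):
    return 2.0 * displacement_m, 100.0 * displacement_m

for d in (0.0001, 0.001):  # 0.1 mm and 1 mm between frames
    near, far = workable_depth_range(d)
    print(f"{d * 1000:g} mm apart -> depths {near * 1000:g} mm to {far * 1000:g} mm")
```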
Keep in mind this is never about speed, but about displacement between frames. That means that if your camera is too slow, you might be able to increase the displacement between two frames by subsampling them (assuming the trajectory is rectilinear enough), as sketched below.
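For instance, a naive subsampling pass could look like this; the folder name and the factor N are illustrative, and it deletes files in place, so run it on a copy:

```python
from pathlib import Path

# Keep one frame out of every N to increase the displacement between
# consecutive frames; this deletes files in place, so work on a copy.
N = 5  # hypothetical factor; tune until frames are far enough apart
frames = sorted(Path("my_dataset/my_sequence").glob("*.jpg"))
for i, frame in enumerate(frames):
    if i % N:
        frame.unlink()
```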
Hope all this was informative,
Clément
I have no idea how to prepare my own data (as opposed to KITTI). Can you tell me how to use my own data, captured with my own camera?