how to use own data? - Githubissues

zhengmiao1996 commented 4 years ago

I don't know how to prepare my own data (not KITTI). Can you tell me how to use my own data, recorded with my own camera?

ClementPinard commented 4 years ago

You need to have:

- A pinhole-like camera, i.e. without any distortion
- Calibration data
- Video sequences where the camera is actually moving (and not just tilting or panning), in a scene with Lambertian surfaces (i.e. without reflections)

To get the first two, you can see how to do it here: https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html You can also use COLMAP if you have a long video without anything moving in it: https://colmap.github.io/faq.html

As such, even if your camera has distortion at first, you can easily make the images rectilinear once the camera is calibrated.

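Once all of this is done, you can put all the images of the same sequence in a folder (use ffmpeg to extract them from the video), along with a text file containing the intrinsics matrix (just a 3x3 matrix with only 4 camera-dependent non-zero values: the two focal lengths and the two principal point coordinates)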
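As a rough illustration of the intrinsics text file, here is a minimal sketch. The function name `write_intrinsics`, the file name `cam.txt`, and the space-separated layout are assumptions made for this example; check the data preparation code for the exact name and format it expects.

```python
from pathlib import Path


def write_intrinsics(folder, fx, fy, cx, cy, filename="cam.txt"):
    """Write a 3x3 pinhole intrinsics matrix as plain text in a sequence folder.

    `filename` is an assumed name, not confirmed by the thread.
    fx, fy are the focal lengths in pixels, cx, cy the principal point.
    """
    rows = [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]
    path = Path(folder) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # One matrix row per line, values separated by spaces.
    text = "\n".join(" ".join(f"{v:.6f}" for v in row) for row in rows)
    path.write_text(text + "\n")
    return path


# Hypothetical usage, with made-up calibration values:
# write_intrinsics("my_dataset/seq_01", fx=700.0, fy=700.0, cx=320.0, cy=240.0)
```

The four values would come from your calibration (OpenCV or COLMAP), measured on the undistorted images.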

Finally, you need to create train.txt and val.txt files that list which folders (each containing a video sequence as JPEG pictures) are used for training and which for validation (you can put everything into train if you want, or even put folders in both categories)
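A minimal sketch of generating the two list files, assuming one sequence folder name per line (an assumption for this example; the data preparation code is the reference for the exact format). The helper name `write_splits` is hypothetical.

```python
from pathlib import Path


def write_splits(dataset_root, val_folders=()):
    """Write train.txt and val.txt listing sequence folders, one per line.

    Every sub-folder of `dataset_root` is treated as one video sequence.
    Folders named in `val_folders` go to val.txt, all others to train.txt.
    """
    root = Path(dataset_root)
    val_set = set(val_folders)
    folders = sorted(p.name for p in root.iterdir() if p.is_dir())
    train = [name for name in folders if name not in val_set]
    val = [name for name in folders if name in val_set]
    (root / "train.txt").write_text("".join(name + "\n" for name in train))
    (root / "val.txt").write_text("".join(name + "\n" for name in val))
    return train, val


# Hypothetical usage: keep one sequence aside for validation.
# write_splits("my_dataset", val_folders=["seq_03"])
```

This version puts each folder in exactly one list; since folders may also appear in both, you can simply edit the files by hand afterwards.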

As for the fine details, you can have a look at the data preparation code here; hopefully, with my indications, you can figure out where exactly to put everything.

Clément

ynjiun commented 3 years ago

Hi @ClementPinard

Very nice work! I have a few questions regarding "Video sequences where the camera is actually moving (and not just tilting or panning),":

Can I have video sequences where the camera rotates around, say, the Y axis (assuming the camera looks out along the Z axis)? Is this rotation equivalent to "tilting or panning"?

If sequences rotating around the Y axis are acceptable, then how fast or how slow should the camera rotate, in say degrees/sec (assuming my camera captures at 30 fps)?

If not acceptable, how about moving the camera along the Z axis? Then how far and how fast/slow should it move to make a difference? Say I move the camera back and forth by 1 mm at 7 Hz, would this work? Or is a moving distance of 1 mm too small to make any difference for training or inference?

Thank you very much for your insight.

ClementPinard commented 3 years ago

Rotation is not a problem per se, it's just that there needs to be translation. In addition, the translation must be non-negligible with respect to rotation. I concede it's a bit subjective and requires some intuition, but the main idea is that the displacement of pixels must differ enough from one pixel to another, thanks to the parallax, so that the distribution of optical flow is spread enough. In short, the parallax must be visible! Very simple example:
- Making a panorama (camera turning, but not moving): not ok
- Turning around an object (camera turning, but also moving around the object): ok

The whole optimization is agnostic regarding the axis. Rotation-only movement will not work for optimization. It's not a problem if your dataset has plenty of other frames where the camera is actually moving, but it will add noise to your training nevertheless.

Is the displacement enough? It's all about the distance of the objects you see. For parallax to be visible, it needs to be at least 1 pixel; it often works better if it's around 10 px, I would say, but it's not an exact science. As we are speaking in terms of pixels, this depends on both your camera intrinsics and the distance of the objects. The higher the camera resolution, the lower the necessary moving distance, and the closer the scene objects are, the lower the necessary moving distance.
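To put rough numbers on this, here is a small sketch based on the standard pinhole relation for a purely sideways translation: a point at depth Z shifts by about f * t / Z pixels, with f the focal length in pixels and t the translation. The function names and the example values are made up for illustration, and forward motion (as in car videos) follows a different pattern, so treat this as an order-of-magnitude check only.

```python
def pixel_shift(focal_px, translation, depth):
    """Approximate image shift (in pixels) of a point at `depth` when the
    camera translates sideways by `translation` (same unit as `depth`)."""
    return focal_px * translation / depth


def parallax_px(focal_px, translation, near, far):
    """Difference in pixel shift between the nearest and farthest objects.

    This is the spread of optical flow caused by translation alone.
    """
    return pixel_shift(focal_px, translation, near) - pixel_shift(
        focal_px, translation, far
    )


if __name__ == "__main__":
    # Made-up example: 700 px focal length, 5 cm sideways move between
    # frames, scene objects between 2 m and 20 m away.
    spread = parallax_px(700.0, 0.05, near=2.0, far=20.0)
    print(f"parallax spread: {spread:.2f} px")
    # Compare against the rough 1 px minimum / 10 px comfortable guideline.
    print("visible" if spread >= 1.0 else "too small")
```

Both trends from the explanation show up directly in the formula: a larger focal length in pixels (higher resolution) or a smaller depth increases the shift for the same translation.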

As you can see, it's unfortunately very subjective and will require a lot of tests to get the training set just right. It appears that the KITTI dataset (and car videos in general) happened to get this right, because movement is mostly forward, with only occasional turning. Even then, the dataset was filtered to have around 2 meters between frames, for a depth distribution of 2 meters to 80 meters.

For frames 1 mm apart, given a camera similar to KITTI in terms of intrinsics, you can hope to have a converging optimization if the scene has objects at around 2 mm to 10 cm from the camera. Now if your camera runs at 30 Hz, you may have a lower displacement between two frames, and then you just apply the rule of three the same way I did: for a displacement of 0.1 mm, the ideal distance is 0.2 mm to 1 cm, and so on.
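The rule of three above is just a linear scaling of the depth range with the displacement between frames. A minimal sketch, taking the 1 mm figures from this reply as the reference point (the function name `scale_depth_range` is made up for this example):

```python
def scale_depth_range(ref_displacement, ref_depth_range, new_displacement):
    """Scale a workable depth range linearly with inter-frame displacement.

    All quantities must use the same length unit (millimetres here).
    """
    factor = new_displacement / ref_displacement
    near, far = ref_depth_range
    return near * factor, far * factor


if __name__ == "__main__":
    # Reference from the reply: 1 mm between frames, objects at 2 mm to 100 mm.
    near, far = scale_depth_range(1.0, (2.0, 100.0), new_displacement=0.1)
    # Gives roughly 0.2 mm to 10 mm, i.e. the "0.2 mm to 1 cm" quoted above.
    print(f"{near:.2f} mm to {far:.2f} mm")
```

This only holds for a camera with similar intrinsics; a different focal length or resolution shifts the whole range.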

Keep in mind this is never about speed, but about displacement between frames. That means that if your camera moves too slowly, you might be able to increase the displacement between two frames by subsampling them (assuming the trajectory is rectilinear enough).
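A minimal sketch of that subsampling idea, assuming a roughly constant displacement per frame along a straight trajectory (both helper names are hypothetical, and the per-frame displacement is something you would have to estimate yourself):

```python
import math


def subsample_step(displacement_per_frame, target_displacement):
    """Smallest frame step that reaches at least `target_displacement`
    between kept frames, given the displacement between consecutive frames."""
    if displacement_per_frame <= 0:
        raise ValueError("displacement_per_frame must be positive")
    return max(1, math.ceil(target_displacement / displacement_per_frame))


def subsample(frames, step):
    """Keep every `step`-th frame, starting from the first one."""
    return frames[::step]


if __name__ == "__main__":
    # Made-up example: 0.25 mm between consecutive frames, 1 mm wanted.
    step = subsample_step(0.25, 1.0)
    frames = [f"{i:06d}.jpg" for i in range(10)]
    print(step, subsample(frames, step))
```

If the trajectory curves or the speed varies a lot, a fixed step is a poor approximation and the kept frames should be chosen from actual pose estimates instead.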

Hope all this was informative,

Clément

ClementPinard / SfmLearner-Pytorch

how to use own data? #108