anuranbaka / OpenDTAM

An open source implementation of DTAM

DTAM is producing poor reconstruction on PTAM poses #21

Open avanindra opened 9 years ago

avanindra commented 9 years ago

Hi Paul,

I have been trying to integrate DTAM with live PTAM frames, using the 2.4.9 experimental branch. I am noticing that DTAM is producing poor reconstructions: the result is quite smooth, but the depths are not correct. When I apply OpenCV StereoBM or SGBM to the same PTAM keyframes, with the same poses, I get much better depth maps and reconstructions. I have tried 32 and 64 depth layers, with 30 images per cost volume. I set the near and far values from the distances of PTAM's sparse points from the camera.
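[For concreteness, a minimal sketch of the kind of StereoSGBM baseline being compared against (OpenCV 2.4 API to match the branch; file names and parameters are placeholders, and it assumes the two keyframes have already been rectified into a parallel pair):]

    // Sketch only: assumes rectified keyframes; paths and parameters are
    // placeholders, not taken from the actual demo code.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat left  = cv::imread("keyframe_0.png", 0);  // load as grayscale
        cv::Mat right = cv::imread("keyframe_1.png", 0);

        // 64 disparity levels, comparable to the 64 depth layers tried in DTAM.
        cv::StereoSGBM sgbm(/*minDisparity=*/0, /*numDisparities=*/64,
                            /*SADWindowSize=*/5,
                            /*P1=*/8 * 5 * 5, /*P2=*/32 * 5 * 5);
        cv::Mat disp16;
        sgbm(left, right, disp16);  // CV_16S disparities, scaled by 16

        cv::Mat disparity;
        disp16.convertTo(disparity, CV_32F, 1.0 / 16.0);
        // For a rectified pair: depth = focal * baseline / disparity.
        return 0;
    }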

I can upload the running code with the dataset, if you want to have a look at it.

anuranbaka commented 9 years ago

Please do. I won't have time to look at it closely until tomorrow, but I'll try to spot anything obvious today and look harder tomorrow. Also, you should look at commit 638f119a on the experimental branch. It shows spinning views of something very close to the highest-quality reconstruction possible with the current DTAM implementation. If yours is much worse, you may have a problem in how the dataset was captured or preprocessed, such as not turning off auto exposure, not undistorting the images, or noise higher than the outlier tolerance set in the cost volume construction.

avanindra commented 9 years ago

I will check the commit you mentioned. The camera I used for my data is a Fujifilm FinePix 3D, which may not be appropriate for DTAM, but what I was expecting is that DTAM should produce a better result than StereoBM. In a couple of days I will get a Point Grey Flea global shutter color camera, which I ordered last week; I guess that would be ideal for testing the algorithm, as the authors themselves used a Point Grey Flea 2 camera.

For distortion, I have used the PTAM distortion model (one-parameter barrel distortion), and I undistort the images before sending them into the DTAM pipeline.
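[For reference, a sketch of that one-parameter model (the FOV/ATAN model of Devernay and Faugeras, which PTAM uses); omega is the single distortion coefficient, and the fx, fy, cx, cy pinhole wrapper below is illustrative rather than PTAM's actual code:]

    // Sketch of the one-parameter FOV/ATAN distortion model; 'omega' is the
    // single coefficient. Illustrative only, not PTAM's implementation.
    #include <cmath>

    // Map a distorted radius (on the normalized image plane) to the
    // undistorted radius.
    double undistortRadius(double rd, double omega) {
        if (rd < 1e-8) return rd;  // avoid 0/0 at the image center
        return std::tan(rd * omega) / (2.0 * std::tan(omega / 2.0));
    }

    // Undistort one pixel: pixel -> normalized coordinates -> rescale the
    // radius -> back to pixel coordinates.
    void undistortPixel(double u, double v,
                        double fx, double fy, double cx, double cy,
                        double omega, double& uu, double& vu) {
        double x = (u - cx) / fx, y = (v - cy) / fy;
        double rd = std::sqrt(x * x + y * y);
        double scale = (rd < 1e-8) ? 1.0 : undistortRadius(rd, omega) / rd;
        uu = cx + fx * x * scale;
        vu = cy + fy * y * scale;
    }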

I have checked the code into the "ptam_poses" branch of a fork of the 2.4.9 experimental branch: https://github.com/avanindra/OpenDTAM/tree/ptam_poses

I have written the 3D viewer in GLFW. I have included its code and libraries, as well as the PTAM libraries, so hopefully you will get no linking errors on the first build.

Once you run the program, you will have to press the space bar twice at some interval; then the pose will be initialized. Afterwards, press 'D' to start the DTAM reconstruction.

Press 'S' for the sparse point display, 'P' for the DTAM reconstruction display, 'O' for the OpenCV StereoBM reconstruction display, 'I' for the frame display, and 'G' for the OpenCV stereo depth map.

Following is the link to the video sequence (GitHub was not letting me upload it, so I put it on Dropbox):

https://www.dropbox.com/s/9eyf851qh0k2ifw/DSCF0159.AVI?dl=0

Put the video in the "Cpp/dataset" directory, as that is the path I have passed from CMake.

[Also, one more point: even though I have put the GLFW and PTAM code in separate threads, they are all running on only one CPU core, so the pose initialization process may sometimes take 10 to 15 seconds.]

Thank you for looking into it.

hustcalm commented 9 years ago

@avanindra

Hi there, I'm trying to get your "ptam_poses" branch up and running, but I got the errors below:

make[2]: *** No rule to make target `../link_libraries/glfw.so', needed by `opendtamdemo'.  Stop.
make[1]: *** [CMakeFiles/opendtamdemo.dir/all] Error 2
make: *** [all] Error 2

Any idea how to get this fixed?

Thanks in advance:-)

anuranbaka commented 9 years ago

@hustcalm

You will need to get a copy of GLFW 3 and edit the CMakeLists file to point to it. I used the version from http://www.opengl-tutorial.org/ since I already had it available. You should also make the changes below.

@avanindra I finally figured it out. There are two problems:

  1. The colors were out of range; you need udImage.convertTo( image, CV_32FC3, 1.0/255.0 ); instead of udImage.convertTo( image, CV_32FC3, 1.0/65535.0 ); in opendtamdemo.cpp
  2. The computeAndDisplayPoints(...) function is using the wrong image; it should use:

       Mat base;
       cost.baseImage.download(base);
       float* colorData = (float*)base.data;

    and

    colors.push_back( Eigen::Vector3f( colorData[2], colorData[1], colorData[0] ) );

    instead of

    uchar* colorData = colorFrame.data;

    and

    colors.push_back( Eigen::Vector3f( colorData[2]/255.0, colorData[1]/255.0, colorData[0]/255.0 ) );

Sorry I haven't had time to make a proper pull request; my version of your code has diverged a lot while I was trying to find the problem. I'm trying to get to that.

hustcalm commented 9 years ago

@avanindra

Got the problem fixed by replacing the last line of CMakeLists.txt with:

target_link_libraries( opendtamdemo  OpenDTAM ${OpenCV_LIBS} ${Boost_LIBRARIES} ${QT_LIBRARIES} visualization3d GLEW cvd ptam the_absolute_path_to_libglfw.so )

Hope it helps:-)

hustcalm commented 9 years ago

@anuranbaka

Thanks, I already got the problem fixed, and the code compiles and runs.

For the initialization part, I'm using the PTAM sparse initialization, much as @avanindra did.

BTW, I found Newcombe's PhD thesis valuable as a reference; maybe you would be interested in it too:

www.doc.ic.ac.uk/~ajd/Publications/newcombe_phd2012.pdf

avanindra commented 9 years ago

@anuranbaka

Thanks for the reply. I changed the color range as you mentioned, and it did improve the reconstruction somewhat, though I am still not getting an accurate reconstruction.

Also, I wanted one clarification regarding the cost layers of the volume. From the code, it seems like you are assuming each cost layer to be planar, as you assign the same z value to every pixel in a particular cost layer, while I think the cost layer should be spherical, with every pixel in a layer equidistant from the reference camera.

@hustcalm

Sorry I couldn't respond a bit earlier; I guess I missed the GitHub notifications. I somehow missed uploading the GLFW libraries in the link_libraries folder. Still, I am glad that you got the code fixed and running.

anuranbaka commented 9 years ago

@avanindra Yes, you're right: the cost volume is divided into planes instead of spheres. This makes the cost calculations much easier. It also makes planes tend to remain planar rather than curved, since the regularization tends to pull toward the shape of the cost volume.

For a parallel stereo pair, the planar shape with inverse depth parametrization is optimal in the sense that the sampling distribution matches the error distribution. For multiple nonparallel views the optimal sampling shape is intractable, but it should be close to both the planar and spherical forms, so I just use the easier planar one. Of course, none of this applies to things like ultra-fisheye lenses, where the pixel density is not constant in the pinhole model.
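[A toy sketch of that layer spacing (the names are illustrative, not OpenDTAM's actual ones): layers are placed at a constant step in inverse depth, and every pixel of a layer shares the same z, which is exactly what makes each layer a fronto-parallel plane:]

    // Illustrative only: uniform inverse-depth spacing of cost volume layers.
    // nearInv and farInv are inverse depths (1/z); farInv = 0 means infinity.
    #include <vector>

    std::vector<float> layerInverseDepths(float nearInv, float farInv, int layers) {
        std::vector<float> invDepths(layers);
        float step = (nearInv - farInv) / (layers - 1);  // constant 1/z step
        for (int l = 0; l < layers; ++l) {
            // Every pixel of layer l shares this inverse depth, so each layer
            // is a plane of constant z in front of the reference camera.
            invDepths[l] = farInv + l * step;
        }
        return invDepths;
    }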

Anyway, my experience was that the quality of the reconstruction had a lot to do with the quality of the PTAM tracking, which was finicky to get started well. In particular, PTAM likes to sample all of its points from a plane, which messes things up. In your sample video, I wait until the bear's nose comes into view and the camera starts to pan downward, which gives PTAM a number of points on both the floor and the desk.

Also, I think I turned off a line of code in CostVolume.cu that said something like del = fminf(del, .005) * 1/.005f; to make it less sensitive to camera auto exposure.
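[One plausible reading of that clamp (a sketch, not the actual kernel): it truncates the per-pixel photometric difference at an outlier threshold and rescales it to [0, 1], a truncated L1 cost. Under a global exposure shift most differences exceed the threshold, so every depth layer saturates to the same cost and the minimum becomes meaningless, which would explain why disabling it helped:]

    // Sketch of the clamped photometric cost discussed above; not the actual
    // CostVolume.cu kernel. Intensities are assumed normalized to [0, 1].
    #include <cmath>

    float photometricCost(float refIntensity, float warpedIntensity,
                          float thresh = 0.005f) {
        float del = std::fabs(refIntensity - warpedIntensity);
        return std::fmin(del, thresh) / thresh;  // the line that was disabled
    }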

You might want to force the far plane to be at infinity (i.e., inverse depth 0). PTAM sometimes seems to choose bad near and far planes. I haven't figured out a good heuristic for setting the near plane.
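[As a placeholder only (the comment above is explicit that no good near-plane heuristic is known), one naive option is to force the far plane to infinity and back off slightly from the closest PTAM point; closestPointDepth is a hypothetical input here:]

    // Placeholder, not a recommendation: far plane at infinity, near plane a
    // 10% margin in front of the closest PTAM sparse point.
    // closestPointDepth: hypothetical minimum z of the sparse points in the
    // reference camera frame.
    void chooseDepthRange(float closestPointDepth, float& nearInv, float& farInv) {
        farInv  = 0.0f;                      // inverse depth 0 == plane at infinity
        nearInv = 1.1f / closestPointDepth;  // slightly in front of the closest point
    }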

avanindra commented 9 years ago

@anuranbaka

Hi, a bit of a late reply from me. I had a point when I said the cost layers should be spherical. Its importance lies in computing the depth derivatives, in which you assume the inverse depth step to be constant; that would be possible only if the cost layers were spherical. In the case of planar cost layers, the corner pixels would have a different inverse depth step than the middle pixel. Correct me if I am wrong.

anuranbaka commented 8 years ago

@avanindra Well, that's right if the depth you're using is the literal distance from the camera's entrance pupil/virtual pinhole, but for most cameras that is not the natural depth measure. If you consider the way that a pinhole camera projects onto a planar sensor, you see that:

x_sensor = x_world * f / z_world
y_sensor = y_world * f / z_world
z_sensor = z_world * f / z_world = f <-- usually ignored, because it is constant if the camera is facing down the z axis of the world

Notice that these equations are linear in x_world, y_world, and 1/z_world. When we refer to depth, we mean z_world, not the distance from the pinhole. By using 1/z_world, the "inverse depth", all of our equations are linear, and so are the derivatives. The depthStep in the code is actually in units of inverse depth, as are all the other depth measures. Sorry about that; it's a fairly common metonym. In any case, it also means that the cost volume voxels are actually not cubes in real-world space, but frusta with two planar faces and slightly curved sides.
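[Spelling out the linearity claim in standard multi-view notation (assumed here, not quoted from the code): with intrinsics K and pose (R, t) from the reference to the measurement frame, a reference pixel with homogeneous coordinates u~ and inverse depth rho = 1/z_world warps to]

    % Warp of a reference pixel into a second view, homogeneous coordinates.
    \[
    \tilde{u}' \;\propto\; K R K^{-1}\,\tilde{u} \;+\; \rho\, K t,
    \qquad \rho = \frac{1}{z_{\mathrm{world}}}.
    \]

[The warp is linear in rho in homogeneous coordinates, so stepping the cost volume by a constant inverse-depth increment only requires adding a constant vector per layer before the perspective divide.]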

The depth measure you are proposing would be appropriate if the sensor were spherical, but practically all cameras have a planar sensor.

On the other hand, extreme fisheye cameras are often designed to approximate the ATAN model, which is what PTAM uses. The ATAN model makes the sensor act somewhat as if it were spherical, but the math is really bad. In theory, DTAM would need some corrections to deal with that model directly. As it is, we just undistort the image to make it look like it came from an ideal planar pinhole camera.
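[For reference, the FOV/ATAN model of Devernay and Faugeras, which PTAM's calibration uses, with a single parameter omega and undistorted/distorted radii r_u and r_d; both directions are transcendental, which hints at why handling it directly inside DTAM would be painful:]

    \[
    r_d = \frac{1}{\omega}\arctan\!\Bigl(2\,r_u \tan\frac{\omega}{2}\Bigr),
    \qquad
    r_u = \frac{\tan(r_d\,\omega)}{2\tan\frac{\omega}{2}}.
    \]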

-Paul
