dazinovic / neural-rgbd-surface-reconstruction

Official implementation of the CVPR 2022 Paper "Neural RGB-D Surface Reconstruction"
https://dazinovic.github.io/neural-rgbd-surface-reconstruction/

How about the result on real world datasets? #5

Closed endlesswho closed 2 years ago

endlesswho commented 2 years ago

@dazinovic How about the results on real-world datasets? I collected a dataset with my own RGB-D camera and estimated the poses with COLMAP, but the results are a mess. Any advice?

dazinovic commented 2 years ago

I think you'll have to be a bit more specific about the problem. Do you have a problem with COLMAP or does my method give you bad results when you use it with the COLMAP poses? If it's the latter, make sure the cameras are in the correct coordinate system and that you are correctly loading the depth maps. You will have to either save the data in the described format or write your own dataset loader. Since you have depth, you can also try aligning the images with BundleFusion.
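For anyone following along, here is a minimal sketch of what such a custom dataset loader might look like. The directory names, the `poses.txt` layout, and the millimeter depth scale are assumptions for illustration; check the repository README for the exact expected format.

```python
import os
import imageio.v2 as imageio
import numpy as np

def load_scene(scene_dir, depth_scale=1000.0):
    """Load RGB images, depth maps (converted to meters), and 4x4 c2w poses."""
    rgb_dir = os.path.join(scene_dir, "images")
    depth_dir = os.path.join(scene_dir, "depth")

    images, depths = [], []
    for name in sorted(os.listdir(rgb_dir)):
        images.append(imageio.imread(os.path.join(rgb_dir, name)) / 255.0)
    for name in sorted(os.listdir(depth_dir)):
        d = imageio.imread(os.path.join(depth_dir, name)).astype(np.float32)
        depths.append(d / depth_scale)  # e.g. millimeters -> meters

    # Assumption: poses.txt holds one flattened 4x4 camera-to-world matrix per frame.
    poses = np.loadtxt(os.path.join(scene_dir, "poses.txt")).reshape(-1, 4, 4)
    return np.stack(images), np.stack(depths), poses
```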

endlesswho commented 2 years ago

If it's the latter, make sure the cameras are in the correct coordinate system and that you are correctly loading the depth maps. You will have to either save the data in the described format or write your own dataset loader. Since you have depth, you can also try aligning the images with BundleFusion.

The poses are estimated by COLMAP. The depth images are normalized to 0-1 for training. I wonder whether I need to reconstruct my scene with an RGB-D reconstruction method and get the sc_factor and translation correct for the network?

dazinovic commented 2 years ago

The depth images need to be in metric space.
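In other words, the raw depth values should be converted to meters rather than normalized. A quick sanity check, assuming 16-bit depth PNGs that store millimeters (common, but sensor-dependent):

```python
import imageio.v2 as imageio
import numpy as np

d = imageio.imread("depth/0000.png").astype(np.float32)
d = d / 1000.0      # assumption: the PNG stores millimeters; check your sensor
valid = d[d > 0]    # zero usually marks missing depth
print(f"depth range: {valid.min():.2f} m - {valid.max():.2f} m")
# For an indoor scene expect roughly 0.1-10 m. Values in 0-1 mean the maps
# were normalized, and sc_factor/translation will no longer match the scene.
```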

rancheng commented 2 years ago

@endlesswho Is the problem solved? Could you please post your result here?

endlesswho commented 2 years ago

@endlesswho Is the problem solved? Could you please post your result here?

Sadly, the problem still remains. The depth images are in metric space, but the poses from COLMAP are only defined up to scale. I think an RGB-D reconstruction method would work!

dazinovic commented 2 years ago

You can use some flavor of KinectFusion to obtain camera poses. If you want to use the COLMAP poses with your depth sensor's measurements, you will need to scale the translation vectors of your camera poses.
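A sketch of what that rescaling could look like. The scale estimate below compares the depths of COLMAP's sparse points against the sensor depth at the same pixels; this is one option for recovering the metric scale, not part of the repo:

```python
import numpy as np

def rescale_colmap_poses(c2w_list, scale):
    """Scale the translation of each 4x4 camera-to-world pose by `scale`."""
    out = []
    for c2w in c2w_list:
        p = c2w.copy()
        p[:3, 3] *= scale  # rotations are unaffected by a global rescale
        out.append(p)
    return out

def estimate_scale(colmap_depths, sensor_depths):
    """Median ratio between sensor depth and COLMAP sparse-point depth,
    sampled at the same pixels (both arrays are assumptions: you must
    gather these correspondences yourself)."""
    colmap_depths = np.asarray(colmap_depths)
    sensor_depths = np.asarray(sensor_depths)
    mask = (colmap_depths > 0) & (sensor_depths > 0)
    return np.median(sensor_depths[mask] / colmap_depths[mask])
```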

endlesswho commented 2 years ago

@rancheng My problem was solved by running an RGB-D reconstruction with ICP matching to get the trajectory. However, the reconstruction results with @dazinovic's method still seem not so good. I also ran the method on breakfast_room; with a perturbed trajectory, the result is shown below: [image]

dazinovic commented 2 years ago

It looks like your camera extrinsics are in the wrong coordinate system. My method uses the OpenGL convention (same as NeRF). Maybe one of these issues can help you: https://github.com/dazinovic/neural-rgbd-surface-reconstruction/issues/4 https://github.com/dazinovic/neural-rgbd-surface-reconstruction/issues/2
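For reference, the usual COLMAP/OpenCV to OpenGL/NeRF conversion negates the y and z axes of each camera-to-world matrix. A small sketch (note that COLMAP's images.txt stores world-to-camera poses, so invert those to camera-to-world first):

```python
import numpy as np

def opencv_to_opengl(c2w):
    """Convert a 4x4 camera-to-world pose from the COLMAP/OpenCV convention
    (x right, y down, z forward) to the OpenGL/NeRF convention
    (x right, y up, z backward) by negating the camera's y and z axes."""
    c2w = c2w.copy()
    c2w[:3, 1] *= -1
    c2w[:3, 2] *= -1
    return c2w
```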

endlesswho commented 2 years ago

My method uses the OpenGL convention (same as NeRF).

Reasonable! I'll have a try and post my new results.

endlesswho commented 2 years ago

It looks like your camera extrinsics are in the wrong coordinate system. My method uses the OpenGL convention (same as NeRF). Maybe one of these issues can help you: #4 #2

My camera extrinsics were in the wrong coordinate system. After transforming them to the OpenGL convention, the results are better. However, what if I don't know the bounding box of the scene? Any suggestions?

dazinovic commented 2 years ago

You can approximate it with your camera positions.
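A possible sketch of that approximation, padding the camera positions with a margin since the scene extends beyond the cameras (the margin value is a guess to tune per scene):

```python
import numpy as np

def scene_bounds_from_cameras(c2w_poses, margin=1.0):
    """Approximate an axis-aligned scene bounding box from camera positions.
    `margin` is in meters; a tighter alternative is to push each camera
    center along its viewing direction by the maximum observed depth."""
    centers = np.stack([p[:3, 3] for p in c2w_poses])
    bbox_min = centers.min(axis=0) - margin
    bbox_max = centers.max(axis=0) + margin
    return bbox_min, bbox_max
```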

endlesswho commented 2 years ago

You can approximate it with your camera positions.

My result is all right with the help of your advice. Thanks for your kind reply.

JyotiLuxolis commented 2 years ago

Hello @endlesswho @dazinovic, can you briefly describe what changes you made to get it to work with real-world datasets? Is it something like the following?

  1. Generate poses using COLMAP --> 2. Normalize depth maps to 0-1 --> 3. Transform poses as described in #2 --> 4. Run the training procedure

Also a few more questions @endlesswho:

  1. What dataset did you use?
  2. What is meant by "keeping depth images in metric space"?

dazinovic commented 2 years ago

Hello @endlesswho @dazinovic, can you briefly describe what changes you made to get it to work with real-world datasets? Is it something like the following?

1. Generate poses using COLMAP --> 2. Normalize depth maps to 0-1 --> 3. Transform poses as described in [How to Transform ScanNet Poses? #2](https://github.com/dazinovic/neural-rgbd-surface-reconstruction/issues/2) --> 4. Run the training procedure

I generated poses using BundleFusion (although you can also use COLMAP for this) and then applied the transformation described in the linked issue. The depth maps are not normalized. The values need to be in meters. ScanNet depth maps are in millimeters, so you simply need to divide by 1000. The method will work with other scales too, but the depth maps need to be consistent with the camera poses.
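To illustrate the last point: whatever unit you pick, apply the same scale to the depth maps and to the pose translations so the two stay consistent. A sketch (names are illustrative):

```python
import numpy as np

def rescale_scene(depths, c2w_poses, scale):
    """Apply one global scale to depth maps and pose translations together."""
    depths = [d * scale for d in depths]  # e.g. scale = 0.001 for mm -> m
    poses = []
    for p in c2w_poses:
        q = p.copy()
        q[:3, 3] *= scale                 # only translations change
        poses.append(q)
    return depths, poses
```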

junshengzhou commented 2 years ago

Hello, I am trying to reproduce Neural RGB-D with the data used by Manhattan-SDF and NICE-SLAM (e.g. Replica). I find that some views are optimized correctly (the rendered depths and images seem correct) while most of the views are optimized wrongly. I simply take the fx and fy from the intrinsics as the focal length, and I don't know what to do with the cx and cy. The extrinsics seem similar to your provided data, and I also transformed the depth to be in meters.

Do you have any ideas? Thanks! @dazinovic @endlesswho

ZuoJiaxing commented 2 years ago

I have encountered the same issue as @junshengzhou; could you reply to us? @dazinovic @endlesswho Normally fx, fy, cx, cy are provided, but it seems you only need a single focal length value. How do you deal with the others? What focal length value should I use given fx, fy, cx, cy?