YoYo000 / MVSNet

MVSNet (ECCV2018) & R-MVSNet (CVPR2019)

Question about ground-truth depth maps and the evaluation #50

Open seul0724 opened 5 years ago

seul0724 commented 5 years ago

Hello! Thanks a lot for your work!

I have tried to train and test the network. It works fine, but I still have two questions:

(1) For training, how do you obtain the ground-truth depth maps? You have provided the depth maps and given a brief description in the paper, but it is difficult for people new to the field to reproduce this. Could you provide more instructions on generating the ground-truth depth maps, or is there software that does this?

(2) For testing, I have tried to compare the performance with COLMAP. However, COLMAP does not rely on the calibrated camera parameters and seems to use a different coordinate system. Since the accuracy and completeness metrics measure distances between point clouds, the computed distances may be very large even though the point clouds are similar. How did you conduct a fair comparison in the paper?

Hope you can give some instructions. Thanks a lot.

YoYo000 commented 5 years ago

Depth maps are rendered from ground truth meshes, which are generated from the DTU-provided ground truth point clouds using screened Poisson surface reconstruction (SPSR). The SPSR parameters are also given in Sec. 4.1, Training - Data Preparation. So if you want to generate the depth maps yourself, you need to run SPSR to produce the GT meshes, and then write a depth rendering program to produce the GT depth maps.
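For anyone trying to reproduce this step, below is only a rough sketch of such a depth rendering program (not the code used for the released depth maps). It assumes a 3x3 pinhole intrinsic matrix and a 4x4 world-to-camera extrinsic matrix in the MVSNet cam.txt convention, and it uses trimesh ray casting instead of an OpenGL renderer so the coordinate conventions stay explicit:

```python
import numpy as np
import trimesh

def render_depth(mesh_path, K, extrinsic, width, height):
    """Render a GT depth map by casting one ray through each pixel center.

    K:         3x3 pinhole intrinsic matrix.
    extrinsic: 4x4 world-to-camera matrix (x_cam = R x_world + t).
    Returns a (height, width) float32 depth map; 0 where the ray misses the mesh.
    """
    mesh = trimesh.load(mesh_path)
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    cam_center = -R.T @ t                                  # camera center in world coordinates

    # One ray per pixel center: pixel (row i, col j) -> image coordinate (j + 0.5, i + 0.5).
    # Whether to add the 0.5 is exactly the convention issue discussed later in this thread.
    jj, ii = np.meshgrid(np.arange(width), np.arange(height))
    pix = np.stack([jj + 0.5, ii + 0.5, np.ones_like(jj, dtype=float)], axis=-1).reshape(-1, 3)
    dirs_cam = pix @ np.linalg.inv(K).T                    # ray directions in the camera frame
    dirs_world = dirs_cam @ R                              # row-vector form of d_world = R^T d_cam

    origins = np.tile(cam_center, (dirs_world.shape[0], 1))
    hits, ray_idx, _ = mesh.ray.intersects_location(ray_origins=origins,
                                                    ray_directions=dirs_world)

    # Depth is the z-coordinate of the hit point in the camera frame; keep the nearest hit per ray.
    z_cam = (hits @ R.T + t)[:, 2]
    depth = np.full(width * height, np.inf, dtype=np.float32)
    np.minimum.at(depth, ray_idx, z_cam.astype(np.float32))
    depth[np.isinf(depth)] = 0.0
    return depth.reshape(height, width)
```

A rasterizer-based renderer is of course much faster; the ray-casting version is just easy to check against the pixel/image-coordinate conventions discussed further down in this thread.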

If you want to compare COLMAP with MVSNet on the DTU dataset, make sure to use the ground truth cameras rather than COLMAP's SfM cameras. You can follow COLMAP's FAQ to convert the ground truth cameras into the COLMAP format and then perform the dense reconstruction.
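As a rough illustration of that step (again, not code from this repo; the format details should be double-checked against COLMAP's documentation), the ground truth cameras can be written out as a COLMAP text model. Both COLMAP and the DTU ground truth use the world-to-camera convention x_cam = R x_world + t:

```python
from scipy.spatial.transform import Rotation

def write_colmap_model(out_dir, cams, width, height):
    """Write ground truth cameras as a COLMAP text model (cameras.txt, images.txt, points3D.txt).

    cams: list of (image_name, K, extrinsic), with K a 3x3 intrinsic matrix and
          extrinsic a 4x4 world-to-camera matrix.
    """
    with open(f"{out_dir}/cameras.txt", "w") as f:
        for cam_id, (_, K, _) in enumerate(cams, start=1):
            fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
            f.write(f"{cam_id} PINHOLE {width} {height} {fx} {fy} {cx} {cy}\n")

    with open(f"{out_dir}/images.txt", "w") as f:
        for img_id, (name, _, extr) in enumerate(cams, start=1):
            R, t = extr[:3, :3], extr[:3, 3]
            qx, qy, qz, qw = Rotation.from_matrix(R).as_quat()  # scipy quaternion order is (x, y, z, w)
            f.write(f"{img_id} {qw} {qx} {qy} {qz} {t[0]} {t[1]} {t[2]} {img_id} {name}\n")
            f.write("\n")                                       # empty 2D-point line required by the format

    # Leave points3D.txt empty; the sparse points come from triangulation afterwards.
    open(f"{out_dir}/points3D.txt", "w").close()
```

With such a model, the "Reconstruct sparse/dense model from known camera poses" recipe in COLMAP's FAQ applies: run feature extraction and matching, triangulate against this fixed model with point_triangulator, and then run the usual dense reconstruction. The fused point cloud then stays in the DTU ground-truth coordinate frame, so the accuracy/completeness distances are meaningful.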

TruongKhang commented 4 years ago

@YoYo000, did you use SPSR to generate the meshes from the point clouds of the Tanks and Temples (T2) dataset? I have some problems when using this program on some of the T2 scenes.

tatsy commented 4 years ago

@YoYo000 I am sorry to bother you so many times.

I am still struggling to reproduce your results on my own, and I found that my preparation of the ground truth depth maps seems to be wrong.

I prepared the ground truth meshes with SPSR by following your paper. Then I rendered the depth maps with my own C++/OpenGL program. In this program, I used the extrinsic camera parameters directly as the OpenGL view matrix and converted the intrinsic camera parameters into an OpenGL projection matrix following this Gist: https://gist.github.com/astraw/1341472#file-calib_test_utils-py-L80

To validate my ground truth depth maps, I tried reconstructing the scene from them. I found that the reconstructed point cloud lies slightly above the original ground truth mesh (see the figures below; blue: your R-MVSNet result, red: my reconstruction from the ground truth depth maps).

I downloaded the post-processed point cloud generated by your R-MVSNet and checked that it lies at the same location as the ground truth mesh I am using. (Does this post-processing include aligning the output point cloud, e.g., with ICP?)

[Figures: GT vs. R-MVSNet (yours); GT vs. my reconstruction from GT depth maps]
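For reference, a minimal back-projection for this kind of check (assuming pinhole intrinsics and MVSNet-style world-to-camera extrinsics; a simplified sketch, not the exact code used here) would look like:

```python
import numpy as np

def depth_to_world_points(depth, K, extrinsic):
    """Back-project a depth map into a world-space point cloud.

    depth:     (H, W) array; zeros are treated as invalid pixels.
    K:         3x3 pinhole intrinsic matrix.
    extrinsic: 4x4 world-to-camera matrix (x_cam = R x_world + t).
    """
    H, W = depth.shape
    jj, ii = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0

    # Pixel centers in image coordinates; whether to add the 0.5 must match the
    # convention used when the depth map was rendered, otherwise the whole
    # cloud shifts by a small, roughly constant amount.
    u = jj[valid] + 0.5
    v = ii[valid] + 0.5
    z = depth[valid]

    pts_cam = (np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T) * z[:, None]
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    return (pts_cam - t) @ R            # row-vector form of X_world = R^T (X_cam - t)
```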

Do you have any idea how I can fix this? Otherwise, I would appreciate it if you could elaborate on how you generated the ground truth depth maps.

Thank you very much for your help.

YoYo000 commented 4 years ago

@tatsy The depth maps are generated by direct mesh rendering, without any external alignment.

Since your 3D point cloud is only slightly misaligned after projection/unprojection, have you checked the ±0.5 issue when converting pixel coordinates to texture coordinates? In my experience it can cause exactly this kind of misalignment.
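To make the ±0.5 issue concrete, here is a tiny illustrative snippet (not from the repo). Rendering with one convention but unprojecting with the other offsets every back-projected point by half a pixel, which shows up in 3D as a small, roughly constant translation like the one reported above:

```python
# Pixel (i, j), indexed from the top-left pixel of the image.
i, j = 0, 0

# Convention A: sample at the pixel center -> image coordinate (0.5, 0.5).
u_center, v_center = j + 0.5, i + 0.5

# Convention B: sample at the pixel's top-left corner -> image coordinate (0.0, 0.0).
u_corner, v_corner = float(j), float(i)

# Mixing the two conventions shifts an unprojected point by roughly
# (0.5 * depth / fx, 0.5 * depth / fy) in the camera's x/y directions.
print(u_center - u_corner, v_center - v_corner)   # 0.5 0.5
```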

tatsy commented 4 years ago

@YoYo000 Thank you very much for your advice. The misalignment in my result seems to be a very small translation by a fixed distance, so I also suspect that the projection could be the problem.

I think OpenGL's rasterizer samples at the pixel center when evaluating a pixel's value, so for the depth map the depth at the pixel center is what gets evaluated. I also understand this sample location can be controlled via gl_SamplePosition (by default it is (0.5, 0.5)).

Could I ask whether you changed the sample location within the pixel using gl_SamplePosition or something similar? Or did you compute the depth maps with your own rasterizer so you could sample a specific location inside each pixel?

tatsy commented 4 years ago

According to your comment at the link below, should I pick the top-left corner when evaluating the depth value of a pixel (for example, by specifying gl_SamplePosition = vec2(0.0, 0.0))?

https://github.com/YoYo000/MVSNet/issues/38#issuecomment-478422515

Hi,

In multi-view geometry we use image coordinates rather than pixel coordinates, and we take the top-left corner of the image as (0, 0).

You can imagine each pixel having a width x height of 1 x 1; then, in image coordinates, the center of the top-left pixel is (0.5, 0.5).

YoYo000 commented 4 years ago

@tatsy Did you go through the rendering part? I have not gone into the details of the code you posted yet.