I tried it on my own dataset, but the depth maps look bad

jlartois commented 9 months ago

Thank you very much for providing this code. I was able to get it running pretty smoothly with the instructions provided. The fused.ply looks fine, but the depth maps look bad.

For anyone curious about what the PatchMatchNet results look like on a real-world example dataset, see this issue. I took 46 1920x1080 images around a bench. I have experience with MVS, so I took extreme care to fix all camera sensor parameters and minimize motion blur. The 46 images can be seen/downloaded here.

The colmap camera calibration looks fine: colmap_cameras

The colmap fused.ply confirms that the calibration is good: colmap_fused_ply

For completeness, instant-ngp (NeRF) is also able to correctly reconstruct the scene using the camera params from colmap: nerf_bench

These are the results of PatchMatchNet:

pair.txt looks perfect, so this is not an issue.

fused.ply looks okay in some places, others not so much: patchmatch_fused_ply

But the depth maps look far from state-of-the-art. Here is an example (some other depth maps are better, some worse; I chose a medium-quality result here): 00000029

For completeness, here are the contents of the cam.txt file:

extrinsic
0.019894159882692455 -0.29141630256513085 0.956389440030488 -0.2667305720149542 
0.33078085888473663 0.9046271346140046 0.2687634102989685 -1.0743245232408203 
-0.9434978780039933 0.31100849814647163 0.11439173170574113 3.4207790958460342 
0.0 0.0 0.0 1.0 

intrinsic
1640.4268290361326 0.0 944.0 
0.0 1640.4268290361326 531.0 
0.0 0.0 1.0 

2.885458 39.362592
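
To double-check the depth range each view actually gets, the layout above can be read back in a few lines of Python. This is only a minimal sketch: `parse_cam_txt` is a hypothetical helper (not part of the PatchmatchNet repo), and it assumes exactly the layout shown here, i.e. an `extrinsic` header followed by four rows, an `intrinsic` header followed by three rows, and then one line with two numbers (depth min and max).

```python
def parse_cam_txt(text):
    """Parse a cam.txt with the layout shown above (hypothetical helper).

    Assumes: 'extrinsic' + 4 rows, 'intrinsic' + 3 rows, then 'depth_min depth_max'.
    """
    # Drop blank lines so the row offsets below are fixed.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    e = lines.index("extrinsic")
    i = lines.index("intrinsic")
    extrinsic = [[float(v) for v in lines[e + 1 + r].split()] for r in range(4)]
    intrinsic = [[float(v) for v in lines[i + 1 + r].split()] for r in range(3)]
    # The line after the 3 intrinsic rows holds the depth range.
    # Unpacking fails loudly if it does not contain exactly two values.
    depth_min, depth_max = (float(v) for v in lines[i + 4].split())
    if not 0.0 < depth_min < depth_max:
        raise ValueError(f"suspicious depth range: {depth_min} {depth_max}")
    return extrinsic, intrinsic, depth_min, depth_max
```

Running this over every cam.txt and printing `depth_min` and `depth_max` makes it easy to spot views whose range is far off from the rest.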

Is this what you would expect? What would you recommend to get better depth maps?

FangjinhuaWang commented 9 months ago

Hi. In my experience, the depth range can matter a lot. I would suggest tuning the near and far values (e.g. 2.885458 39.362592 as you posted). First, I think the near value may be too large, which may be why the bottom part of the depth map is bad (those pixels are very close to the camera; their depth may be smaller than 2.88, in which case the network cannot estimate meaningful depth values). Second, the far value may be too large as well. You can manually clamp it to a constant value. For debugging, you may also try other MVS methods on the same dataset, e.g. my other work, IterMVS.
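
One way to pick near and far values per view, instead of guessing, is to look at the depths of known 3D points in that camera. The sketch below is an illustration under stated assumptions, not something from the PatchmatchNet code: `depth_range_from_points` is a hypothetical helper, the points are assumed to be in world coordinates (for example the sparse points from the colmap reconstruction), and the 4x4 extrinsic is assumed to map world to camera coordinates, so that depth is the z component of R·X + t. The percentiles and the margin are arbitrary starting values.

```python
import numpy as np

def depth_range_from_points(points_world, extrinsic,
                            lo_pct=1.0, hi_pct=99.0, margin=0.1):
    """Estimate (near, far) for one view from 3D points (hypothetical helper).

    Assumes `extrinsic` is a 4x4 world-to-camera matrix.
    """
    pts = np.asarray(points_world, dtype=np.float64)   # shape (N, 3)
    E = np.asarray(extrinsic, dtype=np.float64)        # shape (4, 4)
    # Depth = third row of [R|t] applied to each point.
    z = pts @ E[2, :3] + E[2, 3]
    z = z[z > 0]                                       # keep points in front of the camera
    if z.size == 0:
        raise ValueError("no points in front of the camera")
    # Percentiles make the range robust to a few outlier points.
    near, far = np.percentile(z, [lo_pct, hi_pct])
    # Widen the range slightly so surfaces near the limits are still covered.
    return float(near * (1.0 - margin)), float(far * (1.0 + margin))
```

If the near value computed this way is well below 2.88 for the views that show the ground close to the camera, that would support the explanation above.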

jlartois commented 9 months ago

Hi, thanks for the quick reply. I can see how the depth range is vital for implementations like this. I indeed also had to experiment with the depth range for other MVS networks. However, in this case, changing the depth range does not seem to help.

Here is a depth map after setting the depth min and max values in each cam.txt to [1.0, 20.0] (note: the depth map is darker, but that is due to my choice of min and max when converting to a PNG, not to the "correctness" of the estimated depth): 00000029_1
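
The note about brightness can be made concrete with a small sketch of the depth-to-8-bit conversion. `depth_to_uint8`, `vis_min` and `vis_max` are hypothetical names used for illustration; this is not the conversion script used for the images in this thread. The point is that the same depth value maps to a different gray level whenever the visualization range changes.

```python
import numpy as np

def depth_to_uint8(depth, vis_min, vis_max):
    """Map metric depth to 0..255 using a fixed visualization range (hypothetical helper)."""
    if vis_max <= vis_min:
        raise ValueError("vis_max must be larger than vis_min")
    d = np.asarray(depth, dtype=np.float32)
    # Clamp to the chosen range, then scale linearly to [0, 1].
    scaled = (np.clip(d, vis_min, vis_max) - vis_min) / (vis_max - vis_min)
    return np.round(scaled * 255.0).astype(np.uint8)
```

Using one fixed `vis_min`/`vis_max` pair for all runs keeps the PNGs visually comparable, regardless of the depth range written to cam.txt.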

TL;DR: the depth of everything but the bench remains as bad as with the original depth range.

For completeness, here is the corresponding confidence map: 00000029_1_prob

I think the confidence map illustrates how PatchMatchNet is struggling to find a good depth, except for the bench.
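
To put a number on this, one can mask the depth map with the confidence map and report how many pixels survive. The following is a minimal sketch, assuming the depth and confidence maps are already loaded as arrays of the same shape; `mask_low_confidence` is a hypothetical helper, and the threshold of 0.8 is an arbitrary example value.

```python
import numpy as np

def mask_low_confidence(depth, confidence, threshold=0.8):
    """Zero out depth where confidence is below `threshold` (hypothetical helper).

    Returns the masked depth map and the fraction of pixels kept.
    """
    depth = np.asarray(depth, dtype=np.float32)
    confidence = np.asarray(confidence, dtype=np.float32)
    if depth.shape != confidence.shape:
        raise ValueError("depth and confidence must have the same shape")
    keep = confidence >= threshold
    # Pixels below the threshold are set to 0, i.e. treated as "no depth".
    masked = np.where(keep, depth, 0.0)
    return masked, float(keep.mean())
```

Comparing the kept fraction between depth ranges gives a rough, quantitative counterpart to eyeballing the confidence images.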

Btw, I also tried a depth range of [0.5, 10.0], but the depth maps were worse (meaning even further from the "ground truth", more noise, less geometric consistency).

cuisitan88 commented 9 months ago

Hi, I am confused about how masks are generated when training on a custom dataset. @jlartois @FangjinhuaWang

FangjinhuaWang / PatchmatchNet

I tried it on my own dataset, but the depth maps look bad #88