HKUST-Aerial-Robotics / MVDepthNet

This repository provides a PyTorch implementation of the 3DV 2018 paper "MVDepthNet: real-time multiview depth estimation neural network".

Problem in generating point clouds from predicted Depth #7

Closed: hmishra2250 closed this issue 5 years ago

hmishra2250 commented 5 years ago

Thanks a lot for open-sourcing your project!

The most impressive claim, the one that catches most people's attention, is that the paper reports better results than DeMoN. The depth heat maps look very good. However, when I reconstruct a 3D point cloud from the predicted depth, the result looks uneven and wavy, and it doesn't preserve the shape of the objects.

Shown below are the point clouds obtained from MVDepthNet and DeMoN.

MVDepth Pointcloud:

[screenshot from 2018-09-27 19-08-47] [screenshot from 2018-09-27 19-57-56]

DeMoN pointcloud:

[screenshot from 2018-09-27 20-00-03]

As shown above, the DeMoN point cloud preserves the object structure.

Is this a problem with the model, or is it something I am doing wrong? I can provide the corresponding images and poses if you need them to verify.
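For reference, this is roughly how I back-project the predicted depth into a point cloud (a minimal sketch; `fx, fy, cx, cy` are the pinhole intrinsics of the reference camera, and `depth` is the network output at image resolution):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) into an N x 3 point
    cloud in the reference camera frame, using the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels
```

If this back-projection is correct, the waviness has to come from the depth values themselves.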

Also, is the pretrained model provided with the repo the one trained with Geometric Augmentation, or without it?

Any suggestions/ideas are highly appreciated! :)

WANG-KX commented 5 years ago

Dear,

Thanks for your interest in the project. Would you please provide the image pair and the pose of the test? The model is trained with Geometric Augmentation.

In some cases, yes, the result is not as good as DeMoN's, because our method estimates depth only from multi-view observations. DeMoN, on the other hand, can estimate depth even with zero baseline, which means it can exploit semantic cues such as flat surfaces. That may be why DeMoN's point cloud is visually better than ours, especially at small baselines.
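If it helps, a quick sanity check is to measure the baseline of an image pair from the two poses (a minimal sketch, assuming 4x4 camera-to-world matrices; the names `T_left` and `T_right` are mine):

```python
import numpy as np

def baseline(T_left, T_right):
    """Distance between the two camera centres, given 4x4
    camera-to-world pose matrices. A very small baseline gives
    a weak triangulation signal for multi-view methods."""
    return np.linalg.norm(T_left[:3, 3] - T_right[:3, 3])
```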

Regards, Kaixuan

hmishra2250 commented 5 years ago

Hi Kaixuan,

Thanks for the super quick reply. Here are the images and the pose you asked for.

The thing is, I tried generating point clouds for the images in sample_data.pkl and got similarly bad results. I don't understand how MVDepthNet can perform better than DeMoN in the evaluation and yet give worse reconstructions. Is there a problem with the way I handle MVDepthNet's output, or am I missing something?

WANG-KX commented 5 years ago

Dear,

The test images you use are from the TUM RGBD dataset, and I am not sure whether they are included in DeMoN's training set. As I mentioned in the paper, images from the TUM RGBD dataset are split differently in MVDepthNet and in DeMoN (which is why we did not compare results on the TUM RGBD dataset in the paper). If you would like to compare, please make sure the data was not used to train either of the networks.

Here I provide results on the first ten images of the SceneNN dataset. They were not manually selected to make us look better than DeMoN. In each result, the top row shows the left image, the right image, the RGBD camera depth map, the MVDepthNet depth map, and its error map; the second row shows the RGBD camera depth map, the DeMoN depth map, and its error map.

In my tests, DeMoN can generate very smooth point clouds. However, in some cases the scale is not consistent across pixels. It is fine for DeMoN to estimate a wrong scale, but the scale should at least be consistent. DeMoN is an impressive project in that it also estimates camera poses. Given camera poses, I would not be surprised if MVDepthNet is better than DeMoN when evaluating the depth maps.
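One way to see the scale issue is to align each predicted depth map to the ground truth with a single global factor and then look at the spread of the per-pixel ratios (a minimal sketch; `pred` and `gt` are depth maps with valid pixels marked by `gt > 0`):

```python
import numpy as np

def check_scale_consistency(pred, gt):
    """Align pred to gt with one global scale factor, then report the
    spread of the per-pixel ratios. A consistent scale gives a tight
    spread even when the global factor is far from 1."""
    valid = gt > 0
    ratio = gt[valid] / pred[valid]
    scale = np.median(ratio)  # global scale correction
    spread = np.percentile(ratio, 90) / np.percentile(ratio, 10)
    return scale, spread
```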

Regards, Kaixuan

hmishra2250 commented 5 years ago

Thanks for the information, Kaixuan.

I think I get it now. Since we don't know DeMoN's train/test split of the TUM RGBD dataset, we shouldn't rely on an evaluation on it.

And yes, the scale is one problem with DeMoN: all its point clouds have their own scale. Suppose I solve the scale problem with some point-cloud registration method; what are your views then on comparing just the quality of the depth from DeMoN against that from MVDepthNet?
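Concretely, I was thinking of median-scale-aligning each depth map and then comparing standard error metrics, along these lines (a minimal sketch):

```python
import numpy as np

def scale_aligned_errors(pred, gt):
    """Median-scale-align pred to gt, then compute common depth metrics:
    absolute relative error, RMSE, and the delta < 1.25 inlier ratio."""
    valid = gt > 0
    pred = pred[valid] * np.median(gt[valid] / pred[valid])
    gt = gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    delta1 = np.mean(np.maximum(pred / gt, gt / pred) < 1.25)
    return abs_rel, rmse, delta1
```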

Also, it would be a great help if you could share those ten image pairs, along with their poses and camera intrinsics. (Downloading the whole SceneNN dataset is hard; it's too large.)

Thank you very much for helping with my queries.

Regards Himadri

WANG-KX commented 5 years ago

Dear,

You can download the bag at link. The data format is not the same as that of the provided example data, so please convert it yourself.
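A rough starting point for the conversion, assuming ROS is installed (the topic and file names below are hypothetical; list the real ones with `rosbag info`):

```python
import rosbag
from cv_bridge import CvBridge

bridge = CvBridge()
bag = rosbag.Bag('scenenn_sample.bag')  # hypothetical file name

# Topic names are hypothetical; check them with `rosbag info`.
for topic, msg, t in bag.read_messages(topics=['/camera/image_raw']):
    image = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    # ... save the image and match it with the nearest pose by timestamp ...

bag.close()
```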

The link will be valid for a week.

Regards, Kaixuan