TQTQliu / MVSGaussian

[ECCV 2024] MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
https://mvsgaussian.github.io/
MIT License
404 stars · 21 forks

About the .ply file from the generalizable model #42

Open ZhenyuSun-Walker opened 2 months ago

ZhenyuSun-Walker commented 2 months ago

Hello! I noticed that when I apply the generalizable method to my own pictures, the generated point cloud is quite planar. Why does the depth estimator work badly on my own dataset?

TQTQliu commented 2 months ago

Hi, can you provide more information, like some visualizations?

ZhenyuSun-Walker commented 2 months ago

Sure, I'll send you the awful results: [image] [image]

TQTQliu commented 2 months ago

Can you also provide the input image?

ZhenyuSun-Walker commented 2 months ago

My input is a dataset of multi-view images. Shown below is part of my input dataset (images 90, 95, 100); in total there are 20 images.

TQTQliu commented 2 months ago

Thanks for the information. Did you run the following command to get novel views? If so, are the novel views equally bad?

python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root <your scene>
ZhenyuSun-Walker commented 2 months ago

Actually, I did generate the novel views, and their metrics are excellent, as is the visual quality when I inspect the novel-view images.

TQTQliu commented 2 months ago

If you can get a good novel view, the corresponding estimated depth should also be good. You can check the depth maps in the folder <path to save ply>/<your scene>/depth. As for the point cloud, try zooming in: since this is an indoor scene, the large flat area below may be the floor, so you need to zoom in to get a closer look. If there is still a problem, it may be an issue with the filtering hyperparameters. Adjust here.
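As a quick sanity check on the depth maps mentioned above, you can back-project one into a point cloud yourself and look at its depth spread. This is a minimal NumPy sketch with a standard pinhole model and made-up intrinsics (`fx`, `fy`, `cx`, `cy`), not the repository's actual code:

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) to an (N, 3) point cloud in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) / fx * z   # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) / fy * z
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a synthetic sloped depth map. If, on your real depth map, the
# z-range printed here is tiny relative to the x/y extent, the cloud really
# is near-planar and the depth estimate (not the fusion) is the problem.
depth = np.linspace(1.0, 3.0, 64 * 48).reshape(48, 64)
pts = backproject_depth(depth, fx=50.0, fy=50.0, cx=32.0, cy=24.0)
print(pts[:, 2].min(), pts[:, 2].max())
```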

ZhenyuSun-Walker commented 2 months ago

The depth map of example/scene2 looks like: [image] My depth map looks like: [image]

Could you explain the filtering hyperparameters in detail: their meaning, and their impact on the final quality when they are changed?

ZhenyuSun-Walker commented 2 months ago

I would also like to verify an assumption about your pipeline. [image] In your workflow, you combine the features to get f_v, so after the generalizable model there is only one target-view image rendered from many source-view images. In that case, I wonder whether the point cloud generated by the command python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_ply True dir_ply <path to save ply> comes only from the last single target-view image, or whether it is a combined point cloud from the 4 target-view images. However, even if the point cloud comes from a single rendered target view, with the target-view depth estimated by your pipeline it should still be possible to unproject the image into a point cloud, just as it works on examples/scene2.

Looking forward to your reply!

TQTQliu commented 2 months ago

1) The depth map you're showing doesn't look particularly good. You can modify the number of sampling points volume_planes in the config file; commonly used settings are [64,8], [48,8] and [16,8]. 2) Point cloud fusion filters out unreliable depths by checking consistency across multiple views. The filtering hyperparameter settings are:

s = 1
dist_base = 1/8
rel_diff_base = 1/10

Refer here: dist_base and rel_diff_base are thresholds on the reprojection errors. If the reprojection error is below the threshold, the depth is considered reliable; a larger threshold means a more relaxed condition, a smaller one a more stringent condition. s means a depth is kept when at least s views meet the conditions (not entirely precise, but it can be understood this way). A larger s means a stricter condition, a smaller s a looser one.

It is possible that the current hyperparameter settings are too strict for your scene, filtering out too many points and leading to a poor point cloud. You can adjust them according to the meanings above. One extreme setting is:

s = 1
dist_base = 100
rel_diff_base = 100

With this setting almost all points are considered reliable, i.e. nothing is filtered out; you can try it.
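To make the three thresholds concrete, here is a simplified NumPy sketch of the two-view geometric consistency test they control: back-project a reference pixel, project it into a source view, re-project the sampled source depth back, and gate on pixel distance (dist_base) and relative depth difference (rel_diff_base). It assumes shared intrinsics and 4x4 world-to-camera extrinsics for brevity; the repository's actual fusion code differs in details:

```python
import numpy as np

def consistency_mask(depth_ref, depth_src, K, pose_ref, pose_src,
                     dist_base=1/8, rel_diff_base=1/10):
    """Per-pixel mask: True where the reference depth agrees with one source view."""
    h, w = depth_ref.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)

    def to_world(pix_h, depth, pose):
        cam = np.linalg.inv(K) @ (pix_h * depth.reshape(1, -1))
        cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
        return (np.linalg.inv(pose) @ cam_h)[:3]

    def to_pixels(world, pose):
        cam = (pose @ np.vstack([world, np.ones((1, world.shape[1]))]))[:3]
        proj = K @ cam
        return proj[:2] / proj[2], proj[2]

    # reference pixels -> world -> source view
    uv_src, _ = to_pixels(to_world(pix, depth_ref, pose_ref), pose_src)
    us = np.clip(np.round(uv_src[0]).astype(int), 0, w - 1)
    vs = np.clip(np.round(uv_src[1]).astype(int), 0, h - 1)
    sampled = depth_src[vs, us]

    # sampled source depth -> world -> back into the reference view
    pix_src = np.vstack([uv_src, np.ones((1, uv_src.shape[1]))])
    uv_back, depth_back = to_pixels(to_world(pix_src, sampled, pose_src), pose_ref)

    # the two thresholds discussed above: reprojection distance in pixels,
    # and relative depth difference
    dist = np.hypot(uv_back[0] - u.ravel(), uv_back[1] - v.ravel())
    rel_diff = np.abs(depth_back - depth_ref.ravel()) / depth_ref.ravel()
    return ((dist < dist_base) & (rel_diff < rel_diff_base)).reshape(h, w)

# A pixel's point survives fusion when at least `s` source views agree:
# keep = (sum(masks_over_source_views) >= s)
```

Setting dist_base = rel_diff_base = 100 makes both comparisons pass almost everywhere, which is why that extreme setting keeps essentially all points.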

3) The generated point cloud is a combined point cloud from the 4 target-view images. The 4 point clouds corresponding to the 4 target views are fused into the final point cloud by fusion.py.

ZhenyuSun-Walker commented 2 months ago

Thank you, I'll check ASAP! BTW, would you mind explaining the meaning of the volume_planes configuration, like [64, 8] or [48, 8]?

TQTQliu commented 2 months ago

You're welcome. We use a cascaded (two-stage) structure and the plane-sweep algorithm for depth estimation. As shown in the figure below, given the near and far bounds of the scene (far - near = R1), we first define N1 depth planes (e.g. by equal-interval sampling), i.e. the pink lines. In the coarse stage (stage 1), based on these predefined depth hypothesis planes, we obtain a coarse depth, i.e. the yellow line. In the fine stage (stage 2), we sample further around the coarse depth obtained in the previous stage to get N2 depth planes, and based on these N2 depth hypothesis planes we predict a finer depth.

volume_planes is actually [N1, N2], the number of depth samples in the two stages.
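The two-stage sampling described above can be sketched as follows. This is a hypothetical illustration: the uniform stage-1 sampling matches the description, but the exact band width used around the coarse depth in stage 2 is an assumption, not the repository's scheme:

```python
import numpy as np

def stage1_planes(near, far, n1):
    """Coarse stage: N1 depth hypotheses sampled uniformly between near and far
    (the pink lines in the figure)."""
    return np.linspace(near, far, n1)

def stage2_planes(coarse_depth, near, far, n1, n2):
    """Fine stage: N2 hypotheses in a narrow band around the coarse depth.
    Here the band is n2 coarse intervals wide (an assumption for illustration)."""
    interval = (far - near) / (n1 - 1)
    half = interval * n2 / 2
    return np.linspace(coarse_depth - half, coarse_depth + half, n2)

planes1 = stage1_planes(near=0.5, far=10.0, n1=64)  # volume_planes = [64, 8]
coarse = 4.2  # e.g. the depth picked from the stage-1 cost volume
planes2 = stage2_planes(coarse, 0.5, 10.0, 64, 8)   # refined hypotheses around it
```

With fewer stage-1 planes (e.g. [16, 8]) each coarse interval is wider, so the coarse depth is rougher but the fine stage searches a wider band, which is why different scenes can prefer different settings.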

image

ZhenyuSun-Walker commented 2 months ago

OK, that makes sense! I have now finished the experiments with dist_base and rel_diff_base both set to 100; however, the point cloud and the Gaussian view are still not good enough, as shown below.

(Top-down view) [image]

(Front view) [image]


I sincerely hope the volume_planes configuration adjustment will work!

ZhenyuSun-Walker commented 2 months ago

I tried the volume_planes configuration [64, 8] with dist_base and rel_diff_base both set to 100, but the result is not as good as I hoped. Pretty tricky and strange!

zhangshuoneu commented 3 weeks ago

Based on your images, I think your photos were captured as panoramas, which may not fit the pinhole camera model used by the paper. You could try pinhole images instead!