ZhenyuSun-Walker opened 2 months ago
Hi, can you provide more information, like some visualizations?
Sure, I'll send you the awful result.
Can you also provide the input image?
And my input is a dataset of multi-view images. Shown below is part of my input dataset; there are 20 images in total.
Thanks for your information. Did you run the following command to get novel views? If so, is the novel view equally bad?
python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root <your scene>
Actually, I did generate the novel views, and their metrics are excellent, as is the visual quality when I inspect the novel-view images.
If you can get a good novel view, the corresponding estimated depth should also be good.
You can check the depth maps in the folder <path to save ply>/<your scene>/depth.
As for the point cloud, you can zoom in to view it. Since this is an indoor scene, the large flat area at the bottom may be the floor; you need to zoom in to get a closer look at the point cloud.
If there is still a problem, it may be an issue with the filtering hyperparameters. You can adjust them here.
The depth map of examples/scene2 looks like: [image]
My depth map looks like: [image]
Can you explain the filtering hyperparameters in detail: their meaning and their impact on final quality when they are changed?
And I would like to verify an assumption of your pipeline.
So in your workflow, you combine the features to get f_v, so after the generalizable model there is only one target-view image rendered from many source-view images.
In this case, I wonder whether the point cloud generated by the command python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_ply True dir_ply <path to save ply> comes only from the last single target-view image, or is a combined point cloud from the 4 target-view images. However, even if the point cloud comes from only a single rendered target-view image, with the depth estimation of the target view in your pipeline it should still succeed in unprojecting the image to get the point cloud, just like its performance on examples/scene2.
Looking forward to your earliest reply!
1) The depth map you're showing doesn't seem particularly good. You can modify the number of sampling points volume_planes in the config file; commonly used settings are [64, 8], [48, 8] and [16, 8].
2) The process of point cloud fusion filters out unreliable depths by checking the consistency of multiple views. For the filter hyperparameter settings:
s = 1
dist_base = 1/8
rel_diff_base = 1/10
Refer here: dist_base and rel_diff_base are the thresholds for the reprojection error. If the reprojection error is less than the threshold, the depth is reliable; a larger threshold means a more relaxed condition, and a smaller one a more stringent condition. s means the depth is considered reliable when at least s views meet the conditions (this is not perfectly precise, but it can be understood this way). A larger s means a stricter condition, and a smaller s means a looser condition.
It is possible that the current hyperparameter settings are too strict for your scene, filtering out too many points and resulting in a poor point cloud. You can adjust them according to the meaning of the hyperparameters above. One extreme setting is:
s = 1
dist_base = 100
rel_diff_base = 100
In this case, almost all points are considered reliable, i.e. essentially no points are filtered out; you can try it.
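To make these thresholds concrete, here is a minimal sketch (not the repo's actual fusion.py) of how s, dist_base and rel_diff_base act as a per-pixel filter, assuming the per-view reprojection errors and relative depth differences have already been computed:

```python
import numpy as np

def reliable_mask(reproj_err, rel_depth_diff, s=1, dist_base=1/8, rel_diff_base=1/10):
    """Keep a pixel's depth only if at least `s` source views agree with it.

    reproj_err:     (V, H, W) pixel reprojection error of the reference depth
                    checked against each of the V source views.
    rel_depth_diff: (V, H, W) relative depth difference against each source view.
    """
    # Per-view consistency test: smaller thresholds = stricter filtering.
    ok = (reproj_err < dist_base) & (rel_depth_diff < rel_diff_base)
    # A pixel survives if at least `s` views pass the test (larger s = stricter).
    return ok.sum(axis=0) >= s
```

With s = 1 and both thresholds set to 100, the mask becomes essentially all True, i.e. nothing is filtered out.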
3) The generated point cloud is a combined point cloud from the 4 target-view images. The 4 point clouds corresponding to the 4 target views are fused into the final point cloud by fusion.py.
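For intuition, a minimal sketch of that fusion idea (not the repo's actual fusion.py) is to unproject each target view's filtered depth map into world space and concatenate the resulting clouds; depths, Ks and c2ws below are hypothetical lists with one entry per target view:

```python
import numpy as np

def unproject(depth, K, c2w):
    """Lift an (H, W) depth map to world-space points using intrinsics K and camera-to-world pose c2w."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixel coords
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)        # back-project to camera space
    return (c2w[:3, :3] @ cam.T).T + c2w[:3, 3]                      # transform to world space

# Fuse: unproject each target view's (already filtered) depth and concatenate.
# fused = np.concatenate([unproject(d, K, p) for d, K, p in zip(depths, Ks, c2ws)], axis=0)
```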
Thank you, I'll check it ASAP! BTW, would you mind explaining the meaning of the volume_planes configuration, like [64, 8], [48, 8]?
You're welcome. We use a cascaded (two-stage) structure and the plane-sweeping algorithm for depth estimation. As shown in the figure below, given the near and far (far - near = R1) of the scene, we first define N1 depth planes (e.g., sampled at equal intervals), i.e., the pink lines. In the coarse stage (stage 1), based on these predefined depth hypothesis planes, we obtain a coarse depth, i.e., the yellow line. In the fine stage (stage 2), we further sample around the coarse depth obtained in the previous stage to obtain N2 depth planes. Based on these N2 depth hypothesis planes, we can predict a finer depth.
The volume_planes are actually [N1, N2], which represents the number of depth samples in the two stages.
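As a rough illustration of that two-stage sampling (a minimal sketch, not the repo's exact code; band_frac is an assumed half-width for the stage-2 band):

```python
import numpy as np

def cascade_depth_planes(near, far, volume_planes=(64, 8), coarse_depth=None, band_frac=0.05):
    """Stage 1: N1 planes uniformly between near and far.
    Stage 2: N2 planes resampled in a narrow band around the coarse depth from stage 1."""
    n1, n2 = volume_planes
    stage1 = np.linspace(near, far, n1)            # coarse hypotheses (the pink lines)
    if coarse_depth is None:
        return stage1, None
    r = band_frac * (far - near)                   # half-width of the stage-2 search band
    stage2 = np.linspace(coarse_depth - r, coarse_depth + r, n2)  # fine hypotheses
    return stage1, stage2
```

So [64, 8] means 64 coarse hypotheses in stage 1 and 8 fine hypotheses in stage 2; fewer planes is cheaper but samples the depth range more coarsely.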
OK, that makes sense! I have now finished the experiments with dist_base and rel_diff_base both changed to 100, but the point cloud and the Gaussian view are still not good enough, as shown below.
(Top-down view)
(Front view)
And I sincerely hope that adjusting the volume_planes configuration will work!
Sir, I find that with the volume_planes configuration [64, 8] and dist_base / rel_diff_base set to 100 respectively, the result is still not as good as I expected. Pretty tricky and strange!
Based on your images, I think your photos were captured as panoramas, which may not fit the pinhole camera model used by the paper. You can try pinhole images again!
Hello, Sir! I noticed that when I apply the generalizable method to my own pictures, the generated point cloud is quite planar. Why does the depth estimator work badly on my own dataset?