YoYo000 / MVSNet

MVSNet (ECCV2018) & R-MVSNet (CVPR2019)
MIT License

Tanks and Temples Setup #14

Closed tejaskhot closed 5 years ago

tejaskhot commented 5 years ago

Thanks for sharing the code.

I am trying to reproduce the results on tanks and temples with the pre-trained model but not succeeding so far. An example camera file looks like:

extrinsic
0.333487 -0.0576322 -0.940992 -0.0320506
0.0582181 -0.994966 0.0815704 -0.0245921
-0.940956 -0.0819853 -0.328452 0.248608
0.0 0.0 0.0 1.0

intrinsic
1165.71 0 962.81
0 1165.71 541.723
0 0 1

0.193887 0.00406869 778 3.35933

I have parsed the file to adjust the depth min and max, but it doesn't seem to help much. I only have 12 GB of GPU memory, so I am running at half the image resolution, which shouldn't hurt a lot. However, the outputs I am getting are pretty bad and nothing like the paper. Moreover, I find that I have to change the parameters for every single scan (Horse, Family, etc.) separately, and no set of values seems to apply to all of them.
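For reference, this is roughly how I am parsing the camera files (a minimal sketch of my own, not the repository's loader; I am assuming the last line is depth_min, depth_interval, depth_num, depth_max, which matches the numbers above since 0.193887 + 0.00406869 * 778 ≈ 3.359):

```python
import numpy as np

def load_cam(path):
    # The layout appears to be: "extrinsic" + 16 values, "intrinsic" + 9 values,
    # then one line of depth parameters (the names below are my assumption).
    tokens = open(path).read().split()
    i = tokens.index('extrinsic') + 1
    extrinsic = np.array(tokens[i:i + 16], dtype=np.float64).reshape(4, 4)
    j = tokens.index('intrinsic') + 1
    intrinsic = np.array(tokens[j:j + 9], dtype=np.float64).reshape(3, 3)
    depth_min, depth_interval, depth_num, depth_max = map(float, tokens[j + 9:j + 13])
    return extrinsic, intrinsic, depth_min, depth_interval, int(depth_num), depth_max
```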

@YoYo000 Since there are multiple similar questions on this, it would be great if you could summarize the detailed steps for reproducing the scan results, including the parameters to use and any changes to the repository scripts.

YoYo000 commented 5 years ago

I will generate the original cam.txt files for the Tanks and Temples dataset soon. The depth min and max of the above camera might be a bit too relaxed.

Meanwhile, if you are using the resized images, some post-processing parameters might need further tuning. I will try to find a proper config and provide a simple script.

tejaskhot commented 5 years ago

Thanks a lot @YoYo000 ! Do you have an approx timeline for this? By when do you think it'll be possible to share?

YoYo000 commented 5 years ago

@tejaskhot Hopefully today or tomorrow

YoYo000 commented 5 years ago

@tejaskhot you could use these cams and try the new commit for the Family dataset:

python test.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family/' --max_d 256 --max_w 960 --max_h 512 --interval_scale 1

python depthfusion.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family' --prob_threshold 0.6 --disp_thresh 0.25 --num_consistent 4

If you want to tune the point cloud, I think you could change --prob_threshold. Also, I found that the downsized setting and the Fusibile post-processing affect the reconstruction:

Downsized image + Fusibile post-proc: [image]

Downsized image + Proposed post-proc: [image]

Original image + Proposed post-proc: [image]

Original image + Fusibile post-proc: [image]

tejaskhot commented 5 years ago

Thanks for the quick response. I have two followup questions.

1) I generated outputs using the steps you mentioned with downsized images (same values you mention above) and got an output for Family which I think is similar to what you posted. However, a zoomed-out view of it shows plenty of surrounding areas being reconstructed, as shown. Is this normal/expected?

[image]

2) Using the same hyperparameters, I produced outputs for a few of the other scans and they don't look as expected. For example, here are two views of Horse.

[image]

[image]

Does this mean we have to set hyperparameters for every scan of Tanks and Temples individually? Are the results in your paper produced with such individually picked values or do you use the same set of values across the dataset?

YoYo000 commented 5 years ago

Could you check the input cameras and other settings again? The new cameras should have a tight depth range, but your reconstructions look like they were made with a wide depth range. The zoomed-out views of my reconstructions with the two provided commands and parameters look like:

[image]

[image]

[image]

[image]

Also, I use the rectified images I sent to you and do not pre-resize them. The test.py script will automatically do the resizing and cropping.
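For illustration, the resizing and cropping is conceptually along these lines (a simplified sketch, not the actual test.py code; the multiple-of-32 rounding and the cv2 call are assumptions of mine):

```python
import numpy as np
import cv2  # any resize routine works; cv2 is only an example

def scale_and_crop(image, K, max_w=960, max_h=512, base=32):
    h, w = image.shape[:2]
    scale = min(max_w / w, max_h / h, 1.0)           # never upscale
    new_w, new_h = int(w * scale), int(h * scale)
    image = cv2.resize(image, (new_w, new_h))
    K = K.copy()
    K[0, :] *= scale                                  # fx, skew, cx
    K[1, :] *= scale                                  # fy, cy
    crop_w, crop_h = new_w // base * base, new_h // base * base
    x0, y0 = (new_w - crop_w) // 2, (new_h - crop_h) // 2
    image = image[y0:y0 + crop_h, x0:x0 + crop_w]     # center crop
    K[0, 2] -= x0                                     # shift principal point
    K[1, 2] -= y0
    return image, K
```

The point is that the intrinsics must be rescaled and re-centered together with the image, which is why pre-resizing the images yourself can throw the cameras off.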

For hyperparameters on the DTU evaluation, we use the parameters described in the paper. For Tanks and Temples, we fixed all parameters except the probability threshold (0.6 ± 0.2). This is because some of the scenes contain a large portion of background areas and sky, and tuning the probability threshold effectively controls the filtering of these parts.
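If it helps, here is a small convenience loop for sweeping that threshold (my own script, just wrapping the depthfusion.py command above; move or rename the fused point cloud between runs so the results are not overwritten):

```python
import subprocess

dense_folder = '/data/intel_dataset/intermediate/mvsnet_input/family'  # adjust per scene

for prob in (0.4, 0.6, 0.8):   # the 0.6 +/- 0.2 range
    subprocess.run([
        'python', 'depthfusion.py',
        '--dense_folder', dense_folder,
        '--prob_threshold', str(prob),
        '--disp_thresh', '0.25',
        '--num_consistent', '4',
    ], check=True)
```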

tejaskhot commented 5 years ago

I tried downloading the files you posted, freshly cloning the repo, and running the commands as is, but I get all depth predictions as NaN and consequently an empty point cloud. Can you please verify the files you linked? There seems to be some issue.

YoYo000 commented 5 years ago

@tejaskhot I see I gave you the wrong link... here are the new cams

Sorry for my mistake!

tejaskhot commented 5 years ago

Thanks! These files work and I am able to reproduce the results. I have one question regarding the full-resolution results. As reported in the paper, I tried using images of size 1920 x 1056 with D=256, interval=1, N=5 on a 16 GB GPU, but that also runs out of memory for me. How are you able to run this inference at full resolution? Is there something I am missing?

YoYo000 commented 5 years ago

Is the GPU also occupied by other applications (e.g., Chrome) during your experiments? I have encountered the OOM problem when as little as ~200 MB of memory was unavailable. BTW, my experiments were run on the Google Cloud ML platform with a P100 GPU.
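For a sense of scale, a back-of-the-envelope estimate of the cost volume alone at your resolution (assuming quarter-resolution, 32-channel features in float32 as described in the MVSNet paper; the 3D CNN activations add more on top):

```python
# single cost volume: (W/4) * (H/4) * D * channels * bytes_per_float
W, H, D, C, BYTES = 1920, 1056, 256, 32, 4
cost_volume_gib = (W // 4) * (H // 4) * D * C * BYTES / 1024**3
print(f'{cost_volume_gib:.2f} GiB')   # ~3.87 GiB
```

So a few hundred MB taken by other processes can already be the difference between fitting and OOM on a 16 GB card.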

tejaskhot commented 5 years ago

Thanks!

whubaichuan commented 4 years ago

@tejaskhot Hi, why do Yao's results have a tight depth range while your reconstruction has a wide depth range when zoomed out?

Is that because of the depth range listed in the last line of cam.txt?

tejaskhot commented 4 years ago

@whubaichuan I don't remember the specifics to be honest but that seems to be a fair guess. As @YoYo000 pointed out, the cam parameters and depth range are crucial for getting good results.

whubaichuan commented 4 years ago

@tejaskhot Thanks for the reply. Have you tested the different settings on the T&T leaderboard? Is prob_threshold the main factor influencing the results on T&T?

tatsy commented 3 years ago

@YoYo000 I am sorry to bother you at this busy time, but I'd like to ask how I can reproduce the results for the Tanks and Temples dataset in the R-MVSNet paper.

I think the above images and the results in the MVSNet paper are made with the camera parameters in short_range_cameras_for_mvsnet that you provided in this repo. However, these short-range camera parameters are not provided for the advanced dataset (which is natural, because the scenes in the advanced set might not fit in short ranges).

So I thought this means the results in the R-MVSNet paper were produced with the camera parameters NOT in short_range_cameras_for_mvsnet, namely those stored in the cams sub-folder of the folders with scene names such as Auditorium. However, as far as I tested, the reconstruction quality using these camera parameters was significantly lower than what I could see in the R-MVSNet paper.

So I am wondering if you could share any tips for tuning the depth range for the R-MVSNet paper. Thank you very much for your help.

YoYo000 commented 3 years ago

Hi @tatsy

Yes, you are right: the R-MVSNet paper uses the long-range cameras for reconstruction, for both the intermediate set and the advanced set. Only MVSNet uses short_range_cameras_for_mvsnet, as it is restricted to a small number of depth samples.

For benchmarking on the advanced dataset, the post-processing is important. From what I observed, the Fusibile point cloud is quite noisy, and I was using the fusion + refinement strategy described in the paper to get the benchmark result.

tatsy commented 3 years ago

Hi @YoYo000

Thank you very much for your reply. So I guess the problem lies in the variational depth refinement from the R-MVSNet paper, because the results I got were rather sparse as well as noisy. Actually, Fusibile works pretty well for the DTU dataset with MVSNet (not R-MVSNet), and moreover, the behavior of R-MVSNet is quite similar to the second row of Table 3 in the R-MVSNet paper (the variant with refinement ablated).

I have already implemented the variational depth refinement, but it is quite unstable during gradient descent. As I posted in another issue, I am wondering how each of the ZNCC and bilateral smoothing terms is defined. https://github.com/YoYo000/MVSNet/issues/35#issuecomment-721152112
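Roughly, the step I have implemented looks like the following (a generic PyTorch sketch of a ZNCC photo term plus a bilateral-weighted smoothness term, not the paper's exact formulation; photo_patches_fn, lam, and lr are placeholder names of mine):

```python
import torch

def zncc(a, b, eps=1e-6):
    # zero-mean normalized cross-correlation over the last (patch) dimension
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    return (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)

def refine_step(depth, image, photo_patches_fn, lam=0.1, lr=1e-3):
    """depth: (H, W); image: (3, H, W). photo_patches_fn(depth) is assumed to
    return (ref_patches, warped_src_patches) built with a differentiable warp."""
    depth = depth.clone().requires_grad_(True)
    ref, src = photo_patches_fn(depth)
    photo = (1.0 - zncc(ref, src)).mean()             # high ZNCC -> low cost
    # bilateral weights from image gradients keep depth edges aligned with image edges
    wx = torch.exp(-(image[:, :, 1:] - image[:, :, :-1]).abs().mean(0))
    wy = torch.exp(-(image[:, 1:, :] - image[:, :-1, :]).abs().mean(0))
    smooth = (wx * (depth[:, 1:] - depth[:, :-1]).abs()).mean() \
           + (wy * (depth[1:, :] - depth[:-1, :]).abs()).mean()
    loss = photo + lam * smooth
    loss.backward()
    with torch.no_grad():
        depth -= lr * depth.grad                      # plain gradient descent step
    return depth.detach(), loss.item()
```

Even this simplified form is sensitive to the learning rate and to how the ZNCC patches are sampled.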

Concretely, my questions are:

Thank you very much for your help.

StephenCurryfan commented 11 months ago

https://drive.google.com/open?id=1YjheSUSd5dDVjFyu2Yripw7-uMcOGTxv The link is no longer valid.