JeffWang987 / MVSTER

[ECCV 2022] MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
MIT License

Cannot reproduce results - retraining results in overconfident model #12

Open alexrich021 opened 1 year ago

alexrich021 commented 1 year ago

Hi there,

Thanks for your great work!

I've used your code to re-train MVSTER on 640x512 images/depths and test on DTU 1152x864 images. I'm using a slightly different system: instead of 4 GPUs with batch size 2 per GPU, I'm using 2 GPUs with batch size 4 per GPU. I've tried:

  1. using the defaults you've set in the code (i.e., a mono depth loss weight of 0)
  2. using the hyperparameters you indicate in the paper (i.e., setting the mono depth loss weight to 0.0003)
  3. changing various parameters not specified in the paper, such as the LR warmup used for transformer convergence and the number of OT iterations (see the sketch after this list)
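To be concrete, the knobs I've been varying look roughly like this. Every name and value here is my own shorthand for illustration, not an identifier or default from the MVSTER code, and the warmup is just the generic linear schedule I tried:

```python
# My own shorthand for the settings I've been varying - not MVSTER identifiers.
mono_weight  = 0.0003   # paper setting; the code default is 0
base_lr      = 1e-3     # placeholder value, not the repo default
warmup_steps = 500      # placeholder value for the transformer LR warmup I tried

def total_loss(depth_loss: float, mono_loss: float) -> float:
    # Combined loss: condition (1) above uses mono_weight = 0, condition (2) uses 0.0003.
    return depth_loss + mono_weight * mono_loss

def warmup_lr(step: int) -> float:
    # Generic linear warmup to base_lr over warmup_steps, then constant.
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```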

I've trained several models for each condition. So far, no model achieves the 0.313 overall result on the DTU test set that your pretrained model achieves. The best retrained model so far gets 0.325 on the test set.

One thing I've noticed is that every re-trained model is highly overconfident in its depth predictions. When testing with your pretrained model, the probability mask (with probability threshold 0.5) for a random image looks like this: [image: 00000000_photo]

For the same image (and every other image I've inspected thus far), the probability mask of the models I train looks like this: [image: 00000000_photo]

Upon inspecting the output confidence map (i.e., test_mvs4.py: line 262), my re-trained models output >0.99 confidence for seemingly >99% of pixels across all confidence maps. I haven't measured this quantitatively, just inspected a large number of confidence maps (a quick way to quantify it is sketched below). This results in fewer points being filtered out, which in turn appears to result in worse accuracy than the pretrained model (~0.400 for all my models vs. 0.350 for yours). Furthermore, these overconfident predictions occur even when testing a model trained for only a single epoch.
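For reference, the check I've been doing by eye could be quantified with something like the following. Here `confidence` is just a per-view confidence map loaded as a numpy array, however you get it out of the test outputs; this is my own helper, not code from the repo:

```python
import numpy as np

def confidence_stats(confidence: np.ndarray, thresholds=(0.5, 0.9, 0.99)):
    """Fraction of pixels whose confidence exceeds each threshold."""
    total = confidence.size
    return {t: float((confidence > t).sum() / total) for t in thresholds}

# With my retrained models the 0.99 bucket comes out near 1.0 for essentially
# every view I've looked at; the pretrained checkpoint is far less saturated.
```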

Any idea why this might be occurring? Was the provided pretrained model trained with this exact code or with a previous version? Do you think the change from 4 GPUs with batch size 2 to 2 GPUs with batch size 4 could have that large an effect on training? Any help would be much appreciated.

JeffWang987 commented 1 year ago

It is strange that using bs=4@2GPU produces worse results than bs=2@4GPU. Here are some suggestions for dealing with the problem: (1) Use sync_bn (torch.nn.SyncBatchNorm.convert_sync_batchnorm); see the snippet below. (2) Carefully tune the LR, as our training strategy may not work for bs=4@2GPU. (3) Check your torch/CUDA versions.
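For (1), a minimal example using the standard PyTorch API (apply it after init_process_group and after moving the model to the local device; this is not a snippet from our repo):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_with_sync_bn(model: torch.nn.Module, local_rank: int) -> torch.nn.Module:
    """Replace every BatchNorm layer with SyncBatchNorm, then wrap in DDP."""
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return DDP(model, device_ids=[local_rank], output_device=local_rank)
```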

As for the overconfidence issue, I will investigate it later (currently I'm busy with other deadlines).

ThanTT commented 1 year ago

@alexrich021 Hi, have you solved the overconfidence problem? I'm running into a similar situation.

ToscanaGoGithub commented 9 months ago

@ThanTT @alexrich021 Hi, have you managed to solve the overconfidence problem? I'm seeing a similar situation as well.