TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Better metrics with worse visual results? #197

Closed jiaqixuac closed 2 years ago

jiaqixuac commented 2 years ago

Hi,

Thanks for the great work!

I ran eval.py and saved the predicted results. For DDAD, I'm able to reproduce the reported metrics for the pre-trained models PackNet01_MR_selfsup_D.ckpt and PackNetSAN01_HR_sup_D.ckpt.

But when I inspect the predicted depths, I find that better metrics do not guarantee better visual results. For example, for 000150/15616458249936530.png (input image attached):

PackNetSAN01_HR_sup_D.ckpt with completion (Abs.Rel. 0.038 for validation), depth map attached.

PackNetSAN01_HR_sup_D.ckpt with prediction (Abs.Rel. 0.086 for validation), depth map attached.

PackNet01_MR_selfsup_D.ckpt (Abs.Rel. 0.173 for validation), depth map attached.

It seems that PackNet-SAN with depth completion produces some stripe artifacts, and PackNet-SAN with depth prediction produces incorrect predictions in the top region. This may prevent good scene reconstruction.

On the other hand, PackNet produces more visually natural results (though less accurate and sharp in local regions).

Am I doing something wrong when obtaining the depth predictions?
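
For context, a minimal numpy sketch (not the repo's evaluation code) of how Abs.Rel. is typically computed against sparse LiDAR ground truth: pixels without a LiDAR return, e.g. most of the sky in the top region, are masked out, so large errors there never affect the metric. The file names below are hypothetical.

import numpy as np

def abs_rel(pred_depth, gt_depth, min_depth=1e-3, max_depth=200.0):
    """Mean absolute relative error over valid ground-truth pixels only."""
    valid = (gt_depth > min_depth) & (gt_depth < max_depth)
    pred = np.clip(pred_depth[valid], min_depth, max_depth)
    gt = gt_depth[valid]
    return np.mean(np.abs(pred - gt) / gt)

# Hypothetical single-frame dumps saved during evaluation:
# pred = np.load("pred_depth.npy"); gt = np.load("gt_depth.npy")
# print(abs_rel(pred, gt))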

chapa17 commented 2 years ago

Hello @jiaqixuac

I am sorry to trouble you, but I have been getting an error while running the eval.py script. Can you please help me figure out what's going wrong? I have attached a screenshot of the error.

Screenshot from 2021-12-03 10-06-52

PS: I assume I don't need to install DGP since it is already included in the Dockerfile.

jiaqixuac commented 2 years ago

Hello @chapa17, the issue arises because this codebase is not compatible with the current version of DGP. You can refer to #192.

I didn't use Docker; I set up a conda environment following the Dockerfile.

Besides, I still ran into some errors even with the compatible DGP. Sorry, I can't remember them exactly; I just ran the code and fixed them as they came up.

jiaqixuac commented 2 years ago

Btw, for self-supervised PackNet, i.e., PackNet01_MR_selfsup_D.ckpt, I find that the validation performance is much better than the training performance: Abs.Rel. 0.173 on the validation split vs. Abs.Rel. 0.439 on the training split.

Is this normal for the self-supervised setting?

split: ['train']

|*********************************************************************************************|
|     METRIC     | abs_rel  | sqr_rel  |   rmse   | rmse_log |    a1    |    a2    |    a3    |
|*********************************************************************************************|
| *** ./data/DDAD/ddad_train_val/ddad.json/train (camera_01)                                  |
|*********************************************************************************************|
| DEPTH          |  0.872   |  22.444  |  34.858  |  2.166   |  0.007   |  0.015   |  0.022   |
| DEPTH_PP       |  0.872   |  22.439  |  34.867  |  2.167   |  0.007   |  0.014   |  0.022   |
| DEPTH_GT       |  0.441   |  35.048  |  19.660  |  0.396   |  0.733   |  0.831   |  0.893   |
| DEPTH_PP_GT    |  0.439   |  35.010  |  19.447  |  0.394   |  0.736   |  0.832   |  0.894   |
|*********************************************************************************************|

split: ['val']

|*********************************************************************************************|
|     METRIC     | abs_rel  | sqr_rel  |   rmse   | rmse_log |    a1    |    a2    |    a3    |
|*********************************************************************************************|
| *** ./data/DDAD/ddad_train_val/ddad.json/val (camera_01)                                    |
|*********************************************************************************************|
| DEPTH          |  0.886   |  25.101  |  38.643  |  2.228   |  0.001   |  0.003   |  0.005   |
| DEPTH_PP       |  0.886   |  25.110  |  38.657  |  2.228   |  0.001   |  0.003   |  0.005   |
| DEPTH_GT       |  0.178   |  7.529   |  14.616  |  0.254   |  0.831   |  0.928   |  0.963   |
| DEPTH_PP_GT    |  0.173   |  7.164   |  14.363  |  0.249   |  0.835   |  0.930   |  0.964   |
|*********************************************************************************************|
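
For reference, the DEPTH / DEPTH_PP rows appear to be the raw, scale-ambiguous self-supervised predictions, while the DEPTH_GT / DEPTH_PP_GT rows additionally apply per-image ground-truth median scaling, which is why they look so much better. A minimal numpy sketch of that scaling step (an approximation, not the repo's exact evaluation code):

import numpy as np

def gt_median_scale(pred_depth, gt_depth):
    """Rescale a scale-ambiguous prediction by the ratio of GT and
    prediction medians, computed over valid ground-truth pixels only."""
    valid = gt_depth > 0
    scale = np.median(gt_depth[valid]) / np.median(pred_depth[valid])
    return pred_depth * scale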

livey commented 2 years ago

When I trained from scratch, I also sometimes got undesirably large predictions at the top. I guess it's because the sky always appears at the top and its pixels are very similar, so no matter what depth you predict there, you get roughly the same image (photometric) loss.
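
To make that intuition concrete: the pixel motion induced by a given camera translation falls off roughly as 1/depth, so for very distant sky pixels the warped image, and hence the photometric loss, is nearly unchanged regardless of the predicted depth. A toy illustration (the focal length and translation values are made up):

# Hypothetical values: focal length f (pixels) and sideways translation t (meters).
f, t = 1000.0, 0.2
# Horizontal pixel shift of a point at a given depth: shift ~= f * t / depth.
for depth in [5.0, 50.0, 500.0, 5000.0]:
    print("depth %7.1f m -> pixel shift %.3f px" % (depth, f * t / depth))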

jiaqixuac commented 2 years ago

Hi @livey, thanks for the discussion. Yes, there is no ground-truth supervision in the top region, which may lead to these undesired results.

I find that this is a common issue, and many other algorithms trained on LiDAR datasets show the same effect. For example, DPT for general-purpose depth estimation produces appealing results for this kind of autonomous-driving view (though not as accurate quantitatively), but when the model is fine-tuned on the KITTI dataset, it shows similarly unsatisfactory visual results in the top region.
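
The same masking happens at training time: a typical supervised depth loss is only evaluated where the projected LiDAR provides a value, so the top rows of the image receive no gradient at all. A hedged PyTorch-style sketch (illustrative only, not the actual PackNet-SAN loss):

import torch

def masked_l1_depth_loss(pred, gt):
    """L1 depth loss restricted to pixels with a valid (non-zero) LiDAR depth."""
    valid = gt > 0  # the sky / top region typically has no LiDAR returns
    if valid.sum() == 0:
        return pred.new_zeros(())
    return torch.abs(pred[valid] - gt[valid]).mean()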

VitorGuizilini-TRI commented 2 years ago

That's correct, the sky is problematic both for supervised (no GT information) and self-supervised (no texture, very large distance) training. In one of our latest works we show how synthetic data can be used to generate a surface-normal regularization loss that mitigates this problem.