TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Reproduction of monodepth2 with ResNet18 #40

Closed MingYang-buaa closed 4 years ago

MingYang-buaa commented 4 years ago

Hey! Thanks for your wonderful work!

Now I'm trying to reproduce the monodepth2 results with a ResNet18 backbone. I used train_kitti.yaml and trained for about 50 epochs, but the result is not good: the loss settles around 0.075, yet abs_rel only reaches 0.125. I think it may be overfitting due to imperfect hyperparameters. Could you share the .yaml file you use for training monodepth2?

MingYang-buaa commented 4 years ago

And the inference result is not good either (attached image).

VitorGuizilini-TRI commented 4 years ago

We routinely get < 0.12 abs_rel with monodepth2, just by changing from PackNet to DepthResNet. Some possible reasons:

  1. Are you using the ImageNet pretrained weights for both the depth and pose networks?
  2. Which hyperparameters are you training with (learning rate, step size, batch size, number of GPUs)?

Let me know how it goes.
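
For reference while waiting on the official file, here is a minimal sketch of the relevant overrides on top of configs/train_kitti.yaml. The DepthResNet/PoseResNet names and the lr/step_size values come from this thread; the exact key layout and the '18pt' version string (assumed to mean ImageNet-pretrained ResNet18) are assumptions and may differ between releases:

```yaml
# Hypothetical monodepth2-style config sketch (not the maintainer's file):
# swap the PackNet depth network for ResNet18 depth/pose networks.
model:
    name: 'SelfSupModel'
    depth_net:
        name: 'DepthResNet'
        version: '18pt'     # '18pt' assumed: ResNet18, ImageNet-pretrained
    pose_net:
        name: 'PoseResNet'
        version: '18pt'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002      # values discussed in this thread
        pose:
            lr: 0.0002
    scheduler:
        name: 'StepLR'
        step_size: 30
```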

MingYang-buaa commented 4 years ago
  1. Yes, I used the ImageNet pretrained weights for both DepthResNet and PoseResNet.
  2. The learning rate is 2e-4 for both the depth net and the pose net. What I changed is the step_size (from 30 to 15) and the batch_size (from 4 to 12), and I'm training on 4 GPUs (2080 Ti) rather than 8 GPUs (V100). Could that be the problem? (See the effective-batch sketch after this list.)
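
One common heuristic for reasoning about this mismatch is the linear scaling rule: keep the learning rate proportional to the effective batch size. A minimal sketch, assuming the reference run was 8 GPUs at batch 4 per GPU as described in this thread (the rule is a general heuristic, not something packnet-sfm prescribes):

```python
def scaled_lr(base_lr, base_effective_batch, num_gpus, per_gpu_batch):
    """Linear scaling heuristic: learning rate proportional to effective batch size."""
    effective_batch = num_gpus * per_gpu_batch
    return base_lr * effective_batch / base_effective_batch

# Reference run:     8x V100,    batch_size 4  -> effective batch 32, lr 2e-4.
# This thread's run: 4x 2080 Ti, batch_size 12 -> effective batch 48.
print(scaled_lr(2e-4, base_effective_batch=8 * 4, num_gpus=4, per_gpu_batch=12))
# -> 3e-4: the transplanted 2e-4 is effectively *lower* per sample, not higher.
```

Note that the maintainer's advice below goes the other way (lowering to 1e-4), so treat this purely as a way to quantify the batch-size mismatch.
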
MingYang-buaa commented 4 years ago

By the way, did you try training monodepth2 with the VelSupModel? If so, how were the results?

VitorGuizilini-TRI commented 4 years ago

Try bringing your learning rate down to 1e-4, just to make sure, and I will start a training session here with your configuration as soon as possible, also just to make sure. We have tried monodepth2 with VelSupModel; we usually achieve metrically accurate results similar to SelfSupModel.

MingYang-buaa commented 4 years ago

Thanks for the quick reply! I will also try to train the SelfSupModel and the VelSupModel with lr 1e-4. Looking forward to your good news.

VitorGuizilini-TRI commented 4 years ago

Hi, results seem to be OK here (attached image from my latest run with a ResNet18 backbone), so I am not sure what the problem could be on your end. Regardless, it would be interesting to check whether our losses match exactly those from monodepth2, since we use some of their tricks at training time.

(attached image: val-KITTI_raw-eigen_test_files-velodyne-544-inv_depth_49)

MingYang-buaa commented 4 years ago

All right. It's confusing: I also trained with lr 1e-4, but abs_rel only reached 0.120. By the way, I'm training in a non-docker environment. Could that be the reason? Could you share the yaml file? I would appreciate it. Also, are the tricks you mentioned above the matching resolution and the camera intrinsic params? Those are the only ones I noticed :)

The depth map itself seems to be OK, though (attached image).

VitorGuizilini-TRI commented 4 years ago

I will share the .yaml file with you shortly, but I haven't changed much from the one in the repository. About the tricks, there are also auto-masking and the minimum reprojection loss. Was your 0.120 the minimum value or the final value? I have noticed that monodepth2 tends to degrade a little if you train for too long. Training in a non-docker environment should be fine, but perhaps there are some differences in library versions that are making an impact.
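
For readers unfamiliar with those two tricks, here is a compressed sketch of how monodepth2 combines them. This is illustrative only, not the repository's actual loss code:

```python
import torch

def min_reprojection_with_automask(photo_losses, identity_losses):
    """Sketch of monodepth2's per-pixel minimum reprojection loss + auto-masking.

    photo_losses:    list of [B,1,H,W] photometric errors, one per warped source frame.
    identity_losses: list of [B,1,H,W] photometric errors of the *unwarped* source
                     frames vs. the target frame (used to auto-mask static pixels).
    """
    photo = torch.cat(photo_losses, dim=1)        # [B, S, H, W]
    identity = torch.cat(identity_losses, dim=1)  # [B, S, H, W]
    # Tiny random noise breaks ties between identical warped/unwarped errors,
    # as in the monodepth2 reference implementation.
    identity = identity + 1e-5 * torch.randn_like(identity)
    # Per-pixel minimum over warped AND identity errors: pixels whose identity
    # error is lowest (static scene, no relative motion) drop out of the gradient.
    combined = torch.cat([photo, identity], dim=1)
    per_pixel, _ = torch.min(combined, dim=1)
    return per_pixel.mean()
```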

MingYang-buaa commented 4 years ago

Yep. I trained for about 30 epochs, but over the last ~10 epochs the abs_rel values floated around 0.120, so I stopped. Also, I made a mistake: the picture I posted the first time was trained with the VelSupModel. That result is still strange, though. Have you encountered this kind of failure? Maybe I have to train for a few more rounds to find out why.

MingYang-buaa commented 4 years ago

@VitorGuizilini-TRI Hi, can you share the monodepth2.yaml now? Apart from that, is the depth regularization weight you mentioned in the paper included in the code now? I cannot find it. Also, the velocity-scaling weight is slightly different: 0.05 in the paper but 0.1 in default_config. Does that mean 0.1 worked better than 0.05 in your experiments?
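
For later readers, the weight in question should be overridable from the config. A sketch assuming it lives under the model loss parameters; the exact key name is an assumption, and only the 0.1 default in default_config is confirmed in this thread:

```yaml
# Hypothetical override to match the paper's 0.05 instead of the 0.1 default
# (key name 'velocity_loss_weight' assumed, not verified against the code):
model:
    name: 'VelSupModel'
    loss:
        velocity_loss_weight: 0.05
```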

pjckoch commented 3 years ago

Hi @MingYang-buaa ,

could you give any insight into whether you were able to resolve this issue? I encountered the same problem with the VelSupModel. For me the solution was to first train self-supervised and introduce the velocity supervision only after a few epochs; when training with velocity supervision from the beginning, the output looked exactly like yours. I observed that the error lies in a wrongly estimated pose: when looking at the warped reference images, the one generated from t-1 looks OK, but the one from t+1 does not. In fact, the one from t+1 looks more like a warped t+2 frame than a warped t frame, which implies that the pose transformation is always estimated to be forward, never backward. A sketch of the warm-up schedule I mean follows below.
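
To make that workaround concrete, one way to phase in the velocity term is an epoch-gated weight. This is an illustrative sketch of the schedule described above, not an option packnet-sfm exposes; `warmup_epochs`, the ramp length, and the target weight are made-up example values:

```python
def velocity_weight(epoch, warmup_epochs=5, ramp_epochs=3, target=0.1):
    """Pure self-supervision first, then ramp in velocity supervision."""
    if epoch < warmup_epochs:
        return 0.0  # self-supervised photometric loss only
    # Ramp up linearly over a few epochs instead of switching on abruptly.
    ramp = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return target * ramp

# Per step: total_loss = photometric_loss + velocity_weight(epoch) * velocity_loss
```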

Please let me know if you have any further insights on that.

Thank you. Best regards, Patrick