YvanYin / VNL_Monocular_Depth_Prediction

Monocular Depth Prediction

About network training #49

Closed · YiLiM1 closed this issue 3 years ago

YiLiM1 commented 3 years ago

Hello, when I use the training method described in your article (training the network on NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?

YvanYin commented 3 years ago

Hi, I have trained the model on KITTI and NYU separately, and we did not face this problem.

wuzht commented 3 years ago

Same here, the network does not converge.

YvanYin commented 3 years ago

Could you show your loss and learning rate here?

wuzht commented 3 years ago

[Step 73530/86850] [Epoch 25/30]  [kitti]
                loss: 9.829,    time: 1.526856,    eta: 5:38:57
                metric_loss: 2.618,             virtual_normal_loss: 7.343,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73540/86850] [Epoch 25/30]  [kitti]
                loss: 9.734,    time: 1.526918,    eta: 5:38:43
                metric_loss: 2.651,             virtual_normal_loss: 7.094,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73550/86850] [Epoch 25/30]  [kitti]
                loss: 9.716,    time: 1.526981,    eta: 5:38:28
                metric_loss: 2.611,             virtual_normal_loss: 7.137,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73560/86850] [Epoch 25/30]  [kitti]
                loss: 9.999,    time: 1.526977,    eta: 5:38:13
                metric_loss: 2.613,             virtual_normal_loss: 7.250,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73570/86850] [Epoch 25/30]  [kitti]
                loss: 10.003,    time: 1.526970,    eta: 5:37:58
                metric_loss: 2.659,             virtual_normal_loss: 7.324,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73580/86850] [Epoch 25/30]  [kitti]
                loss: 9.877,    time: 1.527021,    eta: 5:37:43
                metric_loss: 2.666,             virtual_normal_loss: 7.326,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73590/86850] [Epoch 25/30]  [kitti]
                loss: 9.916,    time: 1.527081,    eta: 5:37:29
                metric_loss: 2.626,             virtual_normal_loss: 7.350,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73600/86850] [Epoch 25/30]  [kitti]
                loss: 9.988,    time: 1.527141,    eta: 5:37:14
                metric_loss: 2.641,             virtual_normal_loss: 7.360,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73610/86850] [Epoch 25/30]  [kitti]
                loss: 10.206,    time: 1.527199,    eta: 5:37:00
                metric_loss: 2.674,             virtual_normal_loss: 7.393,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73620/86850] [Epoch 25/30]  [kitti]
                loss: 9.851,    time: 1.527234,    eta: 5:36:45
                metric_loss: 2.592,             virtual_normal_loss: 7.259,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73630/86850] [Epoch 25/30]  [kitti]
                loss: 9.606,    time: 1.527297,    eta: 5:36:30
                metric_loss: 2.572,             virtual_normal_loss: 7.096,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73640/86850] [Epoch 25/30]  [kitti]
                loss: 9.606,    time: 1.527356,    eta: 5:36:16
                metric_loss: 2.516,             virtual_normal_loss: 7.096,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73650/86850] [Epoch 25/30]  [kitti]
                loss: 9.705,    time: 1.527416,    eta: 5:36:01
                metric_loss: 2.519,             virtual_normal_loss: 7.210,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73660/86850] [Epoch 25/30]  [kitti]
                loss: 9.985,    time: 1.527482,    eta: 5:35:47
                metric_loss: 2.622,             virtual_normal_loss: 7.357,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73670/86850] [Epoch 25/30]  [kitti]
                loss: 9.811,    time: 1.527546,    eta: 5:35:33
                metric_loss: 2.641,             virtual_normal_loss: 7.216,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73680/86850] [Epoch 25/30]  [kitti]
                loss: 9.615,    time: 1.527540,    eta: 5:35:17
                metric_loss: 2.521,             virtual_normal_loss: 7.118,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73690/86850] [Epoch 25/30]  [kitti]
                loss: 9.613,    time: 1.527537,    eta: 5:35:02
                metric_loss: 2.503,             virtual_normal_loss: 7.071,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73700/86850] [Epoch 25/30]  [kitti]
                loss: 9.863,    time: 1.527586,    eta: 5:34:47
                metric_loss: 2.548,             virtual_normal_loss: 7.352,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73710/86850] [Epoch 25/30]  [kitti]
                loss: 9.806,    time: 1.527648,    eta: 5:34:33
                metric_loss: 2.616,             virtual_normal_loss: 7.270,             abs_rel: 0.823165,       silog: 0.586482, 
                group0_lr: 0.000092,       group1_lr: 0.000092, 

The validation error does not decrease during training. How can I fix this? Thanks.
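For reference, a quick way to confirm that the total loss has plateaued is to parse the pasted log and plot it against the step index. A minimal sketch, assuming the log format shown above and a hypothetical file name `train.log`:

```python
# Illustrative only: extract "loss: X" per "[Step N/...]" block from a saved
# training log (format assumed from the excerpt above) and plot the curve.
import re
import matplotlib.pyplot as plt

with open('train.log') as f:          # hypothetical path to the saved log
    text = f.read()

# Grab the step index and the total loss that follows each step header.
pairs = re.findall(r'\[Step (\d+)/\d+\].*?\sloss: ([\d.]+),', text, flags=re.S)
steps = [int(s) for s, _ in pairs]
losses = [float(l) for _, l in pairs]

plt.plot(steps, losses)
plt.xlabel('step')
plt.ylabel('total loss')
plt.savefig('loss_curve.png')
```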

wuzht commented 3 years ago

Note that I did not alter any of the training settings.

wuzht commented 3 years ago

Problem solved. I had to generate dense depth maps from the sparse KITTI ones before training.
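For anyone hitting the same issue: one simple way to densify the sparse KITTI depth maps is to interpolate over the valid LiDAR pixels. This is only an illustrative sketch, not necessarily the densification the authors used; the function name `densify_sparse_depth` and the file paths are placeholders.

```python
# Sketch: fill a sparse KITTI depth PNG by interpolating between LiDAR points.
import numpy as np
from PIL import Image
from scipy.interpolate import griddata

def densify_sparse_depth(sparse_png_path, out_png_path):
    # KITTI stores depth as uint16 PNGs scaled by 256 (0 = no measurement).
    sparse = np.array(Image.open(sparse_png_path), dtype=np.float32) / 256.0

    valid = sparse > 0
    ys, xs = np.nonzero(valid)
    grid_y, grid_x = np.mgrid[0:sparse.shape[0], 0:sparse.shape[1]]

    # Linear interpolation between valid points; nearest-neighbour fallback
    # for pixels outside the convex hull of the LiDAR points.
    dense = griddata((ys, xs), sparse[valid], (grid_y, grid_x), method='linear')
    nearest = griddata((ys, xs), sparse[valid], (grid_y, grid_x), method='nearest')
    dense = np.where(np.isnan(dense), nearest, dense)

    Image.fromarray((dense * 256.0).astype(np.uint16)).save(out_png_path)
```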