Closed · YiLiM1 closed this issue 3 years ago
Hi, we have trained the model on KITTI and on NYU alone, and we did not encounter this problem.
Same here; the network does not converge.
Could you show your loss and learning rate here?
[Step 73530/86850] [Epoch 25/30] [kitti]
loss: 9.829, time: 1.526856, eta: 5:38:57
metric_loss: 2.618, virtual_normal_loss: 7.343, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73540/86850] [Epoch 25/30] [kitti]
loss: 9.734, time: 1.526918, eta: 5:38:43
metric_loss: 2.651, virtual_normal_loss: 7.094, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73550/86850] [Epoch 25/30] [kitti]
loss: 9.716, time: 1.526981, eta: 5:38:28
metric_loss: 2.611, virtual_normal_loss: 7.137, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73560/86850] [Epoch 25/30] [kitti]
loss: 9.999, time: 1.526977, eta: 5:38:13
metric_loss: 2.613, virtual_normal_loss: 7.250, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73570/86850] [Epoch 25/30] [kitti]
loss: 10.003, time: 1.526970, eta: 5:37:58
metric_loss: 2.659, virtual_normal_loss: 7.324, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73580/86850] [Epoch 25/30] [kitti]
loss: 9.877, time: 1.527021, eta: 5:37:43
metric_loss: 2.666, virtual_normal_loss: 7.326, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73590/86850] [Epoch 25/30] [kitti]
loss: 9.916, time: 1.527081, eta: 5:37:29
metric_loss: 2.626, virtual_normal_loss: 7.350, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73600/86850] [Epoch 25/30] [kitti]
loss: 9.988, time: 1.527141, eta: 5:37:14
metric_loss: 2.641, virtual_normal_loss: 7.360, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73610/86850] [Epoch 25/30] [kitti]
loss: 10.206, time: 1.527199, eta: 5:37:00
metric_loss: 2.674, virtual_normal_loss: 7.393, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73620/86850] [Epoch 25/30] [kitti]
loss: 9.851, time: 1.527234, eta: 5:36:45
metric_loss: 2.592, virtual_normal_loss: 7.259, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73630/86850] [Epoch 25/30] [kitti]
loss: 9.606, time: 1.527297, eta: 5:36:30
metric_loss: 2.572, virtual_normal_loss: 7.096, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73640/86850] [Epoch 25/30] [kitti]
loss: 9.606, time: 1.527356, eta: 5:36:16
metric_loss: 2.516, virtual_normal_loss: 7.096, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73650/86850] [Epoch 25/30] [kitti]
loss: 9.705, time: 1.527416, eta: 5:36:01
metric_loss: 2.519, virtual_normal_loss: 7.210, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73660/86850] [Epoch 25/30] [kitti]
loss: 9.985, time: 1.527482, eta: 5:35:47
metric_loss: 2.622, virtual_normal_loss: 7.357, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73670/86850] [Epoch 25/30] [kitti]
loss: 9.811, time: 1.527546, eta: 5:35:33
metric_loss: 2.641, virtual_normal_loss: 7.216, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73680/86850] [Epoch 25/30] [kitti]
loss: 9.615, time: 1.527540, eta: 5:35:17
metric_loss: 2.521, virtual_normal_loss: 7.118, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73690/86850] [Epoch 25/30] [kitti]
loss: 9.613, time: 1.527537, eta: 5:35:02
metric_loss: 2.503, virtual_normal_loss: 7.071, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73700/86850] [Epoch 25/30] [kitti]
loss: 9.863, time: 1.527586, eta: 5:34:47
metric_loss: 2.548, virtual_normal_loss: 7.352, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73710/86850] [Epoch 25/30] [kitti]
loss: 9.806, time: 1.527648, eta: 5:34:33
metric_loss: 2.616, virtual_normal_loss: 7.270, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
The validation error does not decrease during training. How can I fix this? Thanks.
Note that I did not alter any of the training settings.
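For context, `abs_rel` and `silog` in the log above are standard depth-evaluation metrics. A minimal sketch of how they are usually computed is below; the exact definitions (masking, scaling constants) in this repository may differ, so treat this as an assumption, not the repo's implementation:

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    # Mean absolute relative error over valid (positive) ground-truth pixels.
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def silog(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    # Scale-invariant log error (Eigen et al. style): the variance of the
    # per-pixel log-depth difference. Uniformly rescaling pred leaves it
    # unchanged, which is why it is "scale invariant".
    mask = gt > 0
    d = np.log(pred[mask] + eps) - np.log(gt[mask] + eps)
    # Clamp at zero to guard against tiny negative values from float error.
    return float(np.sqrt(max(np.mean(d ** 2) - np.mean(d) ** 2, 0.0)))
```

If these values stay frozen across thousands of steps, as in the log above, the validation metrics are likely not being recomputed, or the predictions are degenerate.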
Problem solved. I had to generate dense depth maps from the sparse ones before training.
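A minimal sketch of that densification step, assuming the raw KITTI LiDAR projections use 0 for missing depth. This uses simple nearest-neighbour interpolation via `scipy.interpolate.griddata`; real pipelines (e.g. the ip_basic-style morphological completion some repos use) are more elaborate, so this is an illustration, not the method this project requires:

```python
import numpy as np
from scipy.interpolate import griddata

def densify_depth(sparse_depth: np.ndarray) -> np.ndarray:
    """Fill missing (zero) pixels by nearest-neighbour interpolation
    from the valid sparse depth samples."""
    valid = sparse_depth > 0
    ys, xs = np.nonzero(valid)
    grid_y, grid_x = np.mgrid[0:sparse_depth.shape[0], 0:sparse_depth.shape[1]]
    dense = griddata(
        points=np.stack([ys, xs], axis=1),   # coordinates of valid pixels
        values=sparse_depth[valid],          # their depth values
        xi=(grid_y, grid_x),                 # query every pixel in the image
        method="nearest",                    # never produces NaNs
    )
    return dense.astype(sparse_depth.dtype)
```

Training directly on the raw sparse maps leaves most pixels without supervision, which matches the stuck-loss behaviour reported above.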
Hello, when I use the training method from your article (training the network on NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?