AICVHub closed this issue 5 years ago.
It can happen; you have to look at the overall trend over a longer interval. If you complete the training, you will see better results.
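Here is a minimal sketch of what looking at the overall trend means in practice. It smooths the noisy per-iteration loss with a trailing moving average before you judge convergence. The function name and the window size of 5 are illustrative choices, not something the repository provides.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average of `values` over the last `window` points.

    The first window-1 outputs average over fewer points (whatever has been
    seen so far), so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque()
    running = 0.0
    out = []
    for v in values:
        buf.append(v)
        running += v
        if len(buf) > window:
            running -= buf.popleft()  # drop the oldest point from the sum
        out.append(running / len(buf))
    return out


if __name__ == "__main__":
    # Total-loss values from the training log later in this thread
    # (iterations 510-590, logged every 10 iterations).
    total_loss = [0.971236, 0.891596, 2.323336, 1.124045, 1.567395,
                  2.352248, 1.296960, 0.841824, 1.592798]
    for raw, avg in zip(total_loss, moving_average(total_loss, window=5)):
        print(f"raw {raw:.3f}   5-point average {avg:.3f}")
```

With only nine logged points the window has to stay small. Over tens of thousands of iterations, a window of several hundred points would likely give a clearer picture of the trend.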
I have trained for 40,000 iterations, but it still does not converge. I don't know what's wrong with the code; it's the unmodified code from the repository. Hasn't anyone else had the same problem? Looking forward to your reply.
Did you use the VOC dataset? Why is the number of iterations 1000? (It is normally 4000.)
Yes, I used VOC2012. 1,000 iterations was just one of my attempts. Even when I train for 40,000 iterations, the model still does not converge: the total loss hovers around 1.0, with a minimum of about 0.2 and a maximum of nearly 2.
------------------ Original Message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Friday, September 20, 2019, 4:59 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] train loss misconvergence (#106)
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.
You should use VOC2007; that's the dataset this network is set up for by default.
I have tried VOC2007, but the model still does not converge. The loss curve is as follows (lr = 0.001; I didn't change anything in the code):
Then it does converge; it slowly goes down.
Actually, it still does not converge after 100,000 iterations! I used the original code without any changes, but it still doesn't work. Strange, isn't it?
Send the graph.
The loss curve is:
network architecture: {'Variable': [], 'vgg_16/bbox_pred/biases': [84], 'vgg_16/bbox_pred/biases/Momentum': [84], 'vgg_16/bbox_pred/weights': [4096, 84], 'vgg_16/bbox_pred/weights/Momentum': [4096, 84], 'vgg_16/cls_score/biases': [21], 'vgg_16/cls_score/biases/Momentum': [21], 'vgg_16/cls_score/weights': [4096, 21], 'vgg_16/cls_score/weights/Momentum': [4096, 21], 'vgg_16/conv1/conv1_1/biases': [64], 'vgg_16/conv1/conv1_1/weights': [3, 3, 3, 64], 'vgg_16/conv1/conv1_2/biases': [64], 'vgg_16/conv1/conv1_2/weights': [3, 3, 64, 64], 'vgg_16/conv2/conv2_1/biases': [128], 'vgg_16/conv2/conv2_1/weights': [3, 3, 64, 128], 'vgg_16/conv2/conv2_2/biases': [128], 'vgg_16/conv2/conv2_2/weights': [3, 3, 128, 128], 'vgg_16/conv3/conv3_1/biases': [256], 'vgg_16/conv3/conv3_1/biases/Momentum': [256], 'vgg_16/conv3/conv3_1/weights': [3, 3, 128, 256], 'vgg_16/conv3/conv3_1/weights/Momentum': [3, 3, 128, 256], 'vgg_16/conv3/conv3_2/biases': [256], 'vgg_16/conv3/conv3_2/biases/Momentum': [256], 'vgg_16/conv3/conv3_2/weights': [3, 3, 256, 256], 'vgg_16/conv3/conv3_2/weights/Momentum': [3, 3, 256, 256], 'vgg_16/conv3/conv3_3/biases': [256], 'vgg_16/conv3/conv3_3/biases/Momentum': [256], 'vgg_16/conv3/conv3_3/weights': [3, 3, 256, 256], 'vgg_16/conv3/conv3_3/weights/Momentum': [3, 3, 256, 256], 'vgg_16/conv4/conv4_1/biases': [512], 'vgg_16/conv4/conv4_1/biases/Momentum': [512], 'vgg_16/conv4/conv4_1/weights': [3, 3, 256, 512], 'vgg_16/conv4/conv4_1/weights/Momentum': [3, 3, 256, 512], 'vgg_16/conv4/conv4_2/biases': [512], 'vgg_16/conv4/conv4_2/biases/Momentum': [512], 'vgg_16/conv4/conv4_2/weights': [3, 3, 512, 512], 'vgg_16/conv4/conv4_2/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv4/conv4_3/biases': [512], 'vgg_16/conv4/conv4_3/biases/Momentum': [512], 'vgg_16/conv4/conv4_3/weights': [3, 3, 512, 512], 'vgg_16/conv4/conv4_3/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_1/biases': [512], 'vgg_16/conv5/conv5_1/biases/Momentum': [512], 'vgg_16/conv5/conv5_1/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_1/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_2/biases': [512], 'vgg_16/conv5/conv5_2/biases/Momentum': [512], 'vgg_16/conv5/conv5_2/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_2/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_3/biases': [512], 'vgg_16/conv5/conv5_3/biases/Momentum': [512], 'vgg_16/conv5/conv5_3/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_3/weights/Momentum': [3, 3, 512, 512], 'vgg_16/fc6/biases': [4096], 'vgg_16/fc6/biases/Momentum': [4096], 'vgg_16/fc6/weights': [25088, 4096], 'vgg_16/fc6/weights/Momentum': [25088, 4096], 'vgg_16/fc7/biases': [4096], 'vgg_16/fc7/biases/Momentum': [4096], 'vgg_16/fc7/weights': [4096, 4096], 'vgg_16/fc7/weights/Momentum': [4096, 4096], 'vgg_16/rpn_bbox_pred/biases': [36], 'vgg_16/rpn_bbox_pred/biases/Momentum': [36], 'vgg_16/rpn_bbox_pred/weights': [1, 1, 512, 36], 'vgg_16/rpn_bbox_pred/weights/Momentum': [1, 1, 512, 36], 'vgg_16/rpn_cls_score/biases': [18], 'vgg_16/rpn_cls_score/biases/Momentum': [18], 'vgg_16/rpn_cls_score/weights': [1, 1, 512, 18], 'vgg_16/rpn_cls_score/weights/Momentum': [1, 1, 512, 18], 'vgg_16/rpn_conv/3x3/biases': [512], 'vgg_16/rpn_conv/3x3/biases/Momentum': [512], 'vgg_16/rpn_conv/3x3/weights': [3, 3, 512, 512], 'vgg_16/rpn_conv/3x3/weights/Momentum': [3, 3, 512, 512]}
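The printed variable shapes can be sanity-checked in a few lines. The sketch below copies only a subset of the entries. It assumes the usual Faster R-CNN head layout: 2 objectness scores and 4 box offsets per anchor in the RPN, and 4 box offsets per class in the final head. Under that assumption, the shapes imply 21 classes (the 20 VOC classes plus background) and 9 anchors per feature-map location. The sketch also lists variables that have no `/Momentum` slot. In the full printout these are the conv1 and conv2 variables, which the momentum optimizer therefore does not appear to update. The helper names are hypothetical.

```python
# Subset of the "network architecture" dictionary (variable name -> shape);
# only the entries needed below are copied.
SHAPES = {
    "vgg_16/cls_score/biases": [21],
    "vgg_16/cls_score/biases/Momentum": [21],
    "vgg_16/bbox_pred/biases": [84],
    "vgg_16/bbox_pred/biases/Momentum": [84],
    "vgg_16/rpn_cls_score/biases": [18],
    "vgg_16/rpn_cls_score/biases/Momentum": [18],
    "vgg_16/rpn_bbox_pred/biases": [36],
    "vgg_16/rpn_bbox_pred/biases/Momentum": [36],
    "vgg_16/conv1/conv1_1/weights": [3, 3, 3, 64],
    "vgg_16/conv2/conv2_1/weights": [3, 3, 64, 128],
    "vgg_16/conv3/conv3_1/weights": [3, 3, 128, 256],
    "vgg_16/conv3/conv3_1/weights/Momentum": [3, 3, 128, 256],
}


def head_sizes(shapes):
    """Infer class and anchor counts from the head bias shapes.

    Assumes 2 objectness scores + 4 box offsets per anchor (RPN) and
    4 box offsets per class (final head).
    """
    num_classes = shapes["vgg_16/cls_score/biases"][0]
    num_anchors = shapes["vgg_16/rpn_cls_score/biases"][0] // 2
    consistent = (
        shapes["vgg_16/bbox_pred/biases"][0] == 4 * num_classes
        and shapes["vgg_16/rpn_bbox_pred/biases"][0] == 4 * num_anchors
    )
    return num_classes, num_anchors, consistent


def variables_without_momentum(shapes):
    """Variables with no '/Momentum' slot, i.e. apparently not being trained."""
    return sorted(
        name for name in shapes
        if not name.endswith("/Momentum") and name + "/Momentum" not in shapes
    )


if __name__ == "__main__":
    print(head_sizes(SHAPES))  # (21, 9, True)
    print(variables_without_momentum(SHAPES))
```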
It does converge to a value (about 1).
(Those isolated peaks are normal during training.)
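One way to make "isolated peaks" concrete is to flag a point when it sits well above the median of its neighbours. The sketch below is an illustrative heuristic with a hypothetical function name. The window of 5 and factor of 1.5 are arbitrary assumptions, not values taken from the repository or from Faster R-CNN.

```python
from statistics import median


def isolated_spikes(values, window=5, factor=1.5):
    """Indices whose value exceeds `factor` times the median of the
    surrounding points (up to window // 2 on each side, the point itself
    excluded). `window` and `factor` are arbitrary illustrative defaults.
    """
    values = list(values)
    half = window // 2
    spikes = []
    for i, v in enumerate(values):
        neighbours = values[max(0, i - half):i] + values[i + 1:i + 1 + half]
        if neighbours and v > factor * median(neighbours):
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Total-loss values for iterations 510-590 from the log in this thread.
    total_loss = [0.971236, 0.891596, 2.323336, 1.124045, 1.567395,
                  2.352248, 1.296960, 0.841824, 1.592798]
    print(isolated_spikes(total_loss))
```

On those nine logged values, it flags indices 2 and 5, which correspond to iterations 530 and 560. The level between such peaks is what indicates whether the loss has settled.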
Yes, isolated peaks may be normal. However, the trained model does not work at test time: when I run demo.py, the model doesn't detect anything.
What does demo.py print on the terminal?
demo.py is the test script in the original repo. It should plot the detection results, but nothing is shown after running it.
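Whether a demo draws anything typically depends on post-processing: non-maximum suppression (NMS) for each class, followed by a confidence threshold. The sketch below is a generic NumPy version of that step for the boxes of a single class. It is not the repository's code. The function names and the defaults `conf_thresh=0.8` and `nms_thresh=0.3` are hypothetical illustrative values. If no score clears the threshold, the list of boxes to draw is empty, which looks exactly like "nothing detected".

```python
import numpy as np


def nms(boxes, scores, iou_thresh):
    """Greedy non-maximum suppression; boxes is (N, 4) as x1, y1, x2, y2."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one
    return keep


def boxes_to_draw(boxes, scores, conf_thresh=0.8, nms_thresh=0.3):
    """Boxes (and scores) that survive NMS and the confidence threshold."""
    keep = nms(boxes, scores, nms_thresh)
    boxes, scores = boxes[keep], scores[keep]
    mask = scores >= conf_thresh
    return boxes[mask], scores[mask]
```

Printing `scores.max()` for each class before thresholding is a quick way to tell whether the model outputs only low-confidence detections or nothing useful at all.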
Yes, but what do you see on the terminal when you run python demo.py?
The output of demo.py is as follows (the test images are fed to the model, but no boxes are detected):
In training, my loss never converges; for example:
speed: 0.447s / iter iter: 510 / 1000, total loss: 0.971236, mean loss: 1.304287 rpn_loss_cls: 0.113811 rpn_loss_box: 0.273803 loss_cls: 0.326126 loss_box: 0.257496
speed: 0.445s / iter iter: 520 / 1000, total loss: 0.891596, mean loss: 1.306895 rpn_loss_cls: 0.138590 rpn_loss_box: 0.122660 loss_cls: 0.354118 loss_box: 0.276228
speed: 0.442s / iter iter: 530 / 1000, total loss: 2.323336, mean loss: 1.538633 rpn_loss_cls: 0.595005 rpn_loss_box: 0.116837 loss_cls: 0.872962 loss_box: 0.738532
speed: 0.438s / iter iter: 540 / 1000, total loss: 1.124045, mean loss: 1.430413 rpn_loss_cls: 0.270220 rpn_loss_box: 0.024262 loss_cls: 0.482806 loss_box: 0.346756
speed: 0.441s / iter iter: 550 / 1000, total loss: 1.567395, mean loss: 1.338224 rpn_loss_cls: 0.185243 rpn_loss_box: 0.066832 loss_cls: 0.488159 loss_box: 0.827162
speed: 0.439s / iter iter: 560 / 1000, total loss: 2.352248, mean loss: 1.536727 rpn_loss_cls: 0.350180 rpn_loss_box: 0.109354 loss_cls: 0.818716 loss_box: 1.073998
speed: 0.436s / iter iter: 570 / 1000, total loss: 1.296960, mean loss: 1.338647 rpn_loss_cls: 0.107608 rpn_loss_box: 0.212813 loss_cls: 0.604701 loss_box: 0.371838
speed: 0.436s / iter iter: 580 / 1000, total loss: 0.841824, mean loss: 0.996633 rpn_loss_cls: 0.077819 rpn_loss_box: 0.047122 loss_cls: 0.445076 loss_box: 0.271808
speed: 0.433s / iter iter: 590 / 1000, total loss: 1.592798, mean loss: 1.508881 rpn_loss_cls: 0.429948 rpn_loss_box: 0.219355 loss_cls: 0.422750 loss_box: 0.520745
Environment: Windows 10 + TensorFlow 1.14.0 + Python 3.6 + CUDA 10
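In every line of the training log in this post, the printed total loss equals rpn_loss_cls + rpn_loss_box + loss_cls + loss_box to within 1e-6 of rounding. The swings between about 0.8 and 2.4 therefore come from those four terms, mostly loss_cls and loss_box in the high lines. The sketch below parses lines in exactly the pasted format and checks that identity. The regex and helper names are hypothetical, and the parser assumes every log line follows the same layout.

```python
import re

# Matches the training-log format pasted in this thread.
LOG_RE = re.compile(
    r"iter:\s*(?P<iter>\d+)\s*/\s*(?P<max_iter>\d+),\s*"
    r"total loss:\s*(?P<total>[\d.]+),\s*mean loss:\s*(?P<mean>[\d.]+)\s+"
    r"rpn_loss_cls:\s*(?P<rpn_cls>[\d.]+)\s+rpn_loss_box:\s*(?P<rpn_box>[\d.]+)\s+"
    r"loss_cls:\s*(?P<cls>[\d.]+)\s+loss_box:\s*(?P<box>[\d.]+)"
)

COMPONENTS = ("rpn_cls", "rpn_box", "cls", "box")


def parse_log(lines):
    """One dict per matching line; non-matching lines are skipped."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            rec = {k: float(v) for k, v in m.groupdict().items()}
            rec["iter"] = int(m["iter"])
            rec["max_iter"] = int(m["max_iter"])
            records.append(rec)
    return records


def components_sum_to_total(rec, tol=1e-5):
    """True if the four printed loss components add up to the printed total."""
    return abs(sum(rec[c] for c in COMPONENTS) - rec["total"]) <= tol


if __name__ == "__main__":
    sample = [
        "speed: 0.442s / iter iter: 530 / 1000, total loss: 2.323336, "
        "mean loss: 1.538633 rpn_loss_cls: 0.595005 rpn_loss_box: 0.116837 "
        "loss_cls: 0.872962 loss_box: 0.738532",
    ]
    for rec in parse_log(sample):
        print(rec["iter"], rec["total"], components_sum_to_total(rec))
```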