Demo.py does not detect anything

AICVHub commented 5 years ago

During training, my loss never seems to converge. A typical excerpt of the log:

speed: 0.447s / iter iter: 510 / 1000, total loss: 0.971236, mean loss: 1.304287 rpn_loss_cls: 0.113811 rpn_loss_box: 0.273803 loss_cls: 0.326126 loss_box: 0.257496

speed: 0.445s / iter iter: 520 / 1000, total loss: 0.891596, mean loss: 1.306895 rpn_loss_cls: 0.138590 rpn_loss_box: 0.122660 loss_cls: 0.354118 loss_box: 0.276228

speed: 0.442s / iter iter: 530 / 1000, total loss: 2.323336, mean loss: 1.538633 rpn_loss_cls: 0.595005 rpn_loss_box: 0.116837 loss_cls: 0.872962 loss_box: 0.738532

speed: 0.438s / iter iter: 540 / 1000, total loss: 1.124045, mean loss: 1.430413 rpn_loss_cls: 0.270220 rpn_loss_box: 0.024262 loss_cls: 0.482806 loss_box: 0.346756

speed: 0.441s / iter iter: 550 / 1000, total loss: 1.567395, mean loss: 1.338224 rpn_loss_cls: 0.185243 rpn_loss_box: 0.066832 loss_cls: 0.488159 loss_box: 0.827162

speed: 0.439s / iter iter: 560 / 1000, total loss: 2.352248, mean loss: 1.536727 rpn_loss_cls: 0.350180 rpn_loss_box: 0.109354 loss_cls: 0.818716 loss_box: 1.073998

speed: 0.436s / iter iter: 570 / 1000, total loss: 1.296960, mean loss: 1.338647 rpn_loss_cls: 0.107608 rpn_loss_box: 0.212813 loss_cls: 0.604701 loss_box: 0.371838

speed: 0.436s / iter iter: 580 / 1000, total loss: 0.841824, mean loss: 0.996633 rpn_loss_cls: 0.077819 rpn_loss_box: 0.047122 loss_cls: 0.445076 loss_box: 0.271808

speed: 0.433s / iter iter: 590 / 1000, total loss: 1.592798, mean loss: 1.508881 rpn_loss_cls: 0.429948 rpn_loss_box: 0.219355 loss_cls: 0.422750 loss_box: 0.520745

Environment: Windows 10 + TensorFlow 1.14.0 + Python 3.6 + CUDA 10

morpheusthewhite commented 5 years ago

This can happen; you have to look at the overall trend over a longer interval. If you complete training, you will see better results.
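
One way to look at the trend rather than individual iterations is to smooth the logged values. A minimal sketch, assuming the log format shown in the first comment (a `total loss: <float>` field on each line); `parse_total_losses` and `moving_average` are hypothetical helpers, not part of the repository.

```python
import re

# Matches the "total loss: 0.971236" field in log lines like the ones above.
TOTAL_LOSS_RE = re.compile(r"total loss:\s*([0-9.]+)")


def parse_total_losses(lines):
    """Extract the per-iteration total loss from training log lines."""
    losses = []
    for line in lines:
        match = TOTAL_LOSS_RE.search(line)
        if match:
            losses.append(float(match.group(1)))
    return losses


def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window <= 0 or window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


if __name__ == "__main__":
    log = [
        "iter: 510 / 1000, total loss: 0.971236, mean loss: 1.304287",
        "iter: 520 / 1000, total loss: 0.891596, mean loss: 1.306895",
        "iter: 530 / 1000, total loss: 2.323336, mean loss: 1.538633",
        "iter: 540 / 1000, total loss: 1.124045, mean loss: 1.430413",
    ]
    losses = parse_total_losses(log)
    print(losses)
    print(moving_average(losses, 2))
```

With thousands of iterations, a window of a few hundred values gives a curve where a slow downward drift is much easier to see than in the raw per-iteration numbers.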

AICVHub commented 5 years ago

I have trained for 40000 iterations, but it still does not converge. I don't know what's wrong with the code; it's the unmodified code from the repository. Hasn't anyone else reported the same problem? Looking forward to your reply.

morpheusthewhite commented 5 years ago

Did you use the VOC dataset? Why is the number of iterations 1000? (It is normally 4000.)

AICVHub commented 5 years ago

Yes, I used VOC2012. 1000 iterations was just one of my attempts. Even when I train for 40000 iterations, the model still does not converge: the total loss hovers around 1.0, with a minimum near 0.2 and a maximum near 2.

------------------ Original message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Friday, September 20, 2019, 4:59 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] train loss misconvergence (#106)

morpheusthewhite commented 5 years ago

You should use VOC2007; that is the dataset this network uses by default.

AICVHub commented 5 years ago

I have tried VOC2007, but the model still does not converge. The loss curve is as follows (lr = 0.001; I didn't change anything in the code):

morpheusthewhite commented 5 years ago

Then it does converge; it is slowly going down.

AICVHub commented 5 years ago

Actually, it still does not converge after 100,000 iterations! I used the original code without any changes, but it still doesn't work. Strange, isn't it?

------------------ Original message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Monday, September 23, 2019, 2:17 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] train loss misconvergence (#106)

morpheusthewhite commented 5 years ago

Send the graph

AICVHub commented 5 years ago

The loss curve (total_loss):

Network architecture: {'Variable': [], 'vgg_16/bbox_pred/biases': [84], 'vgg_16/bbox_pred/biases/Momentum': [84], 'vgg_16/bbox_pred/weights': [4096, 84], 'vgg_16/bbox_pred/weights/Momentum': [4096, 84], 'vgg_16/cls_score/biases': [21], 'vgg_16/cls_score/biases/Momentum': [21], 'vgg_16/cls_score/weights': [4096, 21], 'vgg_16/cls_score/weights/Momentum': [4096, 21], 'vgg_16/conv1/conv1_1/biases': [64], 'vgg_16/conv1/conv1_1/weights': [3, 3, 3, 64], 'vgg_16/conv1/conv1_2/biases': [64], 'vgg_16/conv1/conv1_2/weights': [3, 3, 64, 64], 'vgg_16/conv2/conv2_1/biases': [128], 'vgg_16/conv2/conv2_1/weights': [3, 3, 64, 128], 'vgg_16/conv2/conv2_2/biases': [128], 'vgg_16/conv2/conv2_2/weights': [3, 3, 128, 128], 'vgg_16/conv3/conv3_1/biases': [256], 'vgg_16/conv3/conv3_1/biases/Momentum': [256], 'vgg_16/conv3/conv3_1/weights': [3, 3, 128, 256], 'vgg_16/conv3/conv3_1/weights/Momentum': [3, 3, 128, 256], 'vgg_16/conv3/conv3_2/biases': [256], 'vgg_16/conv3/conv3_2/biases/Momentum': [256], 'vgg_16/conv3/conv3_2/weights': [3, 3, 256, 256], 'vgg_16/conv3/conv3_2/weights/Momentum': [3, 3, 256, 256], 'vgg_16/conv3/conv3_3/biases': [256], 'vgg_16/conv3/conv3_3/biases/Momentum': [256], 'vgg_16/conv3/conv3_3/weights': [3, 3, 256, 256], 'vgg_16/conv3/conv3_3/weights/Momentum': [3, 3, 256, 256], 'vgg_16/conv4/conv4_1/biases': [512], 'vgg_16/conv4/conv4_1/biases/Momentum': [512], 'vgg_16/conv4/conv4_1/weights': [3, 3, 256, 512], 'vgg_16/conv4/conv4_1/weights/Momentum': [3, 3, 256, 512], 'vgg_16/conv4/conv4_2/biases': [512], 'vgg_16/conv4/conv4_2/biases/Momentum': [512], 'vgg_16/conv4/conv4_2/weights': [3, 3, 512, 512], 'vgg_16/conv4/conv4_2/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv4/conv4_3/biases': [512], 'vgg_16/conv4/conv4_3/biases/Momentum': [512], 'vgg_16/conv4/conv4_3/weights': [3, 3, 512, 512], 'vgg_16/conv4/conv4_3/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_1/biases': [512], 'vgg_16/conv5/conv5_1/biases/Momentum': [512], 'vgg_16/conv5/conv5_1/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_1/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_2/biases': [512], 'vgg_16/conv5/conv5_2/biases/Momentum': [512], 'vgg_16/conv5/conv5_2/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_2/weights/Momentum': [3, 3, 512, 512], 'vgg_16/conv5/conv5_3/biases': [512], 'vgg_16/conv5/conv5_3/biases/Momentum': [512], 'vgg_16/conv5/conv5_3/weights': [3, 3, 512, 512], 'vgg_16/conv5/conv5_3/weights/Momentum': [3, 3, 512, 512], 'vgg_16/fc6/biases': [4096], 'vgg_16/fc6/biases/Momentum': [4096], 'vgg_16/fc6/weights': [25088, 4096], 'vgg_16/fc6/weights/Momentum': [25088, 4096], 'vgg_16/fc7/biases': [4096], 'vgg_16/fc7/biases/Momentum': [4096], 'vgg_16/fc7/weights': [4096, 4096], 'vgg_16/fc7/weights/Momentum': [4096, 4096], 'vgg_16/rpn_bbox_pred/biases': [36], 'vgg_16/rpn_bbox_pred/biases/Momentum': [36], 'vgg_16/rpn_bbox_pred/weights': [1, 1, 512, 36], 'vgg_16/rpn_bbox_pred/weights/Momentum': [1, 1, 512, 36], 'vgg_16/rpn_cls_score/biases': [18], 'vgg_16/rpn_cls_score/biases/Momentum': [18], 'vgg_16/rpn_cls_score/weights': [1, 1, 512, 18], 'vgg_16/rpn_cls_score/weights/Momentum': [1, 1, 512, 18], 'vgg_16/rpn_conv/3x3/biases': [512], 'vgg_16/rpn_conv/3x3/biases/Momentum': [512], 'vgg_16/rpn_conv/3x3/weights': [3, 3, 512, 512], 'vgg_16/rpn_conv/3x3/weights/Momentum': [3, 3, 512, 512]}
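
The printed head shapes can be cross-checked against the usual Faster R-CNN sizes. A short sketch, assuming the 20 PASCAL VOC classes plus background and 9 anchors per feature-map position (3 scales x 3 aspect ratios); these counts are inferred from the shapes above, not read from the repository's config, and `expected_head_shapes` is a hypothetical helper.

```python
# Assumed: 20 PASCAL VOC classes + background, 9 anchors per position,
# and 7x7 pooled RoI features taken from the 512-channel conv5_3 map.
VOC_CLASSES = (
    "__background__",
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)
NUM_ANCHORS = 3 * 3
POOLED_SIZE = 7
CONV5_CHANNELS = 512


def expected_head_shapes(num_classes=len(VOC_CLASSES), num_anchors=NUM_ANCHORS):
    """Weight shapes each head layer should have under the assumptions above."""
    return {
        "vgg_16/cls_score/weights": [4096, num_classes],
        "vgg_16/bbox_pred/weights": [4096, 4 * num_classes],  # 4 box deltas per class
        "vgg_16/rpn_cls_score/weights": [1, 1, CONV5_CHANNELS, 2 * num_anchors],  # fg/bg per anchor
        "vgg_16/rpn_bbox_pred/weights": [1, 1, CONV5_CHANNELS, 4 * num_anchors],
        "vgg_16/fc6/weights": [POOLED_SIZE * POOLED_SIZE * CONV5_CHANNELS, 4096],
    }


# Shapes copied from the printed network architecture above.
printed = {
    "vgg_16/cls_score/weights": [4096, 21],
    "vgg_16/bbox_pred/weights": [4096, 84],
    "vgg_16/rpn_cls_score/weights": [1, 1, 512, 18],
    "vgg_16/rpn_bbox_pred/weights": [1, 1, 512, 36],
    "vgg_16/fc6/weights": [25088, 4096],
}

for name, shape in expected_head_shapes().items():
    status = "OK" if printed[name] == shape else "MISMATCH"
    print(f"{name}: expected {shape}, printed {printed[name]} -> {status}")
```

Every head reports OK for these values, so the graph is consistent with a 21-class VOC setup. The conv1 and conv2 variables also have no `Momentum` slot, which suggests those layers are not being updated by the optimizer.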

morpheusthewhite commented 5 years ago

It does converge to a value (about 1).

(Those isolated peaks are normal during training.)
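
To tell isolated peaks like these apart from a genuine upward trend, one rough heuristic is to compare each value with the median of its neighbours. A minimal sketch; `flag_spikes`, `window` and `factor` are hypothetical names and thresholds chosen for illustration, not something the repository provides.

```python
from statistics import median


def flag_spikes(losses, window=5, factor=2.0):
    """Return indices whose loss exceeds `factor` times the median of up to
    `window` neighbouring values on each side (the value itself excluded)."""
    spikes = []
    for i, value in enumerate(losses):
        lo = max(0, i - window)
        hi = min(len(losses), i + window + 1)
        neighbours = losses[lo:i] + losses[i + 1:hi]
        if neighbours and value > factor * median(neighbours):
            spikes.append(i)
    return spikes


print(flag_spikes([1.0, 1.1, 0.9, 3.0, 1.0, 0.95]))  # prints [3]
```

If only a few isolated indices are flagged while the smoothed loss stays flat or falls, the peaks are noise around a converged value rather than divergence.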

AICVHub commented 5 years ago

Yes, isolated peaks may be normal.

However, the trained model does not work at test time. I ran demo.py, but the model doesn't detect anything.

------------------ Original message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Monday, September 23, 2019, 6:20 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] train loss misconvergence (#106)

morpheusthewhite commented 5 years ago

What is the terminal output of demo.py?

AICVHub commented 5 years ago

demo.py is the test script in the original repo. It should plot the detection results, but after running it nothing is detected.
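
When a detector draws nothing, a useful first check is whether any class score clears the confidence threshold at all. A minimal NumPy sketch, assuming `scores` has shape (N, 21) and `boxes` has shape (N, 84), matching the `cls_score` and `bbox_pred` heads listed above; `summarize_detections` is a hypothetical helper, and `conf_thresh=0.8` / `nms_thresh=0.3` are illustrative values, so check the constants actually used in demo.py.

```python
import numpy as np


def nms(dets, thresh):
    """Greedy non-maximum suppression on rows of [x1, y1, x2, y2, score]."""
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with every remaining box.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= thresh]  # drop boxes overlapping too much
    return keep


def summarize_detections(scores, boxes, conf_thresh=0.8, nms_thresh=0.3):
    """Per class (skipping background at index 0): best raw score and the
    number of boxes that survive NMS and reach conf_thresh."""
    report = {}
    for cls_ind in range(1, scores.shape[1]):
        cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
        kept = dets[nms(dets, nms_thresh)]
        report[cls_ind] = (float(cls_scores.max()), int((kept[:, 4] >= conf_thresh).sum()))
    return report
```

If the best score for every class stays well below the threshold, the model simply is not confident about anything in the test images; if some scores are high but still nothing is drawn, the problem might lie elsewhere, for example in which checkpoint is loaded or in the plotting step.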

------------------ Original message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Monday, September 23, 2019, 6:31 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] train loss misconvergence (#106)

morpheusthewhite commented 5 years ago

Yes, but what do you see on the terminal when you run python demo.py?

AICVHub commented 5 years ago

The output of demo.py is as follows (the test images are fed to the model, but no boxes are detected):

------------------ Original message ------------------ From: "morpheusthewhite"<notifications@github.com>; Sent: Monday, September 23, 2019, 7:06 PM; To: "dBeker/Faster-RCNN-TensorFlow-Python3"<Faster-RCNN-TensorFlow-Python3@noreply.github.com>; Cc: "李汶松"<995431104@qq.com>; "Author"<author@noreply.github.com>; Subject: Re: [dBeker/Faster-RCNN-TensorFlow-Python3] Demo.py does not detect anything (#106)
