eric612 / MobileNet-YOLO

A caffe implementation of MobileNet-YOLO detection network

Where is a YOLOv3-Tiny caffemodel #25

Closed wzjiang closed 5 years ago

wzjiang commented 5 years ago

Thank you for your work. I want to train a YOLOv3-tiny version for license plate detection on a TX1, and I have some questions. Where is the YOLOv3-Tiny caffemodel? And is Pelee faster than YOLOv3-tiny? I need the fastest model. Could you give me some advice?

eric612 commented 5 years ago
  1. Peleenet is fast and high-performance, but you need to make sure the model is optimized for edge computing; choosing the fastest inference framework is important.

  2. My Pelee+YOLOv3 is not the fastest model, because its MACs are 4x those of MobileNet-SSD; it is just a testing model of mine. I suggest you try Peleenet+SSD or YOLOv3-tiny first.
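
The MAC comparison above can be sanity-checked by hand. A minimal sketch (my own helper, not part of this repo) that counts multiply-accumulates for a single convolution layer:

```python
def conv_macs(h, w, c_in, c_out, k, stride=1, pad=None, groups=1):
    """Multiply-accumulate count for one convolution layer."""
    if pad is None:
        pad = k // 2  # "same" padding, as the 3x3 convs in the prototxt use
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return h_out * w_out * (c_in // groups) * c_out * k * k

# First tiny-yolov3 layer at 416x416 input: 3->16 channels, 3x3 kernel
print(conv_macs(416, 416, 3, 16, 3))  # 74760192, i.e. ~74.8M MACs
```

Summing this over every layer of each candidate network gives the MAC ratio eric612 refers to.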

wzjiang commented 5 years ago

Thank you for your reply. Could you provide the YOLOv3-tiny training method (including the pretrained caffemodel, test.prototxt, train.prototxt, solver.prototxt, and deploy.prototxt)? Thank you.

wzjiang commented 5 years ago

I modified the prototxt, referencing your models/darknet_yolov3/tiny-yolov3.prototxt, and trained on VOC2007. The train prototxt is below; the loss decreases, but the accuracy decreases too.

name: "yolov3-tiny"
layer {
name: "data"
type: "AnnotatedData"
top: "data"
top: "label"
include { phase: TRAIN }
transform_param {
scale: 0.007843
mirror: true
mean_value: 127.5
mean_value: 127.5
mean_value: 127.5
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 416
  width: 416
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 608
  width: 608
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}

resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 320
  width: 320
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 352
  width: 352
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 384
  width: 384
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 448
  width: 448
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 480
  width: 480
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 512
  width: 512
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 544
  width: 544
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}
resize_param {
  prob: 0.1
  resize_mode: WARP
  height: 576
  width: 576
  interp_mode: LINEAR
  interp_mode: AREA
  interp_mode: LANCZOS4
}

emit_constraint {
  emit_type: CENTER
}
distort_param {
  brightness_prob: 0.5
  brightness_delta: 32.0
  contrast_prob: 0.5
  contrast_lower: 0.5
  contrast_upper: 1.5
  hue_prob: 0.5
  hue_delta: 18.0
  saturation_prob: 0.5
  saturation_lower: 0.5
  saturation_upper: 1.5
  random_order_prob: 0.0
}
expand_param {
  prob: 0.5
  max_expand_ratio: 2.0
}

}
data_param { source: "examples/VOC0712/VOC0712_trainval_lmdb" batch_size: 24 backend: LMDB }
annotated_data_param { yolo_data_type : 1 yolo_data_jitter : 0.3 label_map_file: "data/VOC0712/labelmap_voc.prototxt" }
}
layer { bottom: "data" top: "layer1-conv" name: "layer1-conv" type: "Convolution" convolution_param { num_output: 16 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer1-conv" top: "layer2-maxpool" name: "layer2-maxpool" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } }
layer { bottom: "layer2-maxpool" top: "layer3-conv" name: "layer3-conv" type: "Convolution" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer3-conv" top: "layer4-maxpool" name: "layer4-maxpool" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } }
layer { bottom: "layer4-maxpool" top: "layer5-conv" name: "layer5-conv" type: "Convolution" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer5-conv" top: "layer6-maxpool" name: "layer6-maxpool" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } }
layer { bottom: "layer6-maxpool" top: "layer7-conv" name: "layer7-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer7-conv" top: "layer8-maxpool" name: "layer8-maxpool" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } }
layer { bottom: "layer8-maxpool" top: "layer9-conv" name: "layer9-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer9-conv" top: "layer10-maxpool" name: "layer10-maxpool" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } }
layer { bottom: "layer10-maxpool" top: "layer11-conv" name: "layer11-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer11-conv" top: "layer12-maxpool" name: "layer12-maxpool" type: "Pooling" pooling_param { kernel_size: 1 stride: 1 pool: MAX } }
layer { bottom: "layer12-maxpool" top: "layer13-conv" name: "layer13-conv" type: "Convolution" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer13-conv" top: "layer14-conv" name: "layer14-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } }
layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer14-conv" top: "layer15-conv" name: "layer15-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer15-conv" top: "layer16-conv" name: "layer16-conv" type: "Convolution" convolution_param { num_output: 75 kernel_size: 1 pad: 0 stride: 1 bias_term: true } }
layer { bottom: "layer16-conv" type: "Concat" top: "layer17-yolo" name: "layer17-yolo" }
layer { bottom: "layer14-conv" top: "layer18-route" name: "layer18-route" type: "Concat" }
layer { bottom: "layer18-route" top: "layer19-conv" name: "layer19-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } }
layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer19-conv" top: "layer20-upsample" name: "layer20-upsample" type: "Deconvolution" convolution_param { stride: 2 kernel_size: 4 num_output: 128 group: 128 pad: 1 bias_term: false weight_filler { type: "bilinear" } } }
layer { bottom: "layer20-upsample" bottom: "layer9-conv" top: "layer21-route" name: "layer21-route" type: "Concat" }
layer { bottom: "layer21-route" top: "layer22-conv" name: "layer22-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } }
layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } }
layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-scale" type: "Scale" scale_param { bias_term: true } }
layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-act" type: "ReLU" relu_param { negative_slope: 0.1 } }
layer { bottom: "layer22-conv" top: "layer23-conv" name: "layer23-conv" type: "Convolution" convolution_param { num_output: 75 kernel_size: 1 pad: 0 stride: 1 bias_term: true } }

layer {
  name: "Yolov3Loss1"
  type: "Yolov3"
  bottom: "layer17-yolo"
  bottom: "label"
  top: "det_loss1"
  loss_weight: 1
  yolov3_param {
    side: 13
    num_class: 20
    num: 3
    object_scale: 5.0
    noobject_scale: 1.0
    class_scale: 1.0
    coord_scale: 1.0
    thresh: 0.6
    anchors_scale: 32
    use_logic_gradient: false
    # anchors: 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
    biases: 10
    biases: 14
    biases: 23
    biases: 27
    biases: 37
    biases: 58
    biases: 81
    biases: 82
    biases: 135
    biases: 169
    biases: 344
    biases: 319
    mask: 3
    mask: 4
    mask: 5
  }
}
layer {
  name: "Yolov3Loss2"
  type: "Yolov3"
  bottom: "layer23-conv"
  bottom: "label"
  top: "det_loss2"
  loss_weight: 1
  yolov3_param {
    side: 26
    num_class: 20
    num: 3
    object_scale: 5.0
    noobject_scale: 1.0
    class_scale: 1.0
    coord_scale: 1.0
    thresh: 0.6
    anchors_scale: 16
    use_logic_gradient: false
    # anchors: 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
    biases: 10
    biases: 14
    biases: 23
    biases: 27
    biases: 37
    biases: 58
    biases: 81
    biases: 82
    biases: 135
    biases: 169
    biases: 344
    biases: 319
    mask: 0
    mask: 1
    mask: 2
  }
}
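
Both loss layers list the same twelve `biases` values, which are the six darknet anchor (width, height) pairs; the `mask` entries select which three anchors each head trains on. A small illustrative snippet (helper name is mine, not from this repo):

```python
# The biases from the prototxt above, flattened as w,h pairs
biases = [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]
anchors = list(zip(biases[0::2], biases[1::2]))  # six (w, h) anchor boxes

def anchors_for_mask(mask):
    """Return the anchor subset a YOLO head trains on, given its mask indices."""
    return [anchors[i] for i in mask]

# 13x13 head (anchors_scale 32) uses mask 3,4,5 -> the three largest anchors
print(anchors_for_mask([3, 4, 5]))  # [(81, 82), (135, 169), (344, 319)]
# 26x26 head (anchors_scale 16) uses mask 0,1,2 -> the three smallest anchors
print(anchors_for_mask([0, 1, 2]))  # [(10, 14), (23, 27), (37, 58)]
```

So the coarse 13x13 grid handles large objects and the upsampled 26x26 grid handles small ones.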

The solver prototxt is like this:

train_net: "models/yolov3_tiny/train.prototxt"
test_net: "models/yolov3_tiny/test.prototxt"
test_iter: 4952
test_interval: 1000
base_lr: 0.0005
display: 10
max_iter: 60000
lr_policy: "multistep"
gamma: 0.5
weight_decay: 0.00005
snapshot: 1000
snapshot_prefix: "models/yolov3_tiny/yolov3_tiny_voc"
solver_mode: GPU
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
stepvalue: 10000
stepvalue: 30000
iter_size: 2
type: "RMSProp"
eval_type: "detection"
ap_version: "11point"
show_per_class_result: true
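
With `lr_policy: "multistep"`, Caffe multiplies the learning rate by `gamma` each time the iteration reaches a `stepvalue`. A quick sketch of the schedule this solver implies (the helper name is mine):

```python
def multistep_lr(it, base_lr=0.0005, gamma=0.5, stepvalues=(10000, 30000)):
    """Learning rate under Caffe's multistep policy:
    lr = base_lr * gamma ** (number of stepvalues already reached)."""
    return base_lr * gamma ** sum(1 for s in stepvalues if it >= s)

print(multistep_lr(0))      # 0.0005   (before the first step)
print(multistep_lr(10000))  # 0.00025  (after stepvalue 10000)
print(multistep_lr(30000))  # 0.000125 (after stepvalue 30000)
```

So the rate halves at iteration 10000 and again at 30000, and stays at 0.000125 until max_iter 60000.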

eric612 commented 5 years ago

I suggest modifying from this prototxt.

You can reference the refine solver and change the iterations.

wzjiang commented 5 years ago

Thank you very much. Could you provide a pretrained caffemodel? I trained yolov3_lite with mobilenet_iter_73000.caffemodel and got a normal result, such as:

I1015 14:07:20.697209 53720 solver.cpp:252] Iteration 650 (0.660692 iter/s, 15.1356s/10 iters), loss = 5.90476
I1015 14:07:20.697265 53720 solver.cpp:271] Train net output #0: det_loss1 = 3.6958 (* 1 = 3.6958 loss)
I1015 14:07:20.697275 53720 solver.cpp:271] Train net output #1: det_loss2 = 4.76265 (* 1 = 4.76265 loss)
I1015 14:07:20.697283 53720 sgd_solver.cpp:121] Iteration 650, lr = 0.0005
I1015 14:07:22.690366 53720 yolov3_layer.cpp:359] avg_noobj: 0.00366752 avg_obj: 0.271547 avg_iou: 0.595767 avg_cat: 0.730663 recall: 0.66366 recall75: 0.233297 count: 10
I1015 14:07:22.733872 53720 yolov3_layer.cpp:359] avg_noobj: 0.000504621 avg_obj: 0.175436 avg_iou: 0.530811 avg_cat: 0.594559 recall: 0.699471 recall75: 0.0297619 count: 9
I1015 14:07:26.461459 53720 yolov3_layer.cpp:359] avg_noobj: 0.0039172 avg_obj: 0.342939 avg_iou: 0.605269 avg_cat: 0.681124 recall: 0.776094 recall75: 0.253143 count: 11
I1015 14:07:26.495635 53720 yolov3_layer.cpp:359] avg_noobj: 0.00073863 avg_obj: 0.23612 avg_iou: 0.531507 avg_cat: 0.530818 recall: 0.616148 recall75: 0.0865385 count: 6
I1015 14:07:29.930438 53720 yolov3_layer.cpp:359] avg_noobj: 0.00322744 avg_obj: 0.278562 avg_iou: 0.582834 avg_cat: 0.660926 recall: 0.692216 recall75: 0.146144 count: 9
I1015 14:07:29.959137 53720 yolov3_layer.cpp:359] avg_noobj: 0.000569099 avg_obj: 0.132548 avg_iou: 0.461339 avg_cat: 0.571731 recall: 0.430983 recall75: 0.0178571 count: 7
I1015 14:07:33.502148 53720 yolov3_layer.cpp:359] avg_noobj: 0.00403286 avg_obj: 0.376408 avg_iou: 0.58732 avg_cat: 0.781398 recall: 0.67808 recall75: 0.201939 count: 13
I1015 14:07:33.580006 53720 yolov3_layer.cpp:359] avg_noobj: 0.000308316 avg_obj: 0.183979 avg_iou: 0.518314 avg_cat: 0.579064 recall: 0.593266 recall75: 0 count: 3

But without a pretrained caffemodel, the tiny result is abnormal (avg_obj and avg_noobj keep decreasing):

I1015 14:09:19.419375 39601 solver.cpp:271] Train net output #1: det_loss2 = 4.14267 (* 1 = 4.14267 loss)
I1015 14:09:19.419384 39601 sgd_solver.cpp:121] Iteration 1030, lr = 0.0005
I1015 14:09:21.684012 39601 yolov3_layer.cpp:359] avg_noobj: 0.00270931 avg_obj: 0.010739 avg_iou: 0.551339 avg_cat: 0.140888 recall: 0.631377 recall75: 0.129889 count: 13
I1015 14:09:21.779618 39601 yolov3_layer.cpp:359] avg_noobj: 0.000492056 avg_obj: 0.00298653 avg_iou: 0.335181 avg_cat: 0.167357 recall: 0.126984 recall75: 0.0238095 count: 3
I1015 14:09:26.213709 39601 yolov3_layer.cpp:359] avg_noobj: 0.00225907 avg_obj: 0.00736756 avg_iou: 0.543772 avg_cat: 0.118726 recall: 0.616868 recall75: 0.0789655 count: 12
I1015 14:09:26.329932 39601 yolov3_layer.cpp:359] avg_noobj: 0.000258286 avg_obj: 0.00152417 avg_iou: 0.297304 avg_cat: 0.229101 recall: 0.174074 recall75: 0.0185185 count: 8
I1015 14:09:29.786145 39601 yolov3_layer.cpp:359] avg_noobj: 0.00257914 avg_obj: 0.0089983 avg_iou: 0.570668 avg_cat: 0.141732 recall: 0.642173 recall75: 0.122874 count: 12
I1015 14:09:29.821192 39601 yolov3_layer.cpp:359] avg_noobj: 0.00027516 avg_obj: 0.00178334 avg_iou: 0.375605 avg_cat: 0.172484 recall: 0.307143 recall75: 0 count: 4
I1015 14:09:32.934083 39601 yolov3_layer.cpp:359] avg_noobj: 0.00277038 avg_obj: 0.0109594 avg_iou: 0.574585 avg_cat: 0.13856 recall: 0.706946 recall75: 0.124347 count: 10
I1015 14:09:33.014472 39601 yolov3_layer.cpp:359] avg_noobj: 0.000245922 avg_obj: 0.000903233 avg_iou: 0.440693 avg_cat: 0.120063 recall: 0.353472 recall75: 0 count: 6
I1015 14:09:35.255605 39601 solver.cpp:252] Iteration 1040 (0.631478 iter/s, 15.8359s/10 iters), loss = 12.1681
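
One likely reason the no-pretrain run behaves differently: Caffe copies pretrained weights only into layers whose names match between the .caffemodel and the new prototxt, so mobilenet_iter_73000.caffemodel contributes nothing to a net whose layers are named layer1-conv, layer3-conv, etc., and training effectively starts from scratch. A purely illustrative sketch of that name-matching rule (plain Python dicts, not pycaffe):

```python
def copy_matching_weights(pretrained, target):
    """Mimic Caffe's finetune rule: copy weights only where layer names match.
    Returns the list of layer names that actually received weights."""
    copied = []
    for name, blobs in pretrained.items():
        if name in target:
            target[name] = blobs
            copied.append(name)
    return copied

# Toy stand-ins for real weight blobs (names are the point, values are not)
mobilenet = {"conv0": [0.1], "conv1/dw": [0.2]}
tiny_yolo = {"layer1-conv": None, "layer3-conv": None}
print(copy_matching_weights(mobilenet, tiny_yolo))  # [] -- nothing is initialized
```

Either finetune from a caffemodel trained on a net with matching layer names (e.g. the repo's darknet-converted tiny-yolov3 weights), or rename the backbone layers to match the pretrained model.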