PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0
12.75k stars 2.88k forks

Training ssdlite_mobilenetV3_small on VOC data gives a very low mAP #2333

Open dengxinlong opened 3 years ago

dengxinlong commented 3 years ago

I trained ssdlite_mobilenetV3_small on VOC2012. According to the information printed during training, the loss stayed between 6 and 7; but when evaluating on VOC2007, the mAP is only 9.39%, as shown: [image]

The config file is as follows:

architecture: SSD
use_gpu: true
max_iters: 5000
snapshot_iter: 20000
log_iter: 20
metric: VOC
pretrain_weights: pretrainWeights/MobileNetV3_small_x1_0_ssld_pretrained
save_dir: output
weights: output/ssdlite_mobilenet_v3_small/best_model
# 20 (label classes) + 1 (background)
num_classes: 21

SSD:
  backbone: MobileNetV3
  multi_box_head: SSDLiteMultiBoxHead
  output_decoder:
    background_label: 0
    keep_top_k: 200
    nms_eta: 1.0
    nms_threshold: 0.45
    nms_top_k: 400
    score_threshold: 0.01

MobileNetV3:
  scale: 1.0
  model_name: small
  extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]]
  feature_maps: [5, 7, 8, 9, 10, 11]
  lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
  conv_decay: 0.00004
  multiplier: 0.5

SSDLiteMultiBoxHead:
  aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]]
  base_size: 320
  steps: [16, 32, 64, 107, 160, 320]
  flip: true
  clip: true
  max_ratio: 95
  min_ratio: 20
  offset: 0.5
  conv_decay: 0.00004

LearningRate:
  base_lr: 0.4
  schedulers:
  - !CosineDecay
    max_iters: 400000
  - !LinearWarmup
    start_factor: 0.33333
    steps: 2000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

TrainReader:
  inputs_def:
    image_shape: [3, 320, 320]
    fields: ['image', 'gt_bbox', 'gt_class']
  dataset:
    !VOCDataSet
    dataset_dir: dataset/voc
    anno_path: trainval.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomDistort
    brightness_lower: 0.875
    brightness_upper: 1.125
    is_order: true
  - !RandomExpand
    fill_value: [123.675, 116.28, 103.53]
  - !RandomCrop
    allow_no_crop: false
  - !NormalizeBox {}
  - !ResizeImage
    interp: 1
    target_size: 320
    use_cv2: false
  - !RandomFlipImage
    is_normalized: false
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: true
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: true
  batch_size: 32
  shuffle: true
  drop_last: true
  # Number of worker threads/processes. To speed up loading, this can be set to 16, 32, etc.
  worker_num: 8
  # Size of the shared memory used for the result queue. After increasing `worker_num`, `memsize` needs to be enlarged accordingly.
  memsize: 8G
  # Buffer size for multi-threaded/multi-process loading; one instance in the buffer is one batch of data.
  # To speed this up, it can be set to 64, 128, etc.
  bufsize: 32
  use_process: true

EvalReader:
  inputs_def:
    image_shape: [3, 320, 320]
    fields: ['image', 'im_shape', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !VOCDataSet
    dataset_dir: dataset/voc
    anno_path: test.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeBox {}
  - !ResizeImage
    interp: 1
    target_size: 320
    use_cv2: false
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: true
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: true
  batch_size: 8
  worker_num: 8
  bufsize: 32
  use_process: false

TestReader:
  inputs_def:
    image_shape: [3,320,320]
    fields: ['image', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    anno_path: annotations/instances_val2017.json
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !ResizeImage
    interp: 1
    max_size: 0
    target_size: 320
    use_cv2: true
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: true
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: true
  batch_size: 1
liuhuiCNN commented 3 years ago

Thanks for the feedback. A few questions: (1) Which Paddle and PaddleDetection versions are you using? (2) Did you change any parameters in the config file? (3) The default learning rate in the config file is for 8 GPUs; for single-GPU training it should be changed to 1/8 of the default.
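Point (3) is the linear learning-rate scaling rule. A minimal sketch of the arithmetic (this is just the calculation described in the comment, not a PaddleDetection API):

```python
# Linear LR scaling: the config's base_lr assumes 8 GPUs, so a
# single-GPU run scales it by num_gpus / 8.
def scale_lr(base_lr: float, num_gpus: int, default_gpus: int = 8) -> float:
    """Scale the learning rate linearly with the number of GPUs."""
    return base_lr * num_gpus / default_gpus

# base_lr: 0.4 from the config above, trained on a single GPU:
print(scale_lr(0.4, 1))  # 0.05, i.e. 1/8 of the default
```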

dengxinlong commented 3 years ago

Both Paddle and PaddleDetection are the latest versions. I trained on a single GPU (Titan V), using the pretrained weights from the original config file (downloaded offline, with the pretrained-weights path updated). My config file is the same one posted above; the parameter changes I made include:

  1. Switched the dataset to VOC, and the evaluation metric to VOC as well.
  2. Changed batch_size to 32 (the original config used 64).
  3. Changed max_iters to 5000.
  4. Since this is VOC, changed num_classes to 21.
dengxinlong commented 3 years ago

Below is a screenshot from the end of training: [image] Is there something wrong here?

dengxinlong commented 3 years ago

Could you explain exactly what to change, using my config file as the example?

yghstill commented 3 years ago

@dengxinlong The ssdlite models all use a cosine learning-rate schedule, and the initial learning rate is very high (base_lr: 0.4), which easily produces NaN losses and fails to converge. It is only suitable for multi-GPU, large-batch training. For single-GPU training, we suggest using PiecewiseDecay with the learning rate set to 0.00025:

LearningRate:
  base_lr: 0.00025
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [80000, 100000]
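For reference, a full LearningRate section under this suggestion might look like the following; keeping the LinearWarmup block from the original config is my assumption, not part of the reply above:

```yaml
LearningRate:
  base_lr: 0.00025
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [80000, 100000]
  - !LinearWarmup
    start_factor: 0.33333
    steps: 2000
```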
dengxinlong commented 3 years ago

I see. Looks like I didn't read the documentation and the corresponding config files carefully. So it should be adjusted like this: [image]

dengxinlong commented 3 years ago

May I ask where the documentation for this is? I couldn't find it. Would you mind pointing me to it? Thanks a lot!

liuhuiCNN commented 3 years ago

> May I ask where the documentation for this is? I couldn't find it. Would you mind pointing me to it? Thanks a lot!

The learning-rate notes are here: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0-rc/docs/FAQ.md

yghstill commented 3 years ago

> I see. Looks like I didn't read the documentation and the corresponding config files carefully. So it should be adjusted like this: [image]

Yes. Other SSD models, e.g. ssd_vgg16_300_voc (https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0-rc/configs/ssd/ssd_vgg16_300_voc.yml#L45), are trained on 4 GPUs. Taking that config as the reference: your batch size is 32 and ssd_vgg16_300_voc's is 8, so the corrected lr is 0.001 / 4 / (8/32) = 0.001. Besides the FAQ document mentioned above, you can also read the full-pipeline tutorial to understand each stage: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0-rc/docs/tutorials/DetectionPipeline.md
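The arithmetic in that reply is the linear scaling rule applied to the total batch size (GPU count × per-GPU batch). A minimal sketch of the calculation, using the numbers from the reference config (not a PaddleDetection API):

```python
# Linear scaling of the learning rate with total batch size.
def scale_lr_by_batch(ref_lr, ref_gpus, ref_batch, gpus, batch):
    """lr is proportional to the total batch size across all GPUs."""
    return ref_lr * (gpus * batch) / (ref_gpus * ref_batch)

# ssd_vgg16_300_voc.yml: lr=0.001 on 4 GPUs with batch size 8;
# this issue: 1 GPU with batch size 32 -> the same total batch of 32,
# so the learning rate stays at 0.001.
print(scale_lr_by_batch(0.001, 4, 8, 1, 32))  # 0.001
```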

dengxinlong commented 3 years ago

Thank you very much for the reply. I had actually read the last document you provided, but it doesn't cover learning-rate adjustment. Maybe my skills just aren't up to it and I failed to adapt.

liuhuiCNN commented 3 years ago

> Thank you very much for the reply. I had actually read the last document you provided, but it doesn't cover learning-rate adjustment. Maybe my skills just aren't up to it and I failed to adapt.

We'll improve the documentation going forward. Thanks for the feedback.

yghstill commented 3 years ago

@dengxinlong The CosineDecay schedule is suited to multi-GPU, large-batch-size training. It's fine that you didn't account for this in your single-GPU run; going forward, just set the lr according to the number of GPUs and the batch size, and you shouldn't run into major problems.

dengxinlong commented 3 years ago

> Thank you very much for the reply. I had actually read the last document you provided, but it doesn't cover learning-rate adjustment. Maybe my skills just aren't up to it and I failed to adapt.
>
> We'll improve the documentation going forward. Thanks for the feedback.

Not at all, it's probably just my own shortcomings. But your full-pipeline documentation could briefly mention this; it should only take a few sentences.

dengxinlong commented 3 years ago

> @dengxinlong The CosineDecay schedule is suited to multi-GPU, large-batch-size training. It's fine that you didn't account for this in your single-GPU run; going forward, just set the lr according to the number of GPUs and the batch size, and you shouldn't run into major problems.

Got it, thank you very much for your reply!