RangiLyu / nanodet

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥
Apache License 2.0

COCO2017: after 70 epochs, mAP is lower than expected and the model size is larger than the released one #50

Closed e96031413 closed 3 years ago

e96031413 commented 3 years ago

Hello,

I tried training on the COCO 2017 dataset myself at 320*320 resolution. After 70 epochs I checked workspace/nanodet_m/model_best/eval_results.txt, and the results seem to fall short of the mAP = 20.6 you report in the README table.

In addition, the trained .pth file is 7.55 MB, which is noticeably larger than the .pth you released here (only 3.86 MB). Could you tell me whether something in my setup is wrong? Thank you.

First run (values rounded to 4 decimal places):

| Epoch | mAP    | AP_50  | AP_75  | AP_small | AP_m   | AP_l   |
|-------|--------|--------|--------|----------|--------|--------|
| 10    | 0.1087 | 0.2043 | 0.1020 | 0.0305   | 0.1003 | 0.1917 |
| 20    | 0.1191 | 0.2191 | 0.1133 | 0.0334   | 0.1096 | 0.2061 |
| 30    | 0.1204 | 0.2187 | 0.1181 | 0.0412   | 0.1174 | 0.2094 |
| 40    | 0.1270 | 0.2314 | 0.1232 | 0.0373   | 0.1195 | 0.2185 |
| 50    | 0.1795 | 0.3087 | 0.1783 | 0.0491   | 0.1676 | 0.3052 |
| 60    | 0.1892 | 0.3218 | 0.1892 | 0.0519   | 0.1728 | 0.3254 |
| 70    | 0.1898 | 0.3224 | 0.1901 | 0.0520   | 0.1733 | 0.3252 |

Second run (values rounded to 4 decimal places):

| Epoch | mAP    | AP_50  | AP_75  | AP_small | AP_m   | AP_l   |
|-------|--------|--------|--------|----------|--------|--------|
| 10    | 0.1226 | 0.2362 | 0.1112 | 0.0462   | 0.1327 | 0.1919 |
| 20    | 0.1403 | 0.2643 | 0.1300 | 0.0572   | 0.1517 | 0.2220 |
| 30    | 0.1422 | 0.2654 | 0.1340 | 0.0542   | 0.1459 | 0.2216 |
| 40    | 0.1516 | 0.2796 | 0.1448 | 0.0578   | 0.1627 | 0.2341 |
| 50    | 0.1805 | 0.3249 | 0.1742 | 0.0711   | 0.1876 | 0.2749 |
| 60    | 0.1836 | 0.3271 | 0.1800 | 0.0711   | 0.1915 | 0.2811 |
| 70    | 0.1842 | 0.3282 | 0.1801 | 0.0717   | 0.1931 | 0.2816 |

nanodet-m.yml

#Config File example
save_dir: workspace/nanodet_m
model:
  arch:
    name: GFL
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2,3,4]
      activation: LeakyReLU
    fpn:
      name: PAN
      in_channels: [116, 232, 464]
      out_channels: 96
      start_level: 0
      num_outs: 3
    head:
      name: NanoDetHead
      num_classes: 80
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      share_cls_reg: True
      octave_base_scale: 5
      scales_per_octave: 1
      strides: [8, 16, 32]
      reg_max: 7
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
data:
  train:
    name: coco
    img_path: ../coco/images/train2017
    ann_path: ../coco/annotations/instances_train2017.json
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      perspective: 0.0
      scale: [0.6, 1.4]
      stretch: [[1, 1], [1, 1]]
      rotation: 0
      shear: 0
      translate: 0
      flip: 0.5
      brightness: 0.2
      contrast: [0.8, 1.2]
      saturation: [0.8, 1.2]
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
  val:
    name: coco
    img_path: ../coco/images/val2017
    ann_path: ../coco/annotations/instances_val2017.json
    input_size: [416,416] #[w,h]
    keep_ratio: True
    pipeline:
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
device:
  gpu_ids: [0,1,2,3]
  workers_per_gpu: 12
  batchsize_per_gpu: 160
schedule:
#  resume:
#  load_model: YOUR_MODEL_PATH
  optimizer:
    name: SGD
    lr: 0.14
    momentum: 0.9
    weight_decay: 0.0001
  warmup:
    name: linear
    steps: 300
    ratio: 0.1
  total_epochs: 70
  lr_schedule:
    name: MultiStepLR
    milestones: [40,55,60,65]
    gamma: 0.1
  val_intervals: 10
evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

log:
  interval: 10

class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
              'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant',
              'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog',
              'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
              'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
              'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat',
              'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket',
              'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
              'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
              'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
              'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop',
              'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave',
              'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
              'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush']
RangiLyu commented 3 years ago

SGD training has some inherent randomness, and the 20.6 mAP was obtained after several rounds of hyper-parameter tuning. If you want to reproduce it, you can increase the number of epochs before the first lr decay and then apply the lr decay starting from the model with the best mAP.
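For reference, one possible reading of this advice against the config above is: train longer at the base learning rate, then (optionally in a second run) load the best checkpoint via `load_model` and let the MultiStepLR decays kick in. This is only a sketch; the epoch numbers and the checkpoint path below are illustrative placeholders, not the settings used for the released model:

```yaml
schedule:
#  load_model: workspace/nanodet_m/model_best/model_best.pth  # placeholder: resume from your best checkpoint before decaying
  total_epochs: 100               # more epochs at the base lr before the first decay
  lr_schedule:
    name: MultiStepLR
    milestones: [70, 85, 95]      # first lr decay moved later than the default 40
    gamma: 0.1
```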

The model saved during training is a checkpoint that also contains the optimizer state; if you skip saving the optimizer when saving the model, you will get the ~3 MB model.
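A minimal sketch of stripping the optimizer state out of an already-saved checkpoint is shown below. The key name `state_dict` and the file paths are assumptions about a typical dict-style PyTorch checkpoint, not necessarily the exact layout this repo uses:

```python
import torch

# Load the full training checkpoint (weights + optimizer + bookkeeping).
ckpt = torch.load("workspace/nanodet_m/model_best/model_best.pth", map_location="cpu")

# Keep only the network weights; 'state_dict' is the usual key, adjust if needed.
slim = {"state_dict": ckpt["state_dict"] if "state_dict" in ckpt else ckpt}

# The re-saved file should be roughly the size of the released ~3.86 MB model.
torch.save(slim, "nanodet_m_weights_only.pth")
```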

e96031413 commented 3 years ago

Thank you for the reply, I will give it a try.

eeric commented 3 years ago

ok

wpeak58 commented 3 years ago

@e96031413, beyond what @RangiLyu said: when you load a pretrained model and then train from it, check whether the final saved model also stores the pretrained parameters. In that case 3 + 7 = 10, i.e. the final model would be about 10 MB, with over 1 GB of memory usage and roughly 20% CPU usage on the PC side.
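To check whether the extra megabytes come from optimizer state, duplicated pretrained weights, or something else, you can inspect what the checkpoint actually stores. A minimal sketch assuming a dict-style PyTorch checkpoint (the file name is a placeholder):

```python
import torch

def tensor_bytes(obj):
    """Recursively sum the in-memory size of every tensor inside obj."""
    if torch.is_tensor(obj):
        return obj.element_size() * obj.nelement()
    if isinstance(obj, dict):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0

# Replace with the path to the checkpoint you want to inspect.
ckpt = torch.load("your_checkpoint.pth", map_location="cpu")
for key, value in ckpt.items():
    print(f"{key}: ~{tensor_bytes(value) / 1e6:.2f} MB of tensors")
```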

e96031413 commented 3 years ago

@wpeak58 Thank you for sharing.